• Title/Summary/Keyword: matrix vector multiplication

A Study on the Convergence Characteristics Improvement of the Modified-Multiplication Free Adaptive Filter (변형 비적 적응 필터의 수렴 특성 개선에 관한 연구)

  • 김건호;윤달환;임제탁
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.6
    • /
    • pp.815-823
    • /
    • 1993
  • This paper presents the structure of the modified multiplication-free adaptive filter (M-MADF) and its convergence analysis. To evaluate the performance of the proposed M-MADF algorithm, a fractionally spaced equalizer (FSE) is used. The input signals are quantized using DPCM, the reference signals are processed using a first-order linear prediction filter, and the outputs are processed by a conventional adaptive filter. The filter coefficients are updated using the Sign algorithm. Under the assumption that the primary and reference signals are zero-mean, wide-sense stationary, and Gaussian, theoretical results for the coefficient misalignment vector and its autocorrelation matrix are derived. The convergence properties of the Sign, MADF, and M-MADF algorithms for updating the coefficients of the FSE's digital filter are investigated and compared with one another. The convergence properties are characterized by the steady-state error and the convergence speed. It is shown that, for the same steady-state error, the convergence speed of M-MADF is almost the same as that of the Sign algorithm and faster than that of MADF. M-MADF is especially useful for highly correlated signals.
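The abstract names the Sign algorithm as the coefficient update rule but does not spell it out. The following is a minimal sketch of a plain sign-error update only, assuming a two-tap system-identification setup chosen for illustration; the function names, step size, and test signal are hypothetical, and the DPCM quantization and linear prediction stages of M-MADF are not modeled.

```python
import random


def sign(value):
    """Return -1, 0, or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def sign_lms(inputs, desired, num_taps, step):
    """Adapt FIR weights using only the sign of the error (sign-error update)."""
    weights = [0.0] * num_taps
    window = [0.0] * num_taps  # most recent input sample first
    for sample, target in zip(inputs, desired):
        window = [sample] + window[:-1]
        output = sum(w * x for w, x in zip(weights, window))
        s = sign(target - output)
        # The error magnitude is discarded; only its sign steers the update.
        weights = [w + step * s * x for w, x in zip(weights, window)]
    return weights


# Hypothetical test setup: identify an unknown two-tap filter from noise-free data.
rng = random.Random(0)
true_taps = [0.5, -0.3]
inputs = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
desired = []
previous = 0.0
for sample in inputs:
    desired.append(true_taps[0] * sample + true_taps[1] * previous)
    previous = sample

estimated = sign_lms(inputs, desired, num_taps=2, step=0.01)
```

Because the update ignores the error magnitude, the weights keep jittering around the true taps by an amount that scales with the step size, which is one way to see the trade-off between steady-state error and convergence speed that the abstract discusses.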

The Limit Distribution of an Invariant Test Statistic for Multivariate Normality

  • Kim Namhyun
    • Communications for Statistical Applications and Methods
    • /
    • v.12 no.1
    • /
    • pp.71-86
    • /
    • 2005
  • Testing for normality has always been an important part of statistical methodology. In this paper, a test statistic for multivariate normality is proposed. The underlying idea is to investigate all possible linear combinations that reduce to the standard normal distribution under the null hypothesis and to compare their order statistics with the theoretical normal quantiles. The suggested statistic is invariant with respect to nonsingular matrix multiplication and vector addition. We show that the limit distribution of an approximation to the suggested statistic can be represented as the supremum over an index set of the integral of a suitable Gaussian process.

The Limit Distribution and Power of a Test for Bivariate Normality

  • Kim, Namhyun
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.1
    • /
    • pp.187-196
    • /
    • 2002
  • Testing for normality has always been a focus of practical and theoretical interest in statistical research. In this paper, a test statistic for bivariate normality is proposed. The underlying idea is to investigate all possible linear combinations that reduce to the standard normal distribution under the null hypothesis and to compare their order statistics with the theoretical normal quantiles. The suggested statistic is invariant with respect to nonsingular matrix multiplication and vector addition. We show that the limit distribution of an approximation to the suggested statistic can be represented as the supremum over an index set of the integral of a suitable Gaussian process. We also simulate the null distribution of the statistic and give some critical values of the distribution together with power results.

Trends in AI Computing Processor Semiconductors Including ETRI's Autonomous Driving AI Processor (인공지능 컴퓨팅 프로세서 반도체 동향과 ETRI의 자율주행 인공지능 프로세서)

  • Yang, J.M.;Kwon, Y.S.;Kang, S.W.
    • Electronics and Telecommunications Trends
    • /
    • v.32 no.6
    • /
    • pp.57-65
    • /
    • 2017
  • Neural-network-based AI computing is a promising technology that mirrors the recognition and decision-making of human beings. Early AI computing processors were composed of GPUs and CPUs; however, the dramatic increase in floating-point operations calls for an energy-efficient AI processor with a highly parallelized architecture. In this paper, we analyze the trends in processor architectures for AI computing. Some architectures are still built from GPUs, but they reduce the size of each processing unit by allowing half-precision operations and thereby raise the processing-unit density. Other architectures concentrate on matrix multiplication and require dedicated hardware for fast vector operations. Finally, we propose our own inAB processor architecture and introduce domestic cutting-edge processor design capabilities.

CSR Sparse Matrix Vector Multiplication Using Zero Copy (Zero Copy를 이용한 CSR 희소행렬 연산)

  • Yoon, SangHyeuk;Jeon, Dayun;Park, Neungsoo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.05a
    • /
    • pp.45-47
    • /
    • 2021
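  • An APU (Accelerated Processing Unit) is a processor in which the CPU and GPU are integrated and share the same memory space. In conventional heterogeneous computing environments, where the CPU and GPU are separate, memory must be copied from the CPU to the GPU before the GPU can process a task. Because an APU uses a single memory space, the same physical address can be accessed through virtual address allocation without any memory copy; this is called Zero Copy. Sparse matrix operations were used to test Zero Copy performance: compared with the conventional memory-copy approach, Zero Copy was about 4.67 times faster for large data and about 6.27 times faster for small data.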
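For readers unfamiliar with the CSR format named in the title, the sketch below shows a plain CPU reference version of CSR sparse matrix-vector multiplication. It is only an illustration of the data layout: the array names (`values`, `col_idx`, `row_ptr`) are conventional but assumed here, and neither the GPU kernel nor the Zero Copy memory handling from the paper is modeled.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR-encoded sparse matrix by a dense vector x.

    values  : nonzero entries, stored row by row
    col_idx : column index of each entry in values
    row_ptr : row_ptr[r]..row_ptr[r + 1] delimits the entries of row r
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Example matrix (dense form):
# [[10, 0, 2],
#  [ 0, 3, 0],
#  [ 1, 0, 4]]
values = [10.0, 2.0, 3.0, 1.0, 4.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(y)  # [16.0, 6.0, 13.0]
```

Each row is an independent dot product, which is why this operation parallelizes naturally across rows on a GPU.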

A Study on Image Data Compression by using Hadamard Transform (Hadamard변환을 이용한 영상신호의 전송량 압축에 관한 연구)

  • 박주용;이문호;김동용;이광재
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.11 no.4
    • /
    • pp.251-258
    • /
    • 1986
  • Image data such as TV signals contain much redundancy, and many techniques to reduce it have been studied. In this paper, the Hadamard transform is studied through computer simulation and an experimental model. Each element of a Hadamard matrix is either +1 or -1, and the row vectors are orthogonal to one another. Its hardware implementation is the simplest among the usual orthogonal transforms because only addition and subtraction are needed to calculate the transformed signals, whereas the digital Fourier transform and similar transforms require multiplication as well as addition. Lincoln data (64$\times$64) are simulated using 8th-order and 16th-order Hadamard transforms, and the 8th-order transform is implemented in hardware. Theoretical calculation and experimental results for the 8th-order transform show that 2.0 bits/sample are required for good quality.
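The claim that a Hadamard transform needs only addition and subtraction can be seen in a short sketch. The code below is a standard fast Walsh-Hadamard butterfly in natural (Hadamard) ordering, without normalization; it is offered as an illustration and is not taken from the paper, whose hardware design is not described in the abstract. The function name `fwht` and the sample input are assumptions.

```python
def fwht(signal):
    """Unnormalized Walsh-Hadamard transform using only additions and subtractions."""
    data = list(signal)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = data[i], data[i + half]
                # Butterfly: one sum and one difference, no multiplication.
                data[i], data[i + half] = a + b, a - b
        half *= 2
    return data


# 8th-order example, matching the transform size mentioned in the abstract.
samples = [1, 0, 1, 0, 0, 1, 1, 0]
coefficients = fwht(samples)
print(coefficients)  # [4, 2, 0, -2, 0, 2, 0, 2]
```

Applying `fwht` twice returns the input scaled by the length (here 8), so the same add/subtract network serves as the inverse transform up to a constant factor.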

An Efficient Matrix-Vector Product Algorithm for the Analysis of General Interconnect Structures (일반적인 연결선 구조의 해석을 위한 효율적인 행렬-벡터 곱 알고리즘)

  • Jung, Seung-Ho;Baek, Jong-Humn;Kim, Joon-Hee;Kim, Seok-Yoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.38 no.12
    • /
    • pp.56-65
    • /
    • 2001
  • This paper proposes an algorithm for the capacitance extraction of general 3-dimensional conductors in an ideal uniform dielectric that uses a high-order quadrature approximation method combined with the typical first-order collocation method to enhance the accuracy and adopts an efficient matrix-vector product algorithm for the model-order reduction to achieve efficiency. The proposed method enhances the accuracy using the quadrature method for interconnects containing corners and vias that concentrate the charge density. It also achieves the efficiency by reducing the model order using the fact that large parts of system matrices are of numerically low rank. This technique combines an SVD-based algorithm for the compression of rank-deficient matrices and Gram-Schmidt algorithm of a Krylov-subspace iterative technique for the rapid multiplication of matrices. It is shown through the performance evaluation procedure that the combination of these two techniques leads to a more efficient algorithm than Gaussian elimination or other standard iterative schemes within a given error tolerance.

Random Partial Haar Wavelet Transformation for Single Instruction Multiple Threads (단일 명령 다중 스레드 병렬 플랫폼을 위한 무작위 부분적 Haar 웨이블릿 변환)

  • Park, Taejung
    • Journal of Digital Contents Society
    • /
    • v.16 no.5
    • /
    • pp.805-813
    • /
    • 2015
  • Many researchers expect compressive sensing and sparse recovery to overcome the limitations of conventional digital techniques. However, these new approaches require solving $\ell_1$-norm optimization problems for signal reconstruction. In the reconstruction process, the transform computation, which multiplies a random matrix by a vector, consumes considerable computing power. To address this issue, parallel processing is applied to the optimization problems. In particular, because the original signal is very large, it is hard to store the random matrix directly in memory, so a procedural approach to handling the random matrix is needed. This paper presents a new parallel algorithm that calculates the random partial Haar wavelet transform on a Single Instruction Multiple Threads (SIMT) platform.

New Memristor-Based Crossbar Array Architecture with 50-% Area Reduction and 48-% Power Saving for Matrix-Vector Multiplication of Analog Neuromorphic Computing

  • Truong, Son Ngoc;Min, Kyeong-Sik
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.3
    • /
    • pp.356-363
    • /
    • 2014
  • In this paper, we propose a new memristor-based crossbar array architecture in which a single memristor array and a constant-term circuit are used to represent both plus-polarity and minus-polarity matrices. This differs from the previous crossbar array architecture, which uses two memristor arrays to represent the plus-polarity and minus-polarity connection matrices, respectively. The proposed crossbar architecture is tested and verified to have the same performance as the previous architecture in character recognition applications. In terms of areal density, however, the proposed architecture is twice as good as the previous one, because only a single memristor array is used instead of two crossbar arrays. Moreover, the power consumption of the proposed architecture can be 48% lower than that of the previous one, because the number of memristors is halved. Given its high areal density and high energy efficiency, the proposed crossbar array architecture is well suited to analog neuromorphic computing applications that demand high areal density and low energy consumption.
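One way to picture how a single non-negative array plus a constant term can stand in for a signed matrix is the arithmetic identity below. This is an assumption made purely for illustration: the abstract does not describe the actual circuit, and the offset scheme, function names, and values here are hypothetical. The idea sketched is that storing W + c (all entries non-negative, like conductances) and subtracting c times the sum of the inputs recovers the signed product W·x.

```python
def matvec(matrix, x):
    """Dense matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def single_array_matvec(weights, x, offset):
    """Compute weights @ x using only a non-negative array and a constant term.

    Hypothetical scheme: the array stores weights + offset, and a separate
    constant-term path subtracts offset * sum(x) from every output.
    """
    shifted = [[w + offset for w in row] for row in weights]
    if any(g < 0 for row in shifted for g in row):
        raise ValueError("offset too small: stored values must be non-negative")
    constant_term = offset * sum(x)
    return [s - constant_term for s in matvec(shifted, x)]


weights = [[1, -2], [-3, 4]]  # signed connection matrix
x = [1, 2]
result = single_array_matvec(weights, x, offset=3)
print(result)  # [-3, 5]
```

The two-array alternative would instead compute a plus-polarity product and a minus-polarity product and subtract them, which doubles the number of stored elements.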

High-throughput and low-area implementation of orthogonal matching pursuit algorithm for compressive sensing reconstruction

  • Nguyen, Vu Quan;Son, Woo Hyun;Parfieniuk, Marek;Trung, Luong Tran Nhat;Park, Sang Yoon
    • ETRI Journal
    • /
    • v.42 no.3
    • /
    • pp.376-387
    • /
    • 2020
  • The massive computation required by reconstruction algorithms for compressive sensing (CS) has been a major concern for real-time application. In this paper, we propose a novel high-speed architecture for the orthogonal matching pursuit (OMP) algorithm, which is the algorithm most frequently used to reconstruct compressively sensed signals. The proposed design offers very high throughput and includes an innovative pipeline architecture and scheduling algorithm. Least-squares problem solving, which accounts for a huge amount of computation in OMP, is implemented using systolic arrays with four new processing elements. In addition, a distributed-arithmetic-based circuit for matrix multiplication is proposed to counterbalance the area overhead caused by the multi-stage pipelining. The results of logic synthesis show that the proposed design reconstructs signals nearly 19 times faster than existing designs while occupying an area only 1.06 times larger, for N = 256, M = 64, and m = 16, where N is the number of original samples, M is the length of the measurement vector, and m is the sparsity level of the signal.
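As background for the abstract above, here is a minimal software sketch of textbook OMP: pick the dictionary column most correlated with the residual, re-solve a least-squares problem on the selected columns, and update the residual. It assumes unit-norm columns and solves the least-squares step through the normal equations with Gaussian elimination, which is a simplification chosen for brevity; it does not represent the paper's systolic-array or distributed-arithmetic hardware. All names and the tiny example dictionary are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def solve(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def omp(columns, y, sparsity):
    """Recover a sparse coefficient vector; columns is a list of unit-norm columns."""
    residual = list(y)
    support, coeffs = [], []
    for _ in range(sparsity):
        scores = [abs(dot(col, residual)) for col in columns]
        for j in support:
            scores[j] = -1.0  # never reselect a column
        support.append(max(range(len(columns)), key=scores.__getitem__))
        chosen = [columns[j] for j in support]
        # Least squares on the selected columns via the normal equations.
        gram = [[dot(a, b) for b in chosen] for a in chosen]
        coeffs = solve(gram, [dot(a, y) for a in chosen])
        residual = [yi - sum(c * col[i] for c, col in zip(coeffs, chosen))
                    for i, yi in enumerate(y)]
    x = [0.0] * len(columns)
    for j, c in zip(support, coeffs):
        x[j] = c
    return x


# Tiny example: N = 4 columns, M = 3 measurements, sparsity m = 2.
scale = 3 ** -0.5
columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [scale, scale, scale]]
measurement = [2.0, 0.0, -1.0]  # equals 2 * column 0 - 1 * column 2
recovered = omp(columns, measurement, sparsity=2)
```

The least-squares solve grows with each iteration because the support gains one column per step, which is consistent with the abstract singling it out as the dominant cost.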