• Title/Summary/Keyword: Matrix Multiplication

A Covariance Type ARMA Fast Transversal Filter (공분산형 ARMA 고속 Transversal 필터에 관한 연구)

  • Lee, Chul-Heui;Jang, Young-Soo
    • The Journal of the Acoustical Society of Korea / v.11 no.1 / pp.67-79 / 1992
  • For effective on-line ARMA parameter estimation, a covariance-type ARMA fast transversal filter (FTF) algorithm is presented. The proposed algorithm is a covariance-type implementation of the ELS (Extended Least Squares) estimator: a fast time-update recursion based on the fact that the correlation matrix of the ARMA model satisfies the shift-invariance property in each sub-block. A geometric approach is used to derive the algorithm. Its computational burden is small, 13N+37 MADPR (multiplications and divisions per recursion). In addition, the AR and MA orders can be specified independently and arbitrarily.

Optimization of ARIA Block-Cipher Algorithm for Embedded Systems with 16-bits Processors

  • Lee, Wan Yeon;Choi, Yun-Seok
    • International Journal of Internet, Broadcasting and Communication / v.8 no.1 / pp.42-52 / 2016
  • In this paper, we propose a 16-bit optimized design of the ARIA block-cipher algorithm for embedded systems with 16-bit processors. The proposed design uses 16-bit XOR operations and rotate-shift operations wherever possible. It also extends 8-bit array variables into 16-bit array variables for faster chained matrix multiplication. In evaluation experiments, our design is compared with the previous 32-bit and 8-bit optimized designs. Our 16-bit optimized design yields about 20% faster execution and about a 28% smaller footprint than the 32-bit optimized code. It also yields about 91% faster execution, with a larger footprint, than the 8-bit optimized code.
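
The idea of widening 8-bit array variables to 16-bit ones can be illustrated with a minimal sketch. It is not the authors' code and not the ARIA algorithm itself; the helper names `pack16`, `unpack16`, and `xor_words` are hypothetical. It only shows that XOR-ing two packed 16-bit words gives the same result as XOR-ing the underlying byte pairs one at a time, which halves the number of XOR operations.

```python
def pack16(data):
    """Pack an even-length list of bytes into 16-bit words (high byte first)."""
    if len(data) % 2 != 0:
        raise ValueError("byte list must have even length")
    return [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]


def unpack16(words):
    """Split 16-bit words back into bytes (inverse of pack16)."""
    out = []
    for w in words:
        out.append(w >> 8)
        out.append(w & 0xFF)
    return out


def xor_words(a, b):
    """Element-wise XOR of two equal-length word lists."""
    return [x ^ y for x, y in zip(a, b)]


# Hypothetical 4-byte operands (ARIA itself works on 16-byte states).
a = [0x12, 0x34, 0x56, 0x78]
b = [0xFF, 0x00, 0x0F, 0xF0]

bytewise = [x ^ y for x, y in zip(a, b)]                # four 8-bit XORs
wordwise = unpack16(xor_words(pack16(a), pack16(b)))    # two 16-bit XORs
```

Both `bytewise` and `wordwise` hold the same bytes, because XOR acts on each bit independently and packing does not mix bits between bytes.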

The Limit Distribution and Power of a Test for Bivariate Normality

  • Kim, Namhyun
    • Communications for Statistical Applications and Methods / v.9 no.1 / pp.187-196 / 2002
  • Testing for normality has always been a focus of practical and theoretical interest in statistical research. In this paper, a test statistic for bivariate normality is proposed. The underlying idea is to examine all possible linear combinations that reduce to the standard normal distribution under the null hypothesis and to compare their order statistics with the theoretical normal quantiles. The suggested statistic is invariant with respect to nonsingular matrix multiplication and vector addition. We show that the limit distribution of an approximation to the suggested statistic can be represented as the supremum, over an index set, of the integral of a suitable Gaussian process. We also simulate the null distribution of the statistic and give some critical values of the distribution, together with power results.

Quadratic polynomial fitting algorithm for peak point detection of white light scanning interferograms (백색광주사간섭무늬의 정점검출을 위한 이차다항식맞춤 알고리즘)

  • 박민철;김승우
    • Korean Journal of Optics and Photonics / v.9 no.4 / pp.245-250 / 1998
  • A new computational algorithm is presented for peak point detection in white light interferograms. Modeling the visibility function of white light interferograms as a quadratic polynomial, the algorithm searches for the peak point that minimizes the error sum between the measured intensity data and the analytical intensity. Compared with existing algorithms, it requires less computation because the peak point is determined with a single-step matrix multiplication. In addition, it is robust against external random disturbances in the measured intensities because it is based on least squares principles.
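
As a rough illustration of least-squares quadratic peak finding, the sketch below fits y = c0 + c1·t + c2·t² to a handful of samples and returns the vertex at t = -c1 / (2·c2). It is a generic fit on made-up sample values, not the paper's algorithm, which works on the visibility function of a measured interferogram; the function names are hypothetical. When the sample positions are fixed, the matrix (XᵀX)⁻¹Xᵀ can be precomputed so that the coefficients follow from one matrix-vector product; here the normal equations are solved directly for brevity.

```python
def solve3(M, v):
    """Solve the 3x3 linear system M p = v by Gauss-Jordan elimination."""
    a = [row[:] + [rhs] for row, rhs in zip(M, v)]  # augmented matrix
    for c in range(3):
        # Partial pivoting: bring the largest entry of column c to the diagonal.
        p = max(range(c, 3), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(3):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [a[i][3] / a[i][i] for i in range(3)]


def quadratic_peak(ts, ys):
    """Least-squares fit of y = c0 + c1*t + c2*t^2; return the vertex position."""
    X = [[1.0, t, t * t] for t in ts]
    # Normal equations: (X^T X) c = X^T y
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(3)]
    c0, c1, c2 = solve3(XtX, Xty)
    return -c1 / (2.0 * c2)


# Hypothetical samples of a parabola whose true peak lies at t = 2.3.
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [5.0 - (t - 2.3) ** 2 for t in ts]
peak = quadratic_peak(ts, ys)
```

Because the fit minimizes the squared error over all samples, moderate random noise on `ys` shifts the estimated peak only slightly, which is the robustness property the abstract refers to.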

Trends in AI Computing Processor Semiconductors Including ETRI's Autonomous Driving AI Processor (인공지능 컴퓨팅 프로세서 반도체 동향과 ETRI의 자율주행 인공지능 프로세서)

  • Yang, J.M.;Kwon, Y.S.;Kang, S.W.
    • Electronics and Telecommunications Trends / v.32 no.6 / pp.57-65 / 2017
  • Neural network based AI computing is a promising technology that mirrors human recognition and decision making. Early AI computing processors were composed of GPUs and CPUs; however, the dramatic increase in floating-point operations requires an energy-efficient AI processor with a highly parallelized architecture. In this paper, we analyze the trends in processor architectures for AI computing. Some architectures are still built from GPUs; however, they reduce the size of each processing unit by allowing half-precision operations, and so raise the processing unit density. Other architectures concentrate on matrix multiplication and require dedicated hardware for fast vector operations. Finally, we propose our own inAB processor architecture and introduce domestic cutting-edge processor design capabilities.

Implementation and Performance Evaluation of a Tightly-Coupled Multiprocessor System (밀결합 멀티프로세서 시스템의 구현 및 성능평가)

  • 김덕진;김영천;박석천
    • Journal of the Korean Institute of Telematics and Electronics / v.24 no.5 / pp.777-785 / 1987
  • In this paper, a tightly-coupled multiprocessor system is implemented with four processing elements (PEs) based on the MC68000 CPU, a common memory (128KB), and a single time-shared bus. The multitasking operating system, MTOS, is modified so that the multiprocessor system can support multitasking and multiprocessing. The performance of the proposed system is evaluated by stochastic Petri net system modeling. The efficiency and the processing power are simulated for various load factors and for up to 16 PEs. By running benchmark programs such as quicksort, FFT, and matrix multiplication, the speed of parallel processing is compared with that of a single processor.

Derivation Algorithm of State-Space Equation for Production Systems Based on Max-Plus Algebra

  • Goto, Hiroyuki;Masuda, Shiro
    • Industrial Engineering and Management Systems / v.3 no.1 / pp.1-11 / 2004
  • This paper proposes a new algorithm for determining an optimal control input for production systems. In many production systems, completion times should be planned to fall within the due dates, taking into account precedence constraints and processing times. The max-plus algebra is an effective approach to this problem. It is an algebraic system in which the max operation plays the role of addition and the plus operation plays the role of multiplication, and it follows operation rules similar to those of conventional algebra. Using the max-plus algebra, the constraints of the system are expressed in a way analogous to the state-space description in modern control theory. Nevertheless, the formulation of a system is currently performed manually, which is very inefficient when applied to practical systems. Hence, in this paper, we propose a new algorithm for deriving a state-space description and determining an optimal control input from several constraint matrices and parameter vectors. Furthermore, the effectiveness of the proposed algorithm is verified through execution examples.
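
The max-plus matrix product described above can be sketched as follows. This is a minimal illustration of the algebra only, not the paper's derivation algorithm; the function name `maxplus_matmul` and the two-machine numbers are assumptions made for the example. Each entry is C[i][j] = max over k of (A[i][k] + B[k][j]), with negative infinity acting as the "zero" element.

```python
NEG_INF = float("-inf")  # max-plus "zero": neutral for max, absorbing for +


def maxplus_matmul(A, B):
    """Max-plus product: C[i][j] = max_k (A[i][k] + B[k][j])."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("inner dimensions do not match")
    cols = len(B[0])
    return [
        [max(A[i][k] + B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(A))
    ]


# Hypothetical system: A[i][j] is the delay from event j to event i,
# NEG_INF means "no precedence relation".
A = [[3, NEG_INF],
     [5, 2]]
x = [[0],
     [1]]          # earliest start times of the two events
y = maxplus_matmul(A, x)   # earliest completion times
```

Here `y[1][0]` is max(5 + 0, 2 + 1) = 5: the second event must wait for the slower of its two predecessors, which is how precedence constraints turn into a linear-looking state equation.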

Implementation of the Systolic Array for Band Matrix Multiplication using Multiplexer-based Bit-serial Multiplier (멀티플렉서 기반의 비트 연속 승산기를 이용한 시스톨릭 어레이 띠 행렬 승산기 구현)

  • 한영욱;김진만;유명근;송기용
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2003.06a / pp.288-291 / 2003
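  • This paper describes the implementation of a bit-serial multiplier for two band matrices using a systolic array that offers modularity and scalability. Given $4\times4$ band matrices with a bandwidth of 3, a two-dimensional systolic array is derived from the three-dimensional dependence graph (DG) for the word-level multiplier design. The word-level PEs are then converted into bit-level PEs using bit-serial multipliers and adders, yielding a bit-level band matrix multiplier. The implemented word-level and bit-level multipliers were modeled in VHDL at the register-transfer (RT) level, and their operation was verified. The verified systolic-array word-level and bit-level multipliers were synthesized with Synopsys Design Compiler using the 0.35 $\mu\textrm{m}$ cell library provided by Hynix.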
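
To show why band structure reduces the work, here is a small software sketch of multiplying two band matrices while visiting only in-band entries. It is not a model of the systolic array or the bit-serial PEs described in the abstract; the function name `band_matmul`, the `half_bw` parameter, and the matrix values are hypothetical. A bandwidth of 3 corresponds to `half_bw = 1` (the main diagonal plus one diagonal on each side).

```python
def band_matmul(A, B, half_bw=1):
    """Multiply two n x n band matrices, skipping entries outside the band.

    half_bw is the number of nonzero diagonals on each side of the main
    diagonal, so the bandwidth is 2 * half_bw + 1.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        # A[i][k] can be nonzero only when |i - k| <= half_bw.
        for k in range(max(0, i - half_bw), min(n, i + half_bw + 1)):
            # B[k][j] can be nonzero only when |k - j| <= half_bw.
            for j in range(max(0, k - half_bw), min(n, k + half_bw + 1)):
                C[i][j] += A[i][k] * B[k][j]
    return C


# Hypothetical 4x4 band matrices with bandwidth 3 (tridiagonal).
A = [[1, 2, 0, 0],
     [3, 4, 5, 0],
     [0, 6, 7, 8],
     [0, 0, 9, 10]]
B = [[2, 1, 0, 0],
     [1, 2, 1, 0],
     [0, 1, 2, 1],
     [0, 0, 1, 2]]
C = band_matmul(A, B)
```

The product of two bandwidth-3 matrices generally has bandwidth 5, so `C` can have nonzero entries two diagonals away from the main diagonal.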

A Design Exploration Technique of a Hybrid Memory for Artificial Intelligence Applications (인공지능 응용을 위한 하이브리드 메모리 설계 탐색 기법)

  • Cho, Doo-San
    • Journal of the Korean Society of Industry Convergence / v.24 no.5 / pp.531-536 / 2021
  • As artificial intelligence technology advances, it is being applied to a variety of fields. Artificial intelligence performs well in image recognition and classification, and chip designs specialized for this field are being actively studied. Artificial intelligence-specific chips are designed to provide optimal performance for their target applications. In the design task, memory component optimization is becoming an important issue. In this study, an optimal algorithm for memory size exploration is presented; the optimal memory size is becoming an important factor in providing a proper design that meets the requirements of performance, cost, and power consumption.

CSR Sparse Matrix Vector Multiplication Using Zero Copy (Zero Copy를 이용한 CSR 희소행렬 연산)

  • Yoon, SangHyeuk;Jeon, Dayun;Park, Neungsoo
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.45-47 / 2021
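  • An APU (Accelerated Processing Unit) is a processor that integrates a CPU and a GPU, which share the same memory space. In conventional heterogeneous computing environments where the CPU and GPU are separate, memory is copied from the CPU to the GPU before the GPU can process a task. Because an APU uses a single memory space, both devices can access the same physical address through virtual address allocation without any memory copy; this is called Zero Copy. Sparse matrix operations were used to test Zero Copy performance. Compared with conventional memory copying, Zero Copy was about 4.67 times faster for large data and about 6.27 times faster for small data.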
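
For readers unfamiliar with the CSR (compressed sparse row) format, the sparse matrix-vector product used as the benchmark can be sketched as below. This is a plain CPU reference in Python, not the authors' GPU kernel, and it does not demonstrate Zero Copy; the function name `csr_spmv` and the sample matrix are assumptions made for the example. CSR stores the nonzero values, their column indices, and a row-pointer array marking where each row starts.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


# Hypothetical 3x3 matrix:
# [[10,  0,  0],
#  [ 0,  0, 20],
#  [30,  0, 40]]
values = [10.0, 20.0, 30.0, 40.0]
col_idx = [0, 2, 0, 2]
row_ptr = [0, 1, 2, 4]
x = [1.0, 2.0, 3.0]
y = csr_spmv(values, col_idx, row_ptr, x)
```

The three arrays and the vector `x` are the buffers that would have to be copied to a discrete GPU; on a shared-memory APU the same buffers can instead be accessed in place, which is the setting the abstract evaluates.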