• Title/Summary/Keyword: matrix-vector multiplication


A Parallelising Algorithm for Matrix Arithmetics of Digital Signal Processings on VLIW Simulator (VLIW 시뮬레이터 상에서의 디지털 신호처리 행렬 연산에 대한 병렬화 알고리즘)

  • Song, Jin-Hee;Jun, Moon-Seog
    • The Transactions of the Korea Information Processing Society / v.5 no.8 / pp.1985-1996 / 1998
  • This paper presents a parallelising algorithm for partitioning and mapping matrix-vector multiplication onto a linear processor array and a VLIW simulator. We first discuss methods for mapping an input matrix or vector onto a processor array of arbitrary size. We then show how to partition a large computational problem to fit the size of the processor array. We execute the algorithm on the VLIW simulator and demonstrate its effectiveness. The results show better parallelising performance on our VLIW simulator design than on the linear processor array.
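
The abstract does not spell out the paper's mapping, so the following is only a minimal sketch of one generic way to partition a matrix-vector product to fit a fixed-size processor array. The function name `blocked_matvec` and the parameter `num_pe` (number of processing elements) are assumptions made for illustration, not names from the paper.

```python
def blocked_matvec(matrix, vector, num_pe):
    """Compute matrix @ vector in row blocks of at most num_pe rows.

    Each block models one pass over a linear array of num_pe processing
    elements (PEs): PE k accumulates the dot product for row block_start + k.
    This is a generic row-block partition, not the paper's exact scheme.
    """
    if num_pe < 1:
        raise ValueError("num_pe must be positive")
    n_rows = len(matrix)
    result = [0] * n_rows
    for block_start in range(0, n_rows, num_pe):
        block_end = min(block_start + num_pe, n_rows)
        # Stream the vector through the array, one element per step;
        # every PE in the block performs one multiply-accumulate per step.
        for j, x_j in enumerate(vector):
            for row in range(block_start, block_end):
                result[row] += matrix[row][j] * x_j
    return result


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
x = [1, 0, -1]
y = blocked_matvec(A, x, num_pe=2)  # 5 rows on 2 PEs: three passes
```

The result is independent of `num_pe`; only the number of passes over the array changes.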


Signal Processing Logic Implementation for Compressive Sensing Digital Receiver (압축센싱 디지털 수신기 신호처리 로직 구현)

  • Ahn, Woohyun;Song, Janghoon;Kang, Jongjin;Jung, Woong
    • Journal of the Korea Institute of Military Science and Technology / v.21 no.4 / pp.437-446 / 2018
  • This paper describes a real-time logic implementation of the orthogonal matching pursuit (OMP) algorithm for a compressive sensing digital receiver. OMP iteratively performs various complex-valued linear algebra operations, such as matrix multiplication and matrix inversion. Xilinx Vivado high-level synthesis (HLS) is used to design the digital logic more efficiently. Real-time signal processing is realized by applying a dataflow architecture that allows functions and loops to execute concurrently. Compared with prior work, the proposed design requires 2.5 times more DSP resources but achieves a 10 times shorter signal reconstruction time of $1.024{\mu}s$ for a vector of length 48 with 2 non-zero elements.

Acceleration of ECC Computation for Robust Massive Data Reception under GPU-based Embedded Systems (GPU 기반 임베디드 시스템에서 대용량 데이터의 안정적 수신을 위한 ECC 연산의 가속화)

  • Kwon, Jisu;Park, Daejin
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.7 / pp.956-962 / 2020
  • As the size of data used in embedded systems increases, ECC decoding becomes increasingly important for robustly receiving massive data. In this paper, we propose a method to accelerate the computation of syndrome vectors when ECC decoding is performed using a Hamming code in an embedded system with a built-in GPU. The proposed method stores the matrix used in the matrix-vector multiplication of the decoding operation in the CSR format, one of the data structures for representing sparse matrices, and performs the multiplication in parallel in a CUDA kernel on the GPU. We evaluated the proposed method on a target embedded board with a GPU, and the results show that execution time is reduced when the ECC decoding operation is accelerated on the GPU compared with using only the CPU.
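
As a rough illustration of syndrome computation by sparse matrix-vector multiplication, the sketch below stores a Hamming(7,4) parity-check matrix in CSR form and multiplies it by a received word over GF(2). It is sequential Python rather than the CUDA kernel the paper describes, and the code length, the matrix layout, and the names `ROW_PTR`, `COL_IDX`, `syndrome`, and `error_position` are all assumptions chosen for the example.

```python
# Parity-check matrix H of a Hamming(7,4) code whose column j (1-based)
# is the binary representation of j. Every nonzero entry is 1, so the
# CSR form only needs row pointers and column indices (no value array).
ROW_PTR = [0, 4, 8, 12]
COL_IDX = [0, 2, 4, 6,   # row 0: bit 0 of the column number
           1, 2, 5, 6,   # row 1: bit 1 of the column number
           3, 4, 5, 6]   # row 2: bit 2 of the column number


def syndrome(word):
    """Return H @ word over GF(2) as a list of syndrome bits."""
    bits = []
    for row in range(len(ROW_PTR) - 1):
        acc = 0
        # On a GPU, each row (or each received word) could be handled
        # by its own thread; here the rows are processed one by one.
        for k in range(ROW_PTR[row], ROW_PTR[row + 1]):
            acc ^= word[COL_IDX[k]]
        bits.append(acc)
    return bits


def error_position(bits):
    """Interpret the syndrome as a 1-based bit position (0 = no error)."""
    return sum(bit << i for i, bit in enumerate(bits))


codeword = [0, 1, 1, 0, 0, 1, 1]   # a valid codeword of this code
received = codeword[:]
received[4] ^= 1                   # flip the bit at 1-based position 5
position = error_position(syndrome(received))
```

With this column ordering the syndrome directly encodes the position of a single flipped bit, which is why the sparse product is the core of the decoder.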

A Study on the Optical Digital Parallel Processor using Phase Conjugate Mirror (위상 공액경을 이용한 광 디지틀 병렬 연산기에 관한 연구)

  • 은재정;최평석
    • Journal of the Korean Institute of Telematics and Electronics A / v.32A no.9 / pp.135-141 / 1995
  • An optical digital parallel processor using a self-pumped phase conjugate mirror (PCM) and a liquid crystal spatial light modulator (LCSLM) is presented and experimentally implemented. To use the self-pumped PCM as memory, the mechanism for phase conjugation in two coupled interaction regions within the photorefractive crystal BaTiO$_{3}$ is investigated, especially its temporal behavior and the effects of incident beam position. The optical design and implementation of matrix-vector multiplication using the LCSLM and PCM memory are presented.


Design of a Digital Neuron Processor Using the Residue Number System (잉여수 체계를 이용한 디지털 뉴론 프로세서의 설계)

  • 윤현식;조원경
    • Journal of the Korean Institute of Telematics and Electronics B / v.30B no.10 / pp.69-76 / 1993
  • In this paper we propose a design of a digital neuron processor that uses the residue number system for the efficient matrix-vector multiplication involved in neural processing. Since the residue number system needs no carry propagation between moduli, the neuron processor can perform multiplication considerably faster. We also propose a high-speed algorithm for computing the sigmoid function using a specially designed look-up table. Our method can be implemented area-effectively using current digital VLSI technology, and simulation results demonstrate its feasibility. The proposed method is expected to be adopted in digital neural network applications because it can be realized with currently available digital VLSI technology.
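
To illustrate why residue arithmetic avoids carry propagation between channels, here is a minimal sketch of a dot product (one row of a matrix-vector product) computed channel by channel. The moduli `(5, 7, 9, 11)` and all function names are assumptions for the example; the abstract does not give the paper's moduli set or its handling of signed values. The modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.

```python
# Pairwise coprime moduli; representable range is 0 .. 5*7*9*11 - 1 = 3464.
MODULI = (5, 7, 9, 11)


def to_rns(value):
    """Encode a non-negative integer as one residue per modulus."""
    return tuple(value % m for m in MODULI)


def rns_mac(acc, a, b):
    """Multiply-accumulate; each channel works on its own small residue
    and never exchanges a carry with any other channel."""
    return tuple((s + x * y) % m for s, x, y, m in zip(acc, a, b, MODULI))


def from_rns(residues):
    """Decode with the Chinese remainder theorem."""
    total = 1
    for m in MODULI:
        total *= m
    value = 0
    for r, m in zip(residues, MODULI):
        partial = total // m
        value += r * partial * pow(partial, -1, m)
    return value % total


def rns_dot(weights, inputs):
    """Dot product of two non-negative integer vectors in the RNS domain."""
    acc = to_rns(0)
    for w, x in zip(weights, inputs):
        acc = rns_mac(acc, to_rns(w), to_rns(x))
    return from_rns(acc)


result = rns_dot([3, 5, 2], [4, 6, 7])  # 3*4 + 5*6 + 2*7
```

The result is only correct while the true sum stays below the product of the moduli, so a real design has to size the moduli set for the largest expected accumulation.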


Space Deformation of Parametric Surface Based on Extension Function

  • Wang, Xiaoping;Ye, Zhenglin;Meng, Yaqin;Li, Hongda
    • International Journal of CAD/CAM / v.1 no.1 / pp.23-32 / 2002
  • In this paper, a new technique of space deformation for parametric surfaces based on a so-called extension function (EF) is presented. First, a special extension function is introduced. Then an operator matrix is constructed on the basis of the EF. Finally, the deformation of a surface is achieved by multiplying the equation of the surface by the operator matrix, or by adding the product of a vector and the operator matrix to the equation. By interactively modifying the control parameters, the desired deformation effect can be obtained. The implementation shows that the method is simple, intuitive, and easy to control. It can be used in fields such as geometric modeling and computer animation.

$S^{2}MMSE$ Precoding for Multiuser MIMO Broadcast Channels (다중 사용자 MIMO 방송 채널을 위한 $S^{2}MMSE$ 프리코딩)

  • Lee, Min;Oh, Seong-Keun
    • The Journal of Korean Institute of Communications and Information Sciences / v.33 no.12A / pp.1185-1190 / 2008
  • In this paper, we propose a simplified successive minimum mean square error ($S^{2}MMSE$) algorithm that reduces the computational complexity of precoding matrix generation in the successive minimum mean square error (SMMSE) precoding method, which is adopted as a multiuser multiple-input multiple-output (MU-MIMO) precoding technique in the IST (information society technologies)-WINNER (wireless world initiative new radio) project. The original algorithm generates the precoding matrix by calculating all individual precoding vectors, each requiring its own MMSE nulling matrix, over all receive antennas for all users. In contrast, the proposed algorithm first calculates one MMSE nulling matrix for each user and then calculates the precoding vectors for all receive antennas of that user using the same MMSE nulling matrix, so that only a simple matrix-vector multiplication is required for each vector. Consequently, it significantly reduces the computational complexity of generating a precoding matrix for SMMSE precoding.

Cryptographic Analysis of the Post-Processing Procedure in the Quantum Random Number Generator Quantis (양자난수발생기 Quantis의 후처리 과정에 관한 암호학적 분석)

  • Bae, Minyoung;Kang, Ju-Sung;Yeom, Yongjin
    • Journal of the Korea Institute of Information Security & Cryptology / v.27 no.3 / pp.449-457 / 2017
  • In this paper, we experimentally analyze the security and performance of the Quantis quantum random number generator from a cryptographic point of view. Quantis' post-processing is designed to output full entropy via a mathematically grounded bit-matrix-vector multiplication, and we used the min-entropy estimation tests of NIST SP 800-90B to verify whether the output is full-entropy. Quantis minimizes the effect on the random bit rate by using an optimization technique for bit-matrix-vector multiplication, and we compared its performance with the conditioning functions of NIST SP 800-90B by measuring the random bit rate. We also identify how Quantis' post-processing differs from the standard models of NIST in the USA and BSI in Germany. When Quantis is applied to cryptographic systems in accordance with the CMVP standard, we recommend using its output as the seed of an approved DRBG.
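
A bit-matrix-vector multiplication over GF(2) can be sketched with integer bitmasks, where each output bit is the parity of a bitwise AND between one matrix row and the input. The matrix below is a made-up 2x4 example, not the matrix used by Quantis, and `gf2_matvec` is a hypothetical name; the sketch also does not reproduce the optimization technique the abstract mentions.

```python
def gf2_matvec(rows, x):
    """Multiply a bit matrix by a bit vector over GF(2).

    rows: list of ints, each the bitmask of one matrix row.
    x:    int, bitmask of the input vector (raw bits).
    Returns an int whose bit i is the parity of (rows[i] AND x).
    """
    out = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        out |= parity << i
    return out


# Hypothetical 2x4 matrix compressing 4 raw bits into 2 output bits.
EXAMPLE_ROWS = [0b1011, 0b0110]
raw_bits = 0b1101
extracted = gf2_matvec(EXAMPLE_ROWS, raw_bits)
```

Because the map is linear over GF(2), XOR-ing two inputs XORs their outputs, which is the property that makes word-level AND and parity operations a natural way to implement it.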

GPU-based Sparse Matrix-Vector Multiplication Schemes for Random Walk with Restart: A Performance Study (랜덤워크 기법을 위한 GPU 기반 희소행렬 벡터 곱셈 방안에 대한 성능 평가)

  • Yu, Jae-Seo;Bae, Hong-Kyun;Kang, Seokwon;Yu, Yongseung;Park, Yongjun;Kim, Sang-Wook
    • Annual Conference of KIPS / 2020.11a / pp.96-97 / 2020
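  • Random Walk with Restart (RWR), one of the random-walk-based node ranking methods, repeatedly performs sparse matrix-vector multiplication and vector addition, and its execution time is strongly affected by the sparse matrix-vector multiplication method used. In this paper, we conduct comparative experiments between GPU-based RWR implementations that adopt a CSR5 (Compressed Sparse Row 5)-based sparse matrix-vector multiplication scheme and a CSR-vector-based sparse matrix-vector multiplication scheme. Through the experiments, we analyze how RWR performance differs according to the characteristics of the dataset and propose guidelines for selecting a suitable sparse matrix-vector multiplication scheme.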
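
The RWR iteration can be sketched as repeated sparse matrix-vector multiplication followed by a vector addition, using the common update r = (1 - c) W r + c e with a column-normalized adjacency matrix W. The version below is plain sequential Python over a basic CSR layout, not the GPU-based CSR5 or CSR-vector schemes compared in the paper; the restart probability 0.15, the tiny path graph, and all names are assumptions for illustration.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Sparse matrix-vector product y = W @ x for W stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def rwr(row_ptr, col_idx, values, seed, restart=0.15, iters=200, tol=1e-12):
    """Random Walk with Restart scores relative to a single seed node."""
    n = len(row_ptr) - 1
    e = [0.0] * n
    e[seed] = 1.0
    r = e[:]
    for _ in range(iters):
        wr = csr_matvec(row_ptr, col_idx, values, r)   # the SpMV step
        new_r = [(1.0 - restart) * wr[i] + restart * e[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


# Column-normalized adjacency of the undirected path graph 0 - 1 - 2.
ROW_PTR = [0, 1, 3, 4]
COL_IDX = [1, 0, 2, 1]
VALUES = [0.5, 1.0, 1.0, 0.5]
scores = rwr(ROW_PTR, COL_IDX, VALUES, seed=0)
```

Almost all of the work per iteration is in `csr_matvec`, which is why the choice of sparse format and kernel dominates RWR run time.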

ERROR REDUCTION FOR HIGHER DERIVATIVES OF CHEBYSHEV COLLOCATION METHOD USING PRECONDITIONING AND DOMAIN DECOMPOSITION

  • Darvishi, M.T.;Ghoreishi, F.
    • Journal of applied mathematics & informatics / v.6 no.2 / pp.523-538 / 1999
  • A new preconditioning method is investigated to reduce the roundoff error in computing derivatives using Chebyshev collocation methods (CCM). With this preconditioning, the ratio of the roundoff error of the preconditioned method to that of CCM becomes small as N gets large. To further enhance the accuracy of differentiation, we use a domain decomposition approach. Error analysis shows that, for this domain decomposition method, the error decreases in proportion to the length of the subintervals. Numerical results show that using domain decomposition and preconditioning simultaneously gives highly accurate approximations of the first derivative of the function and good approximations of moderately high derivatives.