• Title/Summary/Keyword: Sparse Matrix Multiplication

Search Result 14, Processing Time 0.021 seconds

CSR Sparse Matrix Vector Multiplication Using Zero Copy (Zero Copy를 이용한 CSR 희소행렬 연산)

  • Yoon, SangHyeuk;Jeon, Dayun;Park, Neungsoo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2021.05a
    • /
    • pp.45-47
    • /
    • 2021
  • APU(Accelerated Processing Unit)는 CPU와 GPU가 통합되어있는 프로세서이며 같은 메모리 공간을 사용한다. CPU와 GPU가 분리되어있는 기존 이종 컴퓨팅 환경에서는 GPU가 작업을 처리하기 위해 CPU에서 GPU로 메모리 복사가 이루어졌지만, APU는 같은 메모리 공간을 사용하므로 메모리 복사 없이 가상주소 할당으로 같은 물리 주소에 접근할 수 있으며 이를 Zero Copy라 한다. Zero Copy 성능을 테스트하기 위해 희소행렬 연산을 사용하였으며 기존 메모리 복사대비 크기가 큰 데이터는 약 4.67배, 크기가 작은 데이터는 약 6.27배 빨랐다.

Basis Function Truncation Effect of the Gabor Cosine and Sine Transform (Gabor 코사인과 사인 변환의 기저함수 절단 효과)

  • Lee, Juck-Sik
    • The KIPS Transactions:PartB
    • /
    • v.11B no.3
    • /
    • pp.303-308
    • /
    • 2004
  • The Gabor cosine and sine transform can be applied to image and video compression algorithm by representing image frequency components locally The computational complexity of forward and inverse matrix transforms used in the compression and decompression requires O($N^3$)operations. In this paper, the length of basis functions is truncated to produce a sparse basis matrix, and the computational burden of transforms reduces to deal with image compression and reconstruction in a real-time processing. As the length of basis functions is decreased, the truncation effects to the energy of basis functions are examined and the change in various Qualify measures is evaluated. Experiment results show that 11 times fewer multiplication/addition operations are achieved with less than 1% performance change.

A Study on GPU Computing of Bi-conjugate Gradient Method for Finite Element Analysis of the Incompressible Navier-Stokes Equations (유한요소 비압축성 유동장 해석을 위한 이중공액구배법의 GPU 기반 연산에 대한 연구)

  • Yoon, Jong Seon;Jeon, Byoung Jin;Jung, Hye Dong;Choi, Hyoung Gwon
    • Transactions of the Korean Society of Mechanical Engineers B
    • /
    • v.40 no.9
    • /
    • pp.597-604
    • /
    • 2016
  • A parallel algorithm of bi-conjugate gradient method was developed based on CUDA for parallel computation of the incompressible Navier-Stokes equations. The governing equations were discretized using splitting P2P1 finite element method. Asymmetric stenotic flow problem was solved to validate the proposed algorithm, and then the parallel performance of the GPU was examined by measuring the elapsed times. Further, the GPU performance for sparse matrix-vector multiplication was also investigated with a matrix of fluid-structure interaction problem. A kernel was generated to simultaneously compute the inner product of each row of sparse matrix and a vector. In addition, the kernel was optimized to improve the performance by using both parallel reduction and memory coalescing. In the kernel construction, the effect of warp on the parallel performance of the present CUDA was also examined. The present GPU computation was more than 7 times faster than the single CPU by double precision.

GPGPU Acceleration of SAT Algorithm with Propagation Routine Parallelization (전달 루틴의 병렬화를 통한 SAT 알고리즘의 GPGPU 가속화)

  • Kang, Hyeong-Ju
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.10
    • /
    • pp.1919-1926
    • /
    • 2016
  • Because of the enormous processing ability, General-Purpose Graphics Processing Unit(GPGPU) has been applied to many fields including electronics design automation. The SAT algorithm is one of the core algorithm in many electronics design automation tools. There has been some efforts to apply GPGPU to the SAT algorithm, but it is difficult to parallelize the SAT algorithm because of its characteristics. In this paper, I applied GPGPU to the SAT algorithm by parallelizing the propagation routine that is relatively suitable to parallel processing. On the basis of the similarity of the propagation routine to the sparse matrix multiplication, the data structure for the SAT problem is constituted, and the parallel propagation routine is described. To prevent data loss between paralllel threads, atomic operations are exploited. The experimental results for some benchmark SAT problems show that the proposed algorithm is superior to the previous GPGPU-based SAT solver.