Parallel Algorithm for Matrix-Matrix Multiplication on the GPU

  • Park, Sangkun (Department of Mechanical Engineering, Korea National University of Transportation)
  • Received: 2019.09.26
  • Reviewed: 2019.11.12
  • Published: 2019.11.30

Abstract

Matrix multiplication is a fundamental mathematical operation with applications across most scientific fields. In this paper, we present a parallel GPU algorithm for dense matrix-matrix multiplication using the OpenGL compute shader, which can serve as a fundamental building block for many high-performance computing applications. Experimental results on an NVIDIA Quadro 4000 show that the proposed algorithm runs about 208 times faster than a previous CPU algorithm and achieves 75 GFLOPS in single precision for dense matrices with matrix size 4,096. This performance shows that our algorithm is practical for real applications.
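To put the reported figures in perspective, the following minimal Python sketch converts them into approximate run times. It rests on assumptions that the abstract does not state: that a dense n × n product is counted as 2n³ floating-point operations (the usual convention), that "matrix size 4,096" means n = 4,096, and that the 208-fold speedup was measured at that same size. The function names `matmul_flops` and `seconds_at` are illustrative only and do not come from the paper.

```python
def matmul_flops(n: int) -> int:
    """Operation count for an n x n dense product, assuming the 2*n^3 convention
    (n^3 multiplications plus roughly n^3 additions)."""
    return 2 * n ** 3


def seconds_at(gflops: float, n: int) -> float:
    """Run time implied by a sustained rate given in GFLOPS (1e9 operations per second)."""
    return matmul_flops(n) / (gflops * 1e9)


# Figures quoted in the abstract.
n = 4096
gpu_gflops = 75.0
speedup = 208.0

# Derived quantities; these are estimates under the assumptions above,
# not measurements reported by the paper.
gpu_seconds = seconds_at(gpu_gflops, n)
cpu_seconds = gpu_seconds * speedup
cpu_gflops = gpu_gflops / speedup

print(f"operations:       {matmul_flops(n):,}")
print(f"implied GPU time: {gpu_seconds:.2f} s")
print(f"implied CPU time: {cpu_seconds:.1f} s")
print(f"implied CPU rate: {cpu_gflops:.3f} GFLOPS")
```

Under these assumptions, one multiplication involves about 1.37 × 10¹¹ operations, which corresponds to roughly 1.8 seconds on the GPU and a little over six minutes for the CPU baseline. If the paper counts operations differently or measured the speedup at another size, the derived times change accordingly.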
