Parallel Algorithm for Matrix-Matrix Multiplication on the GPU

  • Park, Sangkun (Department of Mechanical Engineering, Korea National University of Transportation)
  • Received : 2019.09.26
  • Accepted : 2019.11.12
  • Published : 2019.11.30

Abstract

Matrix multiplication is a fundamental mathematical operation with applications across most scientific fields. In this paper, we present a parallel GPU algorithm for dense matrix-matrix multiplication implemented with an OpenGL compute shader; such a routine can serve as a basic building block for many high-performance computing applications. Experimental results on an NVIDIA Quadro 4000 show that the proposed algorithm runs about 208 times faster than a previous CPU algorithm and achieves 75 GFLOPS in single precision for dense matrices of size 4,096. This performance indicates that the algorithm is practical for real applications.
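To make the reported figures easier to interpret, the following is a minimal sketch in plain Python, not the paper's compute shader. It illustrates why dense matrix multiplication parallelizes well (every output element can be computed independently, so a GPU can assign one invocation per element) and how a GFLOPS figure relates to run time. The function names, the square row-major layout, and the 2n^3 operation count are assumptions made for illustration; the abstract does not state which counting convention or data layout the author used.

```python
def matmul_per_element(a, b, n):
    """Compute C = A * B for n x n matrices stored as row-major flat lists.

    Each (row, col) pair is independent of every other pair. A GPU kernel
    can exploit this by giving each output element its own invocation;
    here the pairs are simply visited one after another on the CPU.
    """
    c = [0.0] * (n * n)
    for row in range(n):
        for col in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[row * n + k] * b[k * n + col]
            c[row * n + col] = acc
    return c


def gflops(n, seconds):
    """Rate in GFLOPS, assuming 2 * n**3 floating-point operations
    (one multiply and one add per inner-loop step)."""
    return 2.0 * n ** 3 / seconds / 1e9


def implied_seconds(n, rate_gflops):
    """Run time implied by a given GFLOPS rate under the same assumption."""
    return 2.0 * n ** 3 / (rate_gflops * 1e9)


if __name__ == "__main__":
    # Small correctness check on a 2 x 2 example.
    print(matmul_per_element([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 2))
    # Run time implied by 75 GFLOPS at n = 4096 under the 2 * n**3 convention.
    print(round(implied_seconds(4096, 75.0), 2))
```

Under the assumed 2n^3 convention, 75 GFLOPS at size 4,096 corresponds to a run time of about 1.8 seconds per multiplication. If the 208-fold speedup was measured at that same size, which the abstract does not say, the CPU baseline would have taken roughly 380 seconds.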
