Implementation of high performance parallel LU factorization program for multi-threads on GPGPUs  

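Shin, Bong-Hi (Incheon National University, School of Computer Engineering)
Kim, Young-Tae (Gangneung-Wonju National University, Department of Computer Engineering)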
Publication Information
Journal of Internet Computing and Services, v.12, no.3, 2011, pp. 131-137
Abstract
GPUs were originally designed for graphics processing; GPGPUs (general-purpose GPUs) apply them to numerical computation, offering high performance at low power consumption. In this paper, we implement a parallel LU factorization program for GPGPUs. In CUDA, the computing environment for Nvidia GPGPUs, the domain is divided into blocks, and multiple threads compute each sub-block simultaneously. In LU factorization, the order of computation must be fixed explicitly because of data dependence. To resolve this data dependency, we propose a parallel LU program for GPGPUs and explain a parallel reduction algorithm for the partial pivoting of LU factorization. Finally, we present a performance analysis to show the efficiency of the parallel LU factorization program based on multi-threading on GPGPUs.
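The two ideas named in the abstract, the step-by-step data dependence of LU factorization and the reduction used to find the pivot, can be illustrated with a small sketch. This is not the authors' CUDA program: it is a sequential Python emulation, and the function names `argmax_abs_reduction` and `lu_partial_pivot` are hypothetical. The inner loop of each reduction pass stands in for work that GPU threads would perform concurrently between synchronization points.

```python
def argmax_abs_reduction(values):
    """Return the index of the entry with the largest absolute value.

    Emulates a tree-shaped reduction: every pass halves the number of
    candidates, so n values need about log2(n) passes.
    """
    n = len(values)
    size = 1
    while size < n:          # pad the candidate list to a power of two
        size *= 2
    idx = list(range(n)) + [-1] * (size - n)   # -1 marks a padding slot

    def better(a, b):
        if a == -1:
            return b
        if b == -1:
            return a
        return a if abs(values[a]) >= abs(values[b]) else b

    stride = size // 2
    while stride > 0:
        # On a GPU, these `stride` comparisons would run in parallel,
        # followed by a barrier before the next pass.
        for t in range(stride):
            idx[t] = better(idx[t], idx[t + stride])
        stride //= 2
    return idx[0]


def lu_partial_pivot(a):
    """Factor a square matrix (list of lists) with row pivoting.

    Returns (lu, perm): `lu` stores U on and above the diagonal and the
    multipliers of L below it (unit diagonal implied); `perm[i]` is the
    original index of the row that ended up in position i.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Step k reads values written by steps 0..k-1, so the steps
        # themselves must run in order; only the work inside a step
        # (pivot search, row updates) is independent.
        column = [lu[i][k] for i in range(k, n)]
        p = k + argmax_abs_reduction(column)
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):        # rows are independent of each other
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


if __name__ == "__main__":
    lu, perm = lu_partial_pivot([[2.0, 1.0, 1.0],
                                 [4.0, 3.0, 3.0],
                                 [8.0, 7.0, 9.0]])
    print(perm)
    for row in lu:
        print(row)
```

In a CUDA version, one would expect the candidate array to live in shared memory and each reduction pass to end with a block-level barrier; those details are assumptions here, since the abstract does not describe the kernel layout.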
Keywords
GPGPU; CUDA; LU; SIMT