[KSCI] Korea Science Citation Index Service

Implementation of high performance parallel LU factorization program for multi-threads on GPGPUs

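Shin, Bong-Hi (Division of Computer Engineering, Incheon National University)
Kim, Young-Tae (Department of Computer Engineering, Gangneung-Wonju National University)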

Publication Information

Journal of Internet Computing and Services, v.12, no.3, 2011, pp. 131-137

Abstract

GPUs were originally designed for graphics processing; GPGPUs are general-purpose GPUs that deliver high numerical performance at low power consumption. In this paper, we implement a parallel LU factorization program for GPGPUs. In CUDA, the computational environment for Nvidia GPGPUs, the domain is divided into blocks, and multiple threads compute the sub-blocks simultaneously. In LU factorization, the order of computation must be fixed explicitly because of data dependence. To resolve this data dependency, we propose a parallel LU program for GPGPUs and also explain a parallel reduction algorithm for the partial pivoting of LU factorization. Finally, we present a performance analysis to show the efficiency of the parallel LU factorization program based on multi-threading on GPGPUs.
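
The two ideas named in the abstract, a tree-style parallel reduction for the pivot search and a fixed elimination order imposed by data dependence, can be illustrated with a minimal sketch. This is not the authors' CUDA program: it is sequential Python in which the inner loops stand in for threads that would run concurrently on a GPGPU, and the function names `pivot_by_reduction` and `lu_factor` are hypothetical names chosen for this illustration.

```python
# Sequential sketch only: loops marked "independent" stand in for threads
# that would run concurrently on a GPGPU (an assumption of this example).

def pivot_by_reduction(column, start):
    """Return the row index i >= start with the largest |column[i]|,
    found by a tree reduction over (|value|, index) candidates."""
    cand = [(abs(column[i]), i) for i in range(start, len(column))]
    n = len(cand)
    # Pad to a power of two with candidates that can never win.
    size = 1
    while size < n:
        size *= 2
    cand += [(-1.0, -1)] * (size - n)
    stride = size // 2
    while stride > 0:
        for t in range(stride):  # independent: reads t+stride, writes t
            if cand[t + stride][0] > cand[t][0]:
                cand[t] = cand[t + stride]
        stride //= 2  # a real kernel would synchronize its threads here
    return cand[0][1]


def lu_factor(a):
    """Factor the square list-of-lists matrix `a` in place with partial
    pivoting. Afterwards the strict lower triangle holds L (unit diagonal
    implied) and the upper triangle holds U. Returns the row permutation:
    perm[k] is the original index of the row now stored at position k."""
    n = len(a)
    perm = list(range(n))
    for k in range(n):  # step k depends on step k-1, so this loop is ordered
        col = [a[i][k] for i in range(n)]
        p = pivot_by_reduction(col, k)
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):  # independent: one row per thread
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return perm
```

For example, calling `lu_factor` on a copy of `[[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]` first moves the row with the largest leading entry to the top. Only the outer loop over `k` must run in order; the pivot search and the row updates within one step are the parts a multi-threaded implementation can distribute.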

Keywords

GPGPU; CUDA; LU; SIMT
