http://dx.doi.org/10.7472/jksii.2013.14.6.41

Implementation of parallel blocked LU decomposition program for utilizing cache memory on GP-GPUs  

Kim, Youngtae (Department of Computer Science, Gangneung-Wonju National University)
Kim, Doo-Han (Department of Computer Science, Gangneung-Wonju National University)
Yu, Myoung-Han (Department of Computer Science, Gangneung-Wonju National University)
Publication Information
Journal of Internet Computing and Services / v.14, no.6, 2013, pp. 41-47
Abstract
GP-GPUs are general-purpose GPUs: processors originally designed for graphics processing that are now used for multithreaded numerical computation. Unlike a typical cache, the cache memory on a GP-GPU is exposed as shared memory that user programs can access directly. In this research, we implemented a parallel blocked LU decomposition program that utilizes this cache memory on GP-GPUs. The parallel blocked LU decomposition program, written in Nvidia CUDA C, runs 7 to 8 times faster than the non-blocked LU decomposition program in the same GP-GPU computation environment.
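The blocking idea in the abstract can be illustrated with a minimal CPU-only sketch. It is not the authors' CUDA C program. It assumes a right-looking blocked LU factorization without pivoting, and the names `lu_unblocked`, `lu_blocked`, `reconstruct`, and the tile size `bs` are hypothetical. On a GP-GPU, the tiles touched in steps 2 and 3 would presumably be the data staged in shared memory so that each value loaded from global memory is reused many times.

```python
def lu_unblocked(a):
    """Doolittle LU without pivoting. Returns one matrix holding U on and
    above the diagonal and L (unit diagonal implied) below it."""
    n = len(a)
    m = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            m[i][k] /= m[k][k]
            for j in range(k + 1, n):
                m[i][j] -= m[i][k] * m[k][j]
    return m


def lu_blocked(a, bs):
    """Right-looking blocked LU without pivoting, using bs-wide panels.
    Assumption: the matrix needs no pivoting (e.g. diagonally dominant)."""
    n = len(a)
    m = [row[:] for row in a]
    for k0 in range(0, n, bs):
        k1 = min(k0 + bs, n)
        # Step 1: factor the column panel (columns k0..k1-1, rows k0..n-1).
        for k in range(k0, k1):
            for i in range(k + 1, n):
                m[i][k] /= m[k][k]
                for j in range(k + 1, k1):
                    m[i][j] -= m[i][k] * m[k][j]
        # Step 2: row panel, forward substitution with the unit lower
        # triangle of the diagonal tile (L11 * U12 = A12).
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    m[i][j] -= m[i][k] * m[k][j]
        # Step 3: trailing update A22 -= L21 * U12, one tile at a time.
        # Each tile only reads a bs-wide strip of L21 and of U12.
        for i0 in range(k1, n, bs):
            for j0 in range(k1, n, bs):
                for i in range(i0, min(i0 + bs, n)):
                    for j in range(j0, min(j0 + bs, n)):
                        s = 0.0
                        for k in range(k0, k1):
                            s += m[i][k] * m[k][j]
                        m[i][j] -= s
    return m


def reconstruct(m):
    """Multiply the packed L and U factors back together."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else m[i][k]
                s += l_ik * m[k][j]
            out[i][j] = s
    return out
```

Both routines compute the same factors up to rounding. The blocked version only reorders the arithmetic so that most of the work becomes tile-by-tile matrix multiplication, which is the part that benefits from fast on-chip memory.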
Keywords
LU decomposition; GP-GPU; Nvidia CUDA; Parallel program;