http://dx.doi.org/10.7472/jksii.2013.14.6.41

Implementation of parallel blocked LU decomposition program for utilizing cache memory on GP-GPUs

Kim, Youngtae (Department of Computer Science, Gangneung-Wonju National University)
Kim, Doo-Han (Department of Computer Science, Gangneung-Wonju National University)
Yu, Myoung-Han (Department of Computer Science, Gangneung-Wonju National University)

Publication Information

Journal of Internet Computing and Services / v.14, no.6, 2013, pp. 41-47

Abstract

GP-GPUs (general-purpose GPUs) are processors originally designed for graphics processing that are now used for multithreaded numerical computation. Unlike a typical cache, the cache memory on a GP-GPU takes the form of shared memory that user programs can access directly. In this research, we implemented a parallel blocked LU decomposition program that utilizes this cache memory on GP-GPUs. The parallel blocked LU decomposition program, written in Nvidia CUDA C, runs 7 to 8 times faster than a non-blocked LU decomposition program in the same GP-GPU computing environment.
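As a rough illustration of the blocking idea described in the abstract, the following is a minimal CPU-side sketch in plain C. It is not the authors' CUDA C program. It assumes row-major storage, no pivoting, and a hypothetical tile width `TILE`; the function name `blocked_lu` and the local buffers `lt` and `ut` are likewise illustrative. The two small buffers stand in for the role that shared memory would play in a GPU kernel: each tile of L and U is copied once into fast local storage and then reused for a whole tile of the trailing update.

```c
#include <assert.h>
#include <stddef.h>

enum { TILE = 4 };  /* assumed tile width, analogous to a shared-memory tile edge */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* In-place blocked LU (no pivoting) of a row-major n x n matrix.
 * On return the strict lower triangle holds L (unit diagonal implied)
 * and the upper triangle holds U. Returns 0 on success, -1 on a zero pivot. */
int blocked_lu(double *a, size_t n)
{
    assert(a != NULL || n == 0);

    for (size_t k = 0; k < n; k += TILE) {
        size_t kb  = min_sz(TILE, n - k);  /* width of the current block */
        size_t end = k + kb;

        /* Step 1: unblocked LU of the column panel (rows k..n, cols k..end).
         * This yields L11, U11 and the column panel L21. */
        for (size_t p = k; p < end; ++p) {
            double pivot = a[p * n + p];
            if (pivot == 0.0)
                return -1;
            for (size_t i = p + 1; i < n; ++i) {
                a[i * n + p] /= pivot;
                for (size_t j = p + 1; j < end; ++j)
                    a[i * n + j] -= a[i * n + p] * a[p * n + j];
            }
        }

        /* Step 2: row panel U12 = inverse(L11) * A12 by forward substitution. */
        for (size_t p = k; p < end; ++p)
            for (size_t i = p + 1; i < end; ++i)
                for (size_t j = end; j < n; ++j)
                    a[i * n + j] -= a[i * n + p] * a[p * n + j];

        /* Step 3: trailing update A22 -= L21 * U12, one tile at a time. */
        for (size_t i0 = end; i0 < n; i0 += TILE) {
            size_t ib = min_sz(TILE, n - i0);
            double lt[TILE][TILE];  /* local copy of an L21 tile */
            for (size_t i = 0; i < ib; ++i)
                for (size_t p = 0; p < kb; ++p)
                    lt[i][p] = a[(i0 + i) * n + (k + p)];

            for (size_t j0 = end; j0 < n; j0 += TILE) {
                size_t jb = min_sz(TILE, n - j0);
                double ut[TILE][TILE];  /* local copy of a U12 tile */
                for (size_t p = 0; p < kb; ++p)
                    for (size_t j = 0; j < jb; ++j)
                        ut[p][j] = a[(k + p) * n + (j0 + j)];

                /* Multiply the two local tiles and subtract from A22. */
                for (size_t i = 0; i < ib; ++i)
                    for (size_t j = 0; j < jb; ++j) {
                        double sum = 0.0;
                        for (size_t p = 0; p < kb; ++p)
                            sum += lt[i][p] * ut[p][j];
                        a[(i0 + i) * n + (j0 + j)] -= sum;
                    }
            }
        }
    }
    return 0;
}
```

In a CUDA version of this scheme, the tile loops of step 3 would typically map to thread blocks and the local buffers to `__shared__` arrays, but how the authors organized their kernels is not stated in the abstract. Because the sketch omits pivoting, it is only suitable for matrices that do not need it, such as diagonally dominant ones.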

Keywords

LU decomposition; GP-GPU; Nvidia CUDA; Parallel program


