1 |
NVIDIA, CUDA Programming Guide, Version 4.2.
|
2 |
J. Nickolls, "Scalable Parallel Programming with CUDA", ACM Queue, vol. 6, no. 2, pp. 40-53, 2008.
|
3 |
E. Lindholm, "NVIDIA Tesla: A Unified Graphics and Computing Architecture", IEEE Micro, vol. 28, no. 2, pp. 39-55, 2008.
|
4 |
N. Galoppo et al., "LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware", SC '05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, p. 3, 2005.
|
5 |
John L. Hennessy, David A. Patterson. "Computer Architecture: A Quantitative Approach". 2011.
|
6 |
Golub, Gene H.; Van Loan, Charles F. (1996), Matrix Computations (3rd ed.), Baltimore: Johns Hopkins University Press.
|
7 |
Shin, B. and Kim, Y., "Implementation of high performance parallel LU factorization program for multi-threads on GPGPUs", Journal of Korean Society for Internet Information, Vol. 12, No. 3, pp. 131-137, 2011.
|
8 |
Kim, Y., "Performance Comparison of Two Parallel LU Decomposition Algorithms on MasPar Machines", Journal of IEEE Korea Council, Vol. 2, No. 2, pp. 247-255, 1999.
|
9 |
G. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, and D. Walker, "Solving Problems on Concurrent Processors, Vol. 1", Prentice Hall, Englewood Cliffs, NJ, 1988.
|
10 |
Gallivan et al., "Parallel Algorithms for Matrix Computations", SIAM, Philadelphia, 1991.
|