1 |
V. Volkov and J. Demmel, 'LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs', LAPACK Working Note 202, 2008.
|
2 |
N. Galoppo, N. Govindaraju, M. Henson, and D. Manocha, 'LU-GPU: Efficient Algorithms for Solving Dense Linear Systems on Graphics Hardware.', Proceedings of 2005 Conference on Supercomputing, 2005.
|
3 |
G. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon, and D. Walker, 'Solving Problems on Concurrent Processors, Vol. 1.', Prentice Hall, Englewood Cliffs, NJ, 1988.
|
4 |
Y. Kim, 'Performance Comparison of Two Parallel LU Decomposition Algorithms on MasPar Machines.', Journal of IEEE Korea Council, Vol. 2, No. 2, pp. 247-255, 1999.
|
5 |
NVIDIA Corporation, 'NVIDIA CUDA Programming Guide, Version 2.3.1', 2009.
|
6 |
G. Geist and C. Romine, 'LU Factorization Algorithms on Distributed-Memory Multiprocessor Architectures.', SIAM J. Sci. Stat. Comput., vol. 9, no. 4, pp. 639-649, July 1988.
|
7 |
G. von Laszewski, M. Parashar, A. Mohamed, and G. C. Fox, 'On the Parallelization of Blocked LU Factorization Algorithms on Distributed Memory Architectures.', Proceedings of '92 Conference on Supercomputing, 1992.
|