1 |
NVIDIA, "CUDA C Programming Guide," Oct. 2010.
|
2 |
Khronos Group, "The OpenCL Specification," Aug. 2012.
|
3 |
Govindaraju, Naga K., et al. "A memory model for scientific algorithms on graphics processors," in Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, Nov 2006.
|
4 |
Bakhoda, Ali, et al. "Analyzing CUDA workloads using a detailed GPU simulator," IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 163-174, April 2009.
|
5 |
Che, Shuai, et al. "Rodinia: A benchmark suite for heterogeneous computing," IEEE International Symposium on Workload Characterization (IISWC), pp. 44-54, Oct 2009.
|
6 |
NVIDIA, "CUDA C/C++ SDK Code Samples," 2011.
|
7 |
Harish, Pawan, and P. J. Narayanan. "Accelerating large graph algorithms on the GPU using CUDA," International Conference on High Performance Computing, Springer Berlin Heidelberg, pp. 197-208, 2007.
|
8 |
Michalakes, John, and Manish Vachharajani. "GPU acceleration of numerical weather prediction," Parallel Processing Letters, vol. 18, no. 5, pp. 531-548, 2008.
|
9 |
Phansalkar, Aashish, Ajay Joshi, and Lizy K. John. "Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite," ACM SIGARCH Computer Architecture News, vol. 35, no. 2, pp. 412-423, 2007.
|