1 |
F. Warg, J. Nilsson and M. Ekman, "An in-depth look at computer performance growth," Workshop on architectural support for security and anti-virus, pp. 144-147, 2005.
|
2 |
V. W. Lee, C. K. Kim, J. Chhugani, M. Deisher, D. H. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal and P. Dubey, "Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU," International Symposium on Computer Architecture, pp. 451-460, 2010.
|
3 |
General-purpose computation on graphics hardware, available at http://gpgpu.org.
|
4 |
J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohn and T. Purcell, "A survey of general-purpose computation on graphics hardware," Computer Graphics Forum, Vol. 26, No. 1, pp. 80-113, 2007.
|
5 |
AMD, AMD Accelerated Parallel Processing OpenCL Programming Guide, 2012.
|
6 |
NVIDIA, NVIDIA's Next Generation CUDA Compute Architecture: Fermi, 2009.
|
7 |
P. Conway and B. Hughes, "The AMD Opteron Northbridge Architecture," IEEE Micro, Vol. 27, No. 2, pp. 10-21, 2007.
|
8 |
P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-Way Multithreaded Sparc Processor," IEEE Micro, Vol. 25, No. 2, pp. 21-29, 2005.
|
9 |
S. Rusu, T. Simon, H. Muljono, J. Stinson, D. Ayers, J. Chang, R. Varada, M. Ratta, S. Kottapalli, and S. Vora, "A 45 nm 8-Core Enterprise Xeon Processor," IEEE Journal of Solid-State Circuits, Vol. 45, No. 1, pp. 7-14, 2010.
|
10 |
NVIDIA Corporation, available at http://www.nvidia.com/
|
11 |
Quadro FX 5800, available at http://www.nvidia.com/object/product_quadro_fx_5800_us.html
|
12 |
H. J. Choi and C. H. Kim, "Performance Evaluation of the GPU Architecture Executing Parallel Applications," Journal of the Korea Contents Association, Vol. 12, No. 5, pp. 10-21, 2012.
|
13 |
H. J. Choi and C. H. Kim, "Analysis of Impact of Correlation Between Hardware Configuration and Branch Handling Methods Executing General Purpose Applications," Journal of the Korea Contents Association, Vol. 13, No. 3, pp. 9-21, 2013.
|
14 |
W. W. L. Fung and T. M. Aamodt, "Thread Block Compaction for Efficient SIMT Control Flow," In Proceedings of the 17th International Symposium on High Performance Computer Architecture, pp. 25-36, 2011.
|
15 |
A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," In Proceedings of 9th International Symposium on Performance Analysis of Systems and Software, pp. 163-174, 2009.
|
16 |
BookSim simulator, available at http://nocs.stanford.edu/booksim.html
|
17 |
CUDA SDK, available at http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html
|