1 |
http://nocs.stanford.edu/booksim.html
|
2 |
E. Lindholm, J. Nickolls, S.Oberman, and J. Montrym, "NVIDIA Tesla: A Unified Graphics and Computing Architecture," IEEE MICRO, Vol.28, No.2, pp.39-55, 2008.
DOI
ScienceOn
|
3 |
A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," In Proceedings of 9th International Symposium on Performance Analysis of Systems and Software, pp.163-174, 2009.
|
4 |
D. C. Burger and T. M. Austin, "The SimpleScalar tool set, version 2.0," Computer Architecture News, Vol.25, No.3, pp.13-25, 1997.
|
5 |
http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html
|
6 |
J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohn, and T. J. Purcell, "A Survey of General-Purpose Computation on Graphics Hardware," Euro-graphics 2005, State of the Art Reports, pp.21-51, 2005.
|
7 |
Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan, "Brook for GPUs: stream computing on graphics hardware," In Proceedings of 31th Annual Conference on Computer Graphics, pp.777-786, 2004.
|
8 |
H. J. Choi and C. H. Kim, "Performance Evaluation of the GPU Architecture Executing Parallel Applications," Journal of the Korea Contents Association, Vol.12, No.5. pp.10-21, 2012.
과학기술학회마을
DOI
ScienceOn
|
9 |
H. J. Choi and C. H. Kim, "Analysis of Impact of Correlation Between Hardware Configuration and Branch Handling Methods Executing General Purpose Applications," Journal of the Korea Contents Association, Vol.13, No.3. pp.9-21, 2013.
과학기술학회마을
DOI
ScienceOn
|
10 |
http://www.gpgpu.org
|
11 |
http://www.khronos.org/opencl/
|
12 |
http://www.amd.com/stream
|
13 |
http://developer.nvidia.com/object/cuda_3_1_downloads.html
|
14 |
J. Meng, D. Tarjan, and K. Skadron, "Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance," In Proceedings of 37th International Symposium on Computer Architecture, pp.235-246, 2010.
|
15 |
W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," In Proceedings of 40th Microarchitecture, pp.407-420, 2007.
|
16 |
J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N. S. Kim, T. M. Aamodt, and V. J. Reddi, "GPUWattch : Enabling Energy Optimizations in GPGPUs," In Proceedings of the 27th International Symposium on Computer Architecture, pp.487-498, 2013.
|
17 |
N. B. Lakshminarayana and H. S. Kim, "Effect of Instruction Fetch and Memory Scheduling on GPU Performance," Workshop on Language, Compiler, and Architecture Support for GPGPU(in conjunction with HPCA/PPoPP 2010), 2010.
|
18 |
http://www.nvidia.com/object/product_quadro_fx_5800_us.html
|
19 |
http://www.isuppli.com/
|
20 |
W. W. L. Fung and T. M. Aamodt, "Thread Block Compaction for Efficient SIMT Control Flow," In Proceedings of the 17th International Symposium on High Performance Computer Architecture, pp.25-36, 2011.
|
21 |
Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger, "Dark Silicon and the End of Multicore Scaling," In Proceedings of International Symposium on Computer Architecture, pp.365-376, 2011.
|
22 |
K. Olukotun, B. A. Nayfeh, L. Hammond, K. Wilson, and K. Chang, "The Case for a Single-Chip Multiprocessor," In Proceedings of 7th Conference on Architectural Support for Programming Languages and Operating Systems, pp.2-11, 1996.
|
23 |
V. Agarwal, M. S. Hrishikesh, S. W. Keckler, and D. Burger, "Clock rate versus IPC: the end of the road for conventional microarchitectures," In Proceedings of the 27th International Symposium on Computer Architecture, pp.248-259, 2000.
|