1 |
D. Luebke and G. Humphreys, "How GPUs work," IEEE Computer, Vol. 40, No. 2, pp. 96-100, February 2007. DOI: 10.1109/MC.2007.59
|
2 |
I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan, "Brook for GPUs: stream computing on graphics hardware," In Proceedings of Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), pp. 777-786, August 2004. DOI: 10.1145/1186562.1015800
|
3 |
CUDA Programming Guide Version 3.0, available at https://developer.nvidia.com/cuda-toolkit-30-downloads
|
4 |
Khronos Group, OpenCL, available at http://www.khronos.org/opencl
|
5 |
ATI Stream SDK, available at http://developer.amd.com/community/blog/2009/08/05/ati-stream-sdk-and-opencl
|
6 |
General-purpose computation on graphics hardware, available at http://www.gpgpu.org
|
7 |
J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohn, and T. J. Purcell, "A survey of general-purpose computation on graphics hardware," Computer Graphics Forum, Vol. 26, No. 1, pp. 21-51, March 2007. DOI: 10.1111/j.1467-8659.2007.01012.x
|
8 |
Y. Yang, P. Xiang, M. Mantor, and H. Zhou, "CPU-assisted GPGPU on fused CPU-GPU architectures," In Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, pp. 1-12, March 2012. DOI: 10.1109/HPCA.2012.6168948
|
9 |
NVIDIA TITAN X, available at http://www.nvidia.co.kr/graphicscards/geforce/pascal/kr/titan-x-pascal
|
10 |
X. Zhang and K. K. Parhi, "High-speed VLSI architectures for the AES algorithm," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 957-967, August 2004. DOI: 10.1109/TVLSI.2004.832943
|
11 |
NVIDIA Co. Ltd., available at http://www.nvidia.com
|
12 |
AMD (Advanced Micro Devices) Inc., available at http://www.amd.com
|
13 |
NVIDIA's Next Generation CUDA Compute Architecture: Fermi, available at http://www.nvidia.com/content/pdf/fermi_white_papers/nvidia_fermi_compute_architecture_whitepaper.pdf
|
14 |
W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," In Proceedings of International Symposium on Microarchitecture, pp. 407-420, December 2007.
|
15 |
J. E. Thornton, "Parallel operation in the Control Data 6600," In AFIPS Proceedings of FJCC, Part 2, Vol. 26, pp. 33-40, 1964.
|
16 |
M. Lee, S. Song, J. Moon, J. Kim, W. Seo, Y. Cho, and S. Ryu, "Improving GPGPU Resource Utilization Through Alternative Thread Block Scheduling," In Proceedings of the International Symposium on High Performance Computer Architecture, pp. 260-271, June 2014. DOI: 10.1109/HPCA.2014.6835937
|
17 |
K. M. Abdalla, L. V. Shah, J. F. Duluk, T. J. Purcell, T. Mandal, and G. Hirota, "Scheduling and Execution of Compute Tasks," US Patent US20130185725, 2013.
|
18 |
H. Choi, D. Son, J. Kim, and C. Kim, "Concurrent warp execution: improving performance of GPU-likely SIMD architecture by increasing resource utilization," The Journal of Supercomputing, Vol. 69, No. 1, pp. 330-356, July 2014. DOI: 10.1007/s11227-014-1155-4
|
19 |
G. Kim, J. Kim, and C. Kim, "Latency Hiding based Warp Scheduling Policy for High Performance GPUs," Journal of The Korea Society of Computer and Information, Vol. 24, No. 4, pp. 1-9, April 2019. DOI: 10.9708/jksci.2019.24.04.001
|
20 |
D. Son, J. Kim, and C. Kim, "An IPC-based Dynamic Cooperative Thread Array Scheduling Scheme for GPUs," Journal of The Korea Society of Computer and Information, Vol. 21, No. 2, pp. 9-16, February 2016. DOI: 10.9708/jksci.2016.21.2.009
|
21 |
A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," In Proceedings of International Symposium on Performance Analysis of Systems and Software, pp. 163-174, April 2009. DOI: 10.1109/ISPASS.2009.4919648
|
22 |
S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi, "McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures," In Proceedings of the International Symposium on Microarchitecture, pp. 469-480, January 2009. DOI: 10.1145/1669112.1669172
|
23 |
J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N. S. Kim, T. M. Aamodt, and V. J. Reddi, "GPUWattch: Enabling Energy Optimizations in GPGPUs," In Proceedings of the International Symposium on Computer Architecture, pp. 487-498, June 2013. DOI: 10.1145/2485922.2485964
|
24 |
GTX480 NVIDIA, available at http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-480
|
25 |
M. Abdel-Majeed, D. Wong, and M. Annavaram, "Warped Gates: Gating Aware Scheduling and Power Gating for GPGPUs," In Proceedings of International Symposium on Microarchitecture, pp. 111-122, December 2013. DOI: 10.1145/2540708.2540719
|