[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5573/IEIESPC.2015.4.2.078

Study of Cache Performance on GPGPU

Choi, Kyu Hyun (Department of Electrical and Computer Engineering, Korea University)
Kim, Seon Wook (Department of Electrical and Computer Engineering, Korea University)

Publication Information

IEIE Transactions on Smart Processing and Computing / v.4, no.2, 2015, pp. 78-82

Abstract

General-purpose graphics processing units (GPGPUs) provide tremendous computational power. Despite its latency-hiding mechanisms, a GPU architecture still requires high memory bandwidth and low latency between the computational units and the memory system. For this reason, current GPU architectures place a private L1 cache in each core and add a shared L2 cache, improving performance by reducing memory latency. In some cases, however, this CPU-like cache design is not suitable for GPGPUs. In this paper, we analyze cache performance in detail in relation to GPGPU application characteristics, and suggest technical alternatives for the GPGPU architecture as future work.

Keywords

GPGPU; Cache; Performance analysis; Application characteristics

References

1	NVIDIA, "CUDA C Programming Guide," Oct. 2010.
2	Khronos Group, "The OpenCL Specification," Aug. 2012.
3	Govindaraju, Naga K., et al. "A memory model for scientific algorithms on graphics processors," in Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, Nov. 2006.
4	Bakhoda, Ali, et al. "Analyzing CUDA workloads using a detailed GPU simulator," IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 163-174, Apr. 2009.
5	Che, Shuai, et al. "Rodinia: A benchmark suite for heterogeneous computing," IEEE International Symposium on Workload Characterization (IISWC), pp. 44-54, Oct. 2009.
6	NVIDIA, "CUDA C/C++ SDK Code Samples," 2011.
7	Harish, Pawan, and P. J. Narayanan. "Accelerating large graph algorithms on the GPU using CUDA," International Conference on High Performance Computing, Springer Berlin Heidelberg, pp. 197-208, 2007.
8	Michalakes, John, and Manish Vachharajani. "GPU acceleration of numerical weather prediction," Parallel Processing Letters, vol. 18, no. 5, pp. 531-548, 2008.
9	Phansalkar, Aashish, Ajay Joshi, and Lizy K. John. "Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite," ACM SIGARCH Computer Architecture News, vol. 35, no. 2, pp. 412-423, 2007.