[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.9708/jksci.2012.17.7.001

Memory Delay Comparison between 2D GPU and 3D GPU

Jeon, Hyung-Gyu (School of Electronics and Computer Engineering, Chonnam National University)
Ahn, Jin-Woo (School of Electronics and Computer Engineering, Chonnam National University)
Kim, Jong-Myon (School of Computer Engineering and Information Technology, University of Ulsan)
Kim, Cheol-Hong (School of Electronics and Computer Engineering, Chonnam National University)

Publication Information

Journal of the Korea Society of Computer and Information, v.17, no.7, 2012, pp. 1-11

Abstract

As process technology scales down, the number of cores integrated into a processor increases dramatically, leading to significant performance improvement. In particular, the GPU (Graphics Processing Unit), which contains many cores, can provide high computational performance by maximizing parallelism. In the GPU architecture, access latency to main memory is one of the major factors limiting performance improvement. In this work, we quantitatively analyze the performance improvement of the 3D GPU architecture over the 2D GPU architecture and investigate the potential problems of the 3D GPU architecture. In general, memory instructions account for 30% of total instructions, and global/local memory instructions constitute 60% of total memory instructions. Therefore, the 3D GPU is expected to improve performance significantly over the 2D GPU by reducing the delay of memory instructions. However, according to our experimental results, the 3D architecture improves GPU performance by only 2% compared to the 2D architecture, because the performance reduction due to the memory bottleneck increases by 245% in the 3D GPU architecture compared to the 2D architecture. This paper provides a guideline for suitable memory design by analyzing the efficiency of the memory architecture in the 3D GPU architecture.
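The two percentages quoted in the abstract can be combined into a single share of all instructions. The sketch below is only an illustration of that arithmetic: the constant and function names are hypothetical, and the inputs are the abstract's approximate "in general" figures, not per-benchmark measurements from the paper.

```python
# Approximate fractions quoted in the abstract (not per-benchmark data).
MEMORY_SHARE_OF_ALL = 0.30           # memory instructions / all instructions
GLOBAL_LOCAL_SHARE_OF_MEMORY = 0.60  # global+local memory instr. / memory instr.


def global_local_share_of_all(mem_share: float, gl_share: float) -> float:
    """Return global/local memory instructions as a fraction of all instructions."""
    return mem_share * gl_share


share = global_local_share_of_all(MEMORY_SHARE_OF_ALL, GLOBAL_LOCAL_SHARE_OF_MEMORY)
print(f"{share:.0%} of all instructions are global/local memory instructions")
```

Under these figures, roughly 18% of all executed instructions are global/local memory instructions, which is the portion the abstract expects a 3D memory organization to speed up.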

Keywords

3D Integrated Processor; Graphics Processing Unit; Memory; Performance Analysis;

