[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5392/JKCA.2013.13.03.009

Analysis of Impact of Correlation Between Hardware Configuration and Branch Handling Methods Executing General Purpose Applications

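Choi, Hong Jun (School of Electronics and Computer Engineering, Chonnam National University)
Kim, Cheol Hong (School of Electronics and Computer Engineering, Chonnam National University)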

Publication Information

The Journal of the Korea Contents Association / v.13, no.3, 2013, pp. 9-21

Abstract

Owing to their increased computing power and flexibility, recent GPUs execute general-purpose parallel applications as well as graphics applications. Programmers can use GPGPU through the APIs provided by GPU vendors. Unfortunately, the computational resources of a GPU are not fully utilized when executing general-purpose applications, because such applications contain frequent branch instructions. To handle this branch problem, several warp formation techniques have been proposed. Intuitively, one would expect a warp formation technique that provides higher computational resource utilization to deliver higher performance. Contrary to this expectation, our simulation results show that the warp formation technique with better utilization performs worse than the one with lower utilization. The reason is that the high-utilization warp formation issues more memory requests, which causes a serious memory bottleneck. A warp formation technique that provides high computational utilization therefore cannot guarantee high performance without adequate hardware resources. For this reason, we analyze the correlation between hardware configuration and warp formation. Our simulation results provide guidelines for addressing the underutilization caused by branch instructions when designing modern GPUs.
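The utilization argument in the abstract can be illustrated with a small toy model. The sketch below is not the authors' simulator: the function names, the 4-thread warp size, and the "pack threads with the same branch outcome into new warps" rule are all assumptions made for illustration, and the regrouping is idealized (it ignores any constraint on which lane a thread may occupy).

```python
from collections import Counter
from math import ceil


def serialized_utilization(warps, warp_size):
    """Lane utilization when each warp runs its divergent paths one after another.

    `warps` is a list of warps; each warp is a list of per-thread branch
    outcomes (hypothetical encoding: 0 = not taken, 1 = taken).
    """
    # One execution pass per distinct branch outcome present in the warp.
    passes = sum(len(set(warp)) for warp in warps)
    active = sum(len(warp) for warp in warps)
    return active / (passes * warp_size)


def regrouped_utilization(warps, warp_size):
    """Lane utilization under an idealized regrouping of threads by outcome."""
    # Count threads per branch outcome across all warps, then pack them.
    counts = Counter(outcome for warp in warps for outcome in warp)
    passes = sum(ceil(n / warp_size) for n in counts.values())
    active = sum(counts.values())
    return active / (passes * warp_size)


# Two 4-thread warps whose threads alternate between the two branch paths.
warps = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
print(serialized_utilization(warps, 4))  # 0.5
print(regrouped_utilization(warps, 4))   # 1.0
```

The model captures only computational utilization. It says nothing about memory traffic, which is the factor the abstract identifies as limiting the performance of the higher-utilization scheme.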

Keywords

GPU; GPGPU; General-purpose Application; Branch Instruction; Warp Formation

Citations & Related Records

Times Cited By KSCI: 4

Reference
Cited By KSCI

1	E. Lindholm, M. J. Kilgard, and H. P. Moreton, "A user-programmable vertex engine," In Proceedings of 28th Annual Conference on Computer Graphics (SIGGRAPH), pp.149-158, 2001.
2	J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohn, and T. J. Purcell, "A Survey of General-Purpose Computation on Graphics Hardware," Eurographics 2005, State of the Art Reports, pp.21-51, 2005.
3	http://developer.nvidia.com/object/cuda_3_1_downloads.html
4	http://www.khronos.org/opencl/
5	J. Helin, "Performance analysis of the CM-2, a massively parallel SIMD computer," In Proceedings of 6th International Conference on Supercomputing, pp.45-52, 1992.
6	A. Levinthal and T. Porter, "Chap-a SIMD graphics processor," In Proceedings of 11th Annual Conference on Computer Graphics (SIGGRAPH), pp.77-82, 1984.
7	S. Che, J. Meng, J. Sheaffer, and K. Skadron, "A performance study of general purpose applications on graphics processors using CUDA," Journal of Parallel and Distributed Computing, Vol.68, No.10, pp.1370-1380, 2008.
8	R. A. Lorie and H. R. Strong, "Method for conditional branch execution in SIMD vector processors," US Patent 4435758, Vol.6, 1984(3).
9	S. Moy and E. Lindholm, "Method and system for programmable pipelined graphics processing with branching instructions," US Patent 6947047, Vol.20, 2005(9).
10	E. Rotenberg, Q. Jacobson, and J. E. Smith, "A study of control independence in superscalar processors," In Proceedings of 5th International Symposium on High-Performance Computer Architecture, pp.115-124, 1999.
11	B. W. Coon and J. E. Lindholm, "System and method for managing divergent threads in a SIMD architecture," US Patent 7353369, Vol.1, 2008(4).
12	http://www.nvidia.com/object/product_quadro_fx_5800_us.html
13	http://nocs.stanford.edu/booksim.html
14	http://developer.download.nvidia.com/compute/cuda/sdk/website/samples.html
15	http://www.nvidia.com/content/cudazone/
16	M. J. Flynn, "Very high-speed computing systems," Proceedings of the IEEE, Vol.54, No.12, pp.1901-1909, 1966.
17	Y. H. Jang, C. Park, J. H. Park, N. Kim, and K. H. Yoo, "Parallel Processing for Integral Imaging Pickup using Multiple Threads," International Journal of Korea Contents, Vol.5, No.4, pp.30-34, 2009.
18	V. Agarwal, M. S. Hrishikesh, S. W. Keckler, and D. Burger, "Clock rate versus IPC: the end of the road for conventional microarchitectures," In Proceedings of 27th International Symposium on Computer Architecture, pp.248-259, 2000.
19	N. P. Jouppi and D. W. Wall, "Available instruction-level parallelism for superscalar and superpipelined machines," In Proceedings of 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, pp.272-282, 1989.
20	D. M. Tullsen, S. J. Eggers, and H. M. Levy, "Simultaneous multithreading: maximizing on-chip parallelism," In Proceedings of 22nd International Symposium on Computer Architecture, pp.392-403, 1995.
21	I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan, "Brook for GPUs: stream computing on graphics hardware," In Proceedings of 31st Annual Conference on Computer Graphics (SIGGRAPH), pp.777-786, 2004.
22	H. J. Choi, H. G. Jeon, and C. H. Kim, "Quantitative Analysis of the Negative Factors on the GPU Performance," Journal of KIISE : Computing Practices and Letters, Vol.18, No.4, pp.282-287, 2012.
23	E. Rotenberg, Q. Jacobson, and J. Smith, "A study of control independence in superscalar processors," In Proceedings of 5th International Symposium on High-Performance Computer Architecture, pp.115-124, 1999.
24	W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow," In Proceedings of 40th International Symposium on Microarchitecture, pp.407-420, 2007.
25	H. J. Choi and C. H. Kim, "Performance Evaluation of the GPU Architecture Executing Parallel Applications," Journal of the Korea Contents Association, Vol.12, No.5, pp.10-21, 2012.
26	H. J. Choi, S. G. Kang, J. M. Kim, and C. H. Kim, "Analysis of the CPU/GPU Temperature and Energy Efficiency depending on Executed Applications," Journal of The Korea Society of Computer and Information, Vol.17, No.5, pp.9-20, 2012.
27	http://www.amd.com/stream
28	https://developer.nvidia.com/cg-toolkit
29	http://msdn2.microsoft.com/en-us/library/bb509638.aspx
30	http://www.opengl.org/registry/doc/GLSLangSpec.Full.1.20.8.pdf
31	http://www.simplescalar.com
32	A. Bakhoda, G. L. Yuan, W. W. L. Fung, H. Wong, and T. M. Aamodt, "Analyzing CUDA Workloads Using a Detailed GPU Simulator," In Proceedings of 9th International Symposium on Performance Analysis of Systems and Software, pp.163-174, 2009.

1	Raster Pipeline Implementation based on 3D Graphics Geometry Pipelines / [Baek, Nakhoon;] / The Journal of the Korea Contents Association
2	Analysis on the GPU Performance according to Hierarchical Memory Organization / [Choi, Hongjun;Kim, Jongmyon;Kim, Cheolhong;] / The Journal of the Korea Contents Association
3	Analysis on Memory Characteristics of Graphics Processing Units for Designing Memory System of General-Purpose Computing on Graphics Processing Units / [Choi, Hongjun;Kim, Cheolhong;] / Smart Media Journal

Korean title: 범용 응용프로그램 실행 시 하드웨어 구성과 분기 처리 기법에 따른 GPU 성능 분석 (Analysis of GPU Performance According to Hardware Configuration and Branch Handling Methods When Executing General-Purpose Applications)