[KSCI] Korea Science Citation Index Service

Analysis of Programming Techniques for Creating Optimized CUDA Software

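Kim, Sung-Soo (Department of Computer Engineering, Sogang University)
Kim, Dong-Heon (Department of Computer Engineering, Sogang University)
Woo, Sang-Kyu (Department of Computer Engineering, Sogang University)
Ihm, In-Sung (Department of Computer Engineering, Sogang University)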

Publication Information

Journal of KIISE: Computing Practices and Letters / v.16, no.7, 2010, pp. 775-787

Abstract

Unlike general-purpose CPUs, GPUs are specialized many-core streaming processors, and thanks to their outstanding parallel computing capacity they are replacing CPUs in a growing range of computations. In response to this trend, NVIDIA has recently released a new parallel computing architecture called CUDA (Compute Unified Device Architecture), which offers a flexible GPU programming environment for GPGPU (general-purpose GPU) computing. To produce efficient parallel software with the CUDA API, programmers generally need a clear understanding of many aspects of the GPU's computing architecture. In this article, we explain several optimization techniques for CUDA programming that we have verified through extensive experimentation and trial and error, and review how those techniques affect execution performance. In particular, we use a specific problem as an example to analyze several factors that affect performance, such as effective access to the hierarchical memory system, processor occupancy, and latency hiding. In conclusion, we present several guidelines that can be applied effectively in CUDA-based parallel programming.
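
To make the notion of processor occupancy concrete, the following is a minimal host-side sketch in C++, not code from the article. It estimates how many thread blocks can be resident on one multiprocessor and the resulting ratio of resident warps to the maximum. The helper names (SmLimits, residentBlocks, occupancy) are hypothetical, and the default limits are assumed to match a compute capability 1.3 device such as the GeForce GTX 200 series; real hardware allocates registers and shared memory in granules, so actual figures may be lower.

```cpp
#include <algorithm>
#include <cassert>

// Per-multiprocessor resource limits. The defaults are ASSUMED values for a
// compute capability 1.3 device; check the programming guide for the real
// limits of the target GPU.
struct SmLimits {
    int maxThreads  = 1024;   // resident threads per multiprocessor
    int maxBlocks   = 8;      // resident blocks per multiprocessor
    int registers   = 16384;  // 32-bit registers per multiprocessor
    int sharedBytes = 16384;  // shared memory per multiprocessor
    int warpSize    = 32;     // threads per warp
};

// Number of warps needed to hold one block (threads rounded up to warps).
int warpsPerBlock(int threadsPerBlock, const SmLimits& sm = SmLimits{}) {
    assert(threadsPerBlock > 0);
    return (threadsPerBlock + sm.warpSize - 1) / sm.warpSize;
}

// Hypothetical helper: blocks that fit on one multiprocessor at once.
// Simplification: ignores register and shared memory allocation granularity.
int residentBlocks(int threadsPerBlock, int regsPerThread, int sharedPerBlock,
                   const SmLimits& sm = SmLimits{}) {
    assert(regsPerThread > 0 && sharedPerBlock >= 0);
    int threads   = warpsPerBlock(threadsPerBlock, sm) * sm.warpSize;
    int byThreads = sm.maxThreads / threads;
    int byRegs    = sm.registers / (regsPerThread * threads);
    int byShared  = sharedPerBlock > 0 ? sm.sharedBytes / sharedPerBlock
                                       : sm.maxBlocks;
    // The scarcest resource decides how many blocks are resident.
    return std::min({sm.maxBlocks, byThreads, byRegs, byShared});
}

// Occupancy: resident warps divided by the maximum resident warps.
double occupancy(int threadsPerBlock, int regsPerThread, int sharedPerBlock,
                 const SmLimits& sm = SmLimits{}) {
    int blocks   = residentBlocks(threadsPerBlock, regsPerThread,
                                  sharedPerBlock, sm);
    int maxWarps = sm.maxThreads / sm.warpSize;
    return static_cast<double>(blocks * warpsPerBlock(threadsPerBlock, sm))
           / maxWarps;
}
```

Under these assumed limits, occupancy(256, 16, 4096) evaluates to 1.0, while doubling register use to 32 per thread halves it to 0.5, because only two blocks then fit in the register file. With small blocks, occupancy(64, 10, 0) is also 0.5, this time because the limit of eight resident blocks is reached first.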

Keywords

GPU; many-core processor; parallel programming; CUDA; memory hierarchy; latency hiding; occupancy; Sobel operator

Citations & Related Records

Reference

1	Victor Podlozhnyuk, Image Convolution with CUDA, NVIDIA CUDA 2.0 SDK document, 2007.
2	NVIDIA. NVIDIA CUDA Visual Profiler (Version 2.3), 2009.
3	Joe Stam, Convolution Soup, NVIDIA, 2009.
4	NVIDIA. NVIDIA CUDA Compute Unified Device Architecture: Technical Brief NVIDIA GeForce GTX 200 GPU Architectural Overview, 2008.
5	NVIDIA. Optimizing CUDA, 2009.
6	B. Parhami. Introduction to Parallel Processing: Algorithms and Architectures, Plenum Press, New York, pp.377-379, 1999.
7	Sobel, I., Feldman, G., A 3x3 Isotropic Gradient Operator for Image Processing, presented at a talk at the Stanford Artificial Intelligence Project, 1968.
8	Mark Segal, Kurt Akeley, The OpenGL Graphics System: A Specification (Version 2.1 - December 1), 2006.
9	Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, David B. Kirk, and Wen-mei W. Hwu, Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA, Proc. 13th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, ACM Press, 2008.
10	Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, and Kevin Skadron, A Performance Study of General-Purpose Applications on Graphics Processors Using CUDA, Journal of Parallel and Distributed Computing, University of Virginia, 2008.
11	NVIDIA. http://www.nvidia.com/object/product_geforce_gtx_280_us.html, 2009.
12	NVIDIA. NVIDIA CUDA Compute Unified Device Architecture: Programming Guide (Version 2.3), 2009.
13	Maryam Moazeni, Alex Bui, and Majid Sarrafzadeh, A Memory Optimization Technique for Software-Managed Scratchpad Memory in GPUs, University of California, 2009.

1	Optimization of Color Format Conversion of WebCam Images Using the CUDA / [Kim, Jin-Woo;Jung, Yun-Hye;Park, Jin-Hong;Park, Yong-Jin;Han, Tack-Don;] / Journal of Korea Game Society
2	Multiple Camera Based Imaging System with Wide-view and High Resolution and Real-time Image Registration Algorithm / [Lee, Seung-Hyun;Kim, Min-Young;] / Journal of the Institute of Electronics Engineers of Korea SC
