http://dx.doi.org/10.7838/jsebs.2018.23.1.037

A Study on GPGPU Performance Improvement Technique on GCN Architecture Using OpenCL API  

Woo, DongHee (Graduate School of Computer Science, Sangmyung University)
Kim, YoonHo (Department of Computer Science, Sangmyung University)
Publication Information
The Journal of Society for e-Business Studies / v.23, no.1, 2018, pp. 37-45
Abstract
The systems on which a variety of programs run have continuously expanded from conventional single-core and multi-core systems to many-core and heterogeneous systems. However, existing research has focused mostly on parallelizing programs with the CUDA framework and rarely on optimization for AMD GCN-based GPUs. To address this gap, our study focuses on optimization techniques for the GCN architecture in a GPGPU environment and achieves a performance improvement. Specifically, using the techniques we propose, we reduce the computation time of the matrix multiplication and convolution algorithms on the GPGPU by more than 30%. We also increase kernel throughput by more than 40%.
Keywords
OpenCL; Optimization; GP-GPU; GCN Architecture; GPU