http://dx.doi.org/10.7471/ikeee.2021.25.4.664

Design of an Optimized GPGPU for Data Reuse in Deep-Learning Convolution

Nam, Ki-Hun (Dept. of Computer Eng.)
Lee, Kwang-Yeob (Dept. of Computer Eng.)
Jung, Jun-Mo (Dept. of Electronics Eng., Seokyeong University)
Publication Information
Journal of IKEEE / v.25, no.4, 2021, pp. 664-671
Abstract
This paper proposes a GPGPU architecture that reduces the number of operations and memory accesses by effectively applying a data-reuse method to convolutional neural networks (CNNs). Convolution is a two-dimensional operation over kernel and input data, performed by sliding the kernel across the input. Instead of reloading the kernel from cache memory at every step, we propose a reuse method that keeps the kernel in internal registers until the convolution operation is complete. A serial operation method was applied to the convolution to increase the effect of data reuse, exploiting the GPGPU principle that instructions execute in SIMT fashion. For register-based data reuse, the kernel size was fixed at 4×4, and the GPGPU was designed with the warp size and register banks chosen to support it effectively. To verify the performance of the designed GPGPU on CNNs, we implemented it on an FPGA, ran LeNet on it, and compared its AlexNet performance against TensorFlow. The measured one-iteration training time on AlexNet is 0.468 s and the inference time is 0.135 s.
Keywords
Data Reuse; CNN; GPGPU; Row stationary; SIMT; Warp; Register bank;