http://dx.doi.org/10.7471/ikeee.2021.25.1.180

Research on the Main Memory Access Count According to the On-Chip Memory Size of an Artificial Neural Network  

Cho, Seok-Jae (Dept. of Electronics Engineering, Pusan National University)
Park, Sungkyung (Dept. of Electronics Engineering, Pusan National University)
Park, Chester Sungchung (Dept. of Electronics Engineering, Konkuk University)
Publication Information
Journal of IKEEE, v.25, no.1, 2021, pp. 180-192
Abstract
One widely used algorithm for image recognition and pattern detection is the convolutional neural network (CNN). Because convolution operations account for the majority of computations in a CNN, hardware accelerators are used to improve the performance of CNN applications. With these accelerators, the CNN must fetch data from off-chip DRAM, since the volume of data involved is too large for the memory inside the accelerator alone to deliver performance improvements. In other words, data communication between off-chip DRAM and the memory inside the accelerator has a significant impact on the performance of CNN applications. In this paper, a simulator for the CNN is developed to analyze main memory (DRAM) accesses with respect to the size of the on-chip memory, or global buffer, inside the CNN accelerator. Simulating AlexNet, one of the CNN architectures, with increasing global buffer sizes, we found that a global buffer larger than 100 KB incurs about 0.8x the DRAM access count of a buffer smaller than 100 KB.
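Although the paper's simulator itself is not reproduced here, the relationship it studies can be illustrated with a minimal Python sketch that estimates per-layer DRAM traffic for AlexNet-like convolution layers as the global buffer grows. The layer shapes, the 16-bit element width, and the simple output-channel tiling policy below are illustrative assumptions, not the simulator's actual model.

    # A minimal, illustrative sketch (not the paper's simulator): estimate
    # DRAM traffic for one convolution layer as a function of global-buffer
    # size, assuming a simple output-channel tiling policy and 16-bit data.

    def layer_dram_accesses(C, H, W, M, K, E, F, buf_bytes, elem=2):
        """Return an estimated DRAM access count (in elements) for a conv
        layer with a C x H x W input, M filters of size C x K x K, and an
        M x E x F output, given a global buffer of buf_bytes bytes."""
        ifmap, wgt, ofmap = C * H * W, M * C * K * K, M * E * F
        # Everything fits on chip: each datum crosses DRAM exactly once.
        if (ifmap + wgt + ofmap) * elem <= buf_bytes:
            return ifmap + wgt + ofmap
        # Otherwise keep the input resident and tile over output channels:
        # find the largest tile Tm whose working set still fits on chip.
        tm = max((t for t in range(1, M + 1)
                  if (ifmap + t * (C * K * K + E * F)) * elem <= buf_bytes),
                 default=0)
        if tm > 0:
            # Input read once; weights and outputs stream through once.
            return ifmap + wgt + ofmap
        # Even the input alone does not fit: as a coarse upper bound,
        # re-read the whole input once per output channel.
        return M * ifmap + wgt + ofmap

    # Hypothetical AlexNet-like conv layer shapes (C, H, W, M, K, E, F).
    ALEXNET_CONV = [
        (3, 227, 227, 96, 11, 55, 55),
        (96, 27, 27, 256, 5, 27, 27),
        (256, 13, 13, 384, 3, 13, 13),
        (384, 13, 13, 384, 3, 13, 13),
        (384, 13, 13, 256, 3, 13, 13),
    ]

    for kb in (32, 64, 100, 128, 256):
        total = sum(layer_dram_accesses(*shape, buf_bytes=kb * 1024)
                    for shape in ALEXNET_CONV)
        print(f"{kb:4d} KB buffer -> {total:,} element accesses")

Sweeping the buffer size in this sketch reproduces the qualitative behavior described in the abstract: once a layer's working set, or at least its input feature map, fits on chip, re-reads from DRAM disappear and the access count drops sharply.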
Keywords
CNN; Simulator; Main Memory Access; Hardware Accelerator; Global Buffer; Scratchpad;