http://dx.doi.org/10.7471/ikeee.2021.25.1.180

Research on the Main Memory Access Count According to the On-Chip Memory Size of an Artificial Neural Network  

Cho, Seok-Jae (Dept. of Electronics Engineering, Pusan National University)
Park, Sungkyung (Dept. of Electronics Engineering, Pusan National University)
Park, Chester Sungchung (Dept. of Electronics Engineering, Konkuk University)
Publication Information
Journal of IKEEE, v.25, no.1, 2021, pp. 180-192
Abstract
One widely used algorithm for image recognition and pattern detection is the convolutional neural network (CNN). Because convolution operations account for the majority of computations in a CNN, hardware accelerators are used to improve the performance of CNN applications. With these accelerators, the CNN must fetch data from off-chip DRAM, since the volume of data involved is too large for the memory inside the accelerator alone to deliver performance improvements. In other words, data communication between off-chip DRAM and the memory inside the accelerator has a significant impact on the performance of CNN applications. In this paper, a simulator for the CNN is developed to analyze main memory (DRAM) accesses with respect to the size of the on-chip memory, or global buffer, inside the CNN accelerator. Simulating AlexNet, one of the CNN architectures, with increasing global buffer sizes, we found that a global buffer larger than 100 KB incurs about 0.8x the DRAM access count of a buffer smaller than 100 KB.
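Although the paper's simulator itself is not reproduced here, the relationship it studies can be illustrated with a minimal Python sketch that estimates per-layer DRAM traffic for AlexNet-like convolution layers as the global buffer grows. The layer shapes, the 16-bit element width, and the simple output-channel tiling policy below are illustrative assumptions, not the simulator's actual model.

    # A minimal, illustrative sketch (not the paper's simulator): estimate
    # DRAM traffic for one convolution layer as a function of global-buffer
    # size, assuming a simple output-channel tiling policy and 16-bit data.

    def layer_dram_accesses(C, H, W, M, K, E, F, buf_bytes, elem=2):
        """Return an estimated DRAM access count (in elements) for a conv
        layer with a C x H x W input, M filters of size C x K x K, and an
        M x E x F output, given a global buffer of buf_bytes bytes."""
        ifmap, wgt, ofmap = C * H * W, M * C * K * K, M * E * F
        # Everything fits on chip: each datum crosses DRAM exactly once.
        if (ifmap + wgt + ofmap) * elem <= buf_bytes:
            return ifmap + wgt + ofmap
        # Otherwise keep the input resident and tile over output channels:
        # find the largest tile Tm whose working set still fits on chip.
        tm = max((t for t in range(1, M + 1)
                  if (ifmap + t * (C * K * K + E * F)) * elem <= buf_bytes),
                 default=0)
        if tm > 0:
            # Input read once; weights and outputs stream through once.
            return ifmap + wgt + ofmap
        # Even the input alone does not fit: as a coarse upper bound,
        # re-read the whole input once per output channel.
        return M * ifmap + wgt + ofmap

    # Hypothetical AlexNet-like conv layer shapes (C, H, W, M, K, E, F).
    ALEXNET_CONV = [
        (3, 227, 227, 96, 11, 55, 55),
        (96, 27, 27, 256, 5, 27, 27),
        (256, 13, 13, 384, 3, 13, 13),
        (384, 13, 13, 384, 3, 13, 13),
        (384, 13, 13, 256, 3, 13, 13),
    ]

    for kb in (32, 64, 100, 128, 256):
        total = sum(layer_dram_accesses(*shape, buf_bytes=kb * 1024)
                    for shape in ALEXNET_CONV)
        print(f"{kb:4d} KB buffer -> {total:,} element accesses")

Sweeping the buffer size in this sketch reproduces the qualitative behavior described in the abstract: once a layer's working set, or at least its input feature map, fits on chip, re-reads from DRAM disappear and the access count drops sharply.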
Keywords
CNN; Simulator; Main Memory Access; Hardware Accelerator; Global Buffer; Scratchpad;