Research on Estimating the Main Memory Access Count According to the On-Chip Memory Size of an Artificial Neural Network Accelerator

  • Received : 2021.03.07
  • Accepted : 2021.03.23
  • Published : 2021.03.31

Abstract

One widely used algorithm for image recognition and pattern detection is the convolutional neural network (CNN). Convolution operations account for the majority of the computation in a CNN, so hardware accelerators are used to process them efficiently and improve the performance of CNN applications. Because the computation involves far more data than the memory inside the accelerator can hold, the accelerator must fetch data from off-chip DRAM; in other words, the data traffic between off-chip DRAM and the memory inside the accelerator has a significant impact on the performance of CNN applications. In this paper, we develop a simulator that estimates the number of accesses to main memory, or DRAM, as a function of the size of the on-chip memory, or global buffer, inside a CNN accelerator. Simulating AlexNet, one of the standard CNN architectures, with increasing global buffer sizes, we found that a global buffer of 100 kB or larger incurs about 0.8 times the DRAM access count of a global buffer smaller than 100 kB.

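To make the kind of estimation described above concrete, the following is a minimal sketch, not the authors' simulator, of how DRAM traffic for AlexNet's convolutional layers can be modeled as a function of global buffer size. The layer shapes follow the standard AlexNet definition; the 16-bit data width, the stride-1 approximation of input footprints, and the reuse model (input re-fetched once per tile when a layer's working set exceeds the buffer) are simplifying assumptions for illustration.

```python
# A toy analytical model, not the authors' simulator: estimate DRAM traffic
# for AlexNet's five convolutional layers as a function of global buffer size.
# Assumptions (for illustration only): 16-bit data, stride-1 approximation of
# input footprints, and a simple reuse model in which a layer whose working
# set fits in the buffer moves each element between DRAM and the buffer once,
# while an oversized working set forces the input to be re-fetched per tile.

# AlexNet conv layers: (in_channels, out_channels, kernel, out_h, out_w)
ALEXNET_CONV = [
    (3,   96,  11, 55, 55),
    (96,  256, 5,  27, 27),
    (256, 384, 3,  13, 13),
    (384, 384, 3,  13, 13),
    (384, 256, 3,  13, 13),
]

BYTES_PER_ELEM = 2  # assumed 16-bit activations and weights


def layer_dram_bytes(in_ch, out_ch, k, out_h, out_w, buffer_bytes):
    """Estimated bytes moved between DRAM and the global buffer for one layer."""
    # Input footprint approximated as (out + k - 1) per dimension (stride 1).
    ifmap = in_ch * (out_h + k - 1) * (out_w + k - 1) * BYTES_PER_ELEM
    weights = out_ch * in_ch * k * k * BYTES_PER_ELEM
    ofmap = out_ch * out_h * out_w * BYTES_PER_ELEM
    working_set = ifmap + weights + ofmap
    if working_set <= buffer_bytes:
        return working_set  # everything fetched/written exactly once
    # Oversized working set: split into ceil(working_set / buffer) tiles and
    # charge one pass over the input per tile (weights streamed once).
    tiles = -(-working_set // buffer_bytes)  # ceiling division
    return ifmap * tiles + weights + ofmap


def total_dram_bytes(buffer_kb):
    buffer_bytes = buffer_kb * 1024
    return sum(layer_dram_bytes(*layer, buffer_bytes) for layer in ALEXNET_CONV)


if __name__ == "__main__":
    for kb in (32, 64, 100, 128, 256):
        print(f"{kb:>4} kB buffer -> {total_dram_bytes(kb) / 1e6:.2f} MB DRAM traffic")
```

Even this crude model reproduces the qualitative trend the abstract reports: once per-layer working sets begin to fit in the global buffer, input re-fetching disappears and DRAM traffic flattens out.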
