http://dx.doi.org/10.4218/etrij.2020-0125

Automated optimization for memory-efficient high-performance deep neural network accelerators  

Kim, HyunMi (AI SoC Research Division, Electronics and Telecommunications Research Institute)
Lyuh, Chun-Gi (AI SoC Research Division, Electronics and Telecommunications Research Institute)
Kwon, Youngsu (AI SoC Research Division, Electronics and Telecommunications Research Institute)
Publication Information
ETRI Journal, vol. 42, no. 4, 2020, pp. 505-517
Abstract
The increasing size and complexity of deep neural networks (DNNs) necessitate the development of efficient high-performance accelerators. An efficient memory structure and operating scheme, together with dataflow control, provide an intuitive basis for such accelerators. Furthermore, processing various neural networks (NNs) requires a flexible memory architecture, a programmable control scheme, and automated optimization. We first propose a flexible architecture that operates at a high frequency despite its large memory and processing-element (PE) array sizes. We then improve the efficiency and usability of this architecture by automating its optimization algorithm. The experimental results show that the architecture increases data reuse: a diagonal write path improves performance by 1.44× on average across a wide range of NNs. The automated optimizations enhance performance by a further 3.8× to 14.79× and also improve usability. Therefore, automating the optimization, in addition to designing an efficient architecture, is critical to realizing high-performance DNN accelerators.
Keywords
accelerators; architecture; automation; deep neural network (DNN); optimization
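The paper's automated optimization algorithm itself is not reproduced on this page. As a rough, self-contained illustration of what such an optimization loop can look like, the Python sketch below exhaustively searches convolution tile sizes for a hypothetical accelerator, keeping only tilings that fit an assumed on-chip buffer and picking the one with the highest data reuse (MACs per byte fetched). All constants, the cost model, and the PE-column constraint are illustrative assumptions, not values or methods from the paper.

from itertools import product

ON_CHIP_BYTES = 512 * 1024  # assumed on-chip buffer capacity (bytes)
PE_COLS = 16                # assumed PE-array width

def footprint(th, tw, tc, tk, kh=3, kw=3):
    """Bytes held on-chip for one tile: inputs + weights + outputs (1 B/element)."""
    inputs = (th + kh - 1) * (tw + kw - 1) * tc   # input halo for a kh x kw kernel
    weights = kh * kw * tc * tk
    outputs = th * tw * tk
    return inputs + weights + outputs

def reuse(th, tw, tc, tk, kh=3, kw=3):
    """MACs per byte fetched: a simple proxy for data reuse."""
    macs = th * tw * tc * tk * kh * kw
    return macs / footprint(th, tw, tc, tk, kh, kw)

def search(H, W, C, K):
    """Exhaustively pick the feasible tiling with the highest reuse."""
    best, best_r = None, 0.0
    for th, tw, tc, tk in product([2, 4, 8, 16, 32], repeat=4):
        if th > H or tw > W or tc > C or tk > K:
            continue                               # tile must fit the layer
        if tk < PE_COLS:
            continue                               # toy constraint: keep PE columns busy
        if footprint(th, tw, tc, tk) > ON_CHIP_BYTES:
            continue                               # tile must fit on-chip memory
        r = reuse(th, tw, tc, tk)
        if r > best_r:
            best, best_r = (th, tw, tc, tk), r
    return best, best_r

tile, r = search(H=56, W=56, C=64, K=128)          # ResNet-like layer shape
print(f"best tile (th, tw, tc, tk) = {tile}, reuse = {r:.1f} MACs/byte")

For realistic layer shapes this search space stays small enough for exhaustive enumeration. A real mapper would also model the dataflow and write paths of the target architecture; this toy version only trades tile footprint against reuse.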