http://dx.doi.org/10.4218/etrij.2020-0125

Automated optimization for memory-efficient high-performance deep neural network accelerators  

Kim, HyunMi (AI SoC Research Division, Electronics and Telecommunications Research Institute)
Lyuh, Chun-Gi (AI SoC Research Division, Electronics and Telecommunications Research Institute)
Kwon, Youngsu (AI SoC Research Division, Electronics and Telecommunications Research Institute)
Publication Information
ETRI Journal, vol. 42, no. 4, 2020, pp. 505-517
Abstract
The increasing size and complexity of deep neural networks (DNNs) necessitate the development of efficient high-performance accelerators. An efficient memory structure and operating scheme, together with dataflow control, provide an intuitive basis for such accelerators. Furthermore, processing various neural networks (NNs) requires a flexible memory architecture, a programmable control scheme, and automated optimization. We first propose a flexible architecture that operates at a high frequency despite its large memory and processing-element (PE) array sizes. We then improve the efficiency and usability of this architecture by automating its optimization algorithm. The experimental results show that the architecture increases data reuse: a diagonal write path improves performance by 1.44× on average across a wide range of NNs. The automated optimizations enhance performance by a further 3.8× to 14.79× and also improve usability. Therefore, automating the optimization, in addition to designing an efficient architecture, is critical to realizing high-performance DNN accelerators.
Keywords
accelerators; architecture; automation; deep neural network (DNN); optimization
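The paper's automated optimization algorithm itself is not reproduced on this page. As a rough, self-contained illustration of what such an optimization loop can look like, the Python sketch below exhaustively searches convolution tile sizes for a hypothetical accelerator, keeping only tilings that fit an assumed on-chip buffer and picking the one with the highest data reuse (MACs per byte fetched). All constants, the cost model, and the PE-column constraint are illustrative assumptions, not values or methods from the paper.

from itertools import product

ON_CHIP_BYTES = 512 * 1024  # assumed on-chip buffer capacity (bytes)
PE_COLS = 16                # assumed PE-array width

def footprint(th, tw, tc, tk, kh=3, kw=3):
    """Bytes held on-chip for one tile: inputs + weights + outputs (1 B/element)."""
    inputs = (th + kh - 1) * (tw + kw - 1) * tc   # input halo for a kh x kw kernel
    weights = kh * kw * tc * tk
    outputs = th * tw * tk
    return inputs + weights + outputs

def reuse(th, tw, tc, tk, kh=3, kw=3):
    """MACs per byte fetched: a simple proxy for data reuse."""
    macs = th * tw * tc * tk * kh * kw
    return macs / footprint(th, tw, tc, tk, kh, kw)

def search(H, W, C, K):
    """Exhaustively pick the feasible tiling with the highest reuse."""
    best, best_r = None, 0.0
    for th, tw, tc, tk in product([2, 4, 8, 16, 32], repeat=4):
        if th > H or tw > W or tc > C or tk > K:
            continue                               # tile must fit the layer
        if tk < PE_COLS:
            continue                               # toy constraint: keep PE columns busy
        if footprint(th, tw, tc, tk) > ON_CHIP_BYTES:
            continue                               # tile must fit on-chip memory
        r = reuse(th, tw, tc, tk)
        if r > best_r:
            best, best_r = (th, tw, tc, tk), r
    return best, best_r

tile, r = search(H=56, W=56, C=64, K=128)          # ResNet-like layer shape
print(f"best tile (th, tw, tc, tk) = {tile}, reuse = {r:.1f} MACs/byte")

For realistic layer shapes this search space stays small enough for exhaustive enumeration. A real mapper would also model the dataflow and write paths of the target architecture; this toy version only trades tile footprint against reuse.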