Automated optimization for memory-efficient high-performance deep neural network accelerators

  • Kim, HyunMi (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
  • Lyuh, Chun-Gi (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
  • Kwon, Youngsu (AI SoC Research Division, Electronics and Telecommunications Research Institute)
  • Received : 2020.03.28
  • Accepted : 2020.07.02
  • Published : 2020.08.18

Abstract

The increasing size and complexity of deep neural networks (DNNs) necessitate the development of efficient high-performance accelerators. Together with dataflow control, an efficient memory structure and operating scheme are central to such accelerators. Furthermore, processing a variety of neural networks (NNs) requires a flexible memory architecture, a programmable control scheme, and automated optimizations. We first propose a flexible, efficient architecture that operates at a high frequency despite its large memory and processing-element (PE) array sizes. We then improve the efficiency and usability of this architecture by automating its optimization algorithm. The experimental results show that the architecture increases data reuse; a diagonal write path improves performance by 1.44× on average across a wide range of NNs. The automated optimizations further improve performance by 3.8× to 14.79× while also improving usability. Therefore, automating the optimization, as well as designing an efficient architecture, is critical to realizing high-performance DNN accelerators.
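To make the automated-optimization idea concrete, the sketch below shows one common form such an optimizer can take: an exhaustive search over loop-tiling factors for a convolutional layer, keeping each tile within a fixed on-chip buffer and selecting the tiling with the least off-chip traffic (that is, the most data reuse). This is a minimal, hypothetical illustration, not the authors' algorithm or the accelerator's actual cost model; the buffer size, tile-size candidates, and traffic estimates are all assumptions.

    # Minimal sketch (not the authors' algorithm) of automated tiling optimization
    # for a convolutional layer mapped onto a PE array: search tile sizes that fit
    # a hypothetical on-chip buffer and pick the one minimizing off-chip traffic.
    from itertools import product
    import math

    def off_chip_traffic(C, K, H, W, R, tc, tk, th, tw):
        """Rough off-chip word count when tiling input channels (tc), filters (tk),
        and the output plane (th x tw); input and weight tiles are re-fetched once
        per tile iteration, and each output is written once."""
        nc = math.ceil(C / tc); nk = math.ceil(K / tk)
        nh = math.ceil(H / th); nw = math.ceil(W / tw)
        ifmap   = nk * nc * nh * nw * tc * (th + R - 1) * (tw + R - 1)
        weights = nh * nw * nc * nk * tc * tk * R * R
        ofmap   = K * H * W
        return ifmap + weights + ofmap

    def buffer_words(tc, tk, th, tw, R):
        # Words held on chip: one input tile, one weight tile, one output tile.
        return tc * (th + R - 1) * (tw + R - 1) + tc * tk * R * R + tk * th * tw

    def optimize_tiling(C, K, H, W, R, buffer_capacity):
        """Return (traffic, (tc, tk, th, tw)) for the best feasible tiling."""
        best = None
        for tc, tk, th, tw in product([1, 2, 4, 8, 16, 32, 64], repeat=4):
            if tc > C or tk > K or th > H or tw > W:
                continue
            if buffer_words(tc, tk, th, tw, R) > buffer_capacity:
                continue
            cost = off_chip_traffic(C, K, H, W, R, tc, tk, th, tw)
            if best is None or cost < best[0]:
                best = (cost, (tc, tk, th, tw))
        return best

    if __name__ == "__main__":
        # Example: one ResNet-like 3x3 layer, 64 -> 64 channels, 56x56 outputs,
        # with a hypothetical 32K-word on-chip buffer.
        print(optimize_tiling(C=64, K=64, H=56, W=56, R=3, buffer_capacity=32 * 1024))

A full optimizer for an accelerator of this kind would typically also explore dataflow and scheduling choices (which loops map onto the PE array), not only tile sizes; the point of automating the search is to replace manual per-network tuning.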

Keywords

References

  1. Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature 521 (2015), 436-444. https://doi.org/10.1038/nature14539
  2. L. Besacier et al., Automatic speech recognition for under-resourced languages: a survey, Speech Commun. 56 (2014), 85-100. https://doi.org/10.1016/j.specom.2013.07.008
  3. K. Arulkumaran, A. Cully, and J. Togelius, AlphaStar: An evolutionary computation perspective, arXiv preprint arXiv:1902.01724v2, 2019.
  4. M. Fatima and M. Pasha, Survey of machine learning algorithms for disease diagnostic, J. Intell. Learn. Syst. Applicat. 9 (2017), 1-16. https://doi.org/10.4236/jilsa.2017.91001
  5. S. Grigorescu et al., A survey of deep learning techniques for autonomous driving, arXiv preprint arXiv:1910.07738, 2019.
  6. A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Proc. Int. Conf. Neural Inf. Process. Syst. (Nevada, USA), 2012, 1097-1105.
  7. J. Albericio et al., Cnvlutin: Ineffectual-neuron-free deep neural network computing, in Proc. Int. Symp. Comput. Archit. (Seoul, Rep. of Korea), 2016, 1-13.
  8. S. Han et al., EIE: Efficient inference engine on compressed deep neural network, in Proc. Int. Symp. Comput. Archit. (Seoul, Rep. of Korea), 2016, 243-254.
  9. Y.-H. Chen et al., Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks, IEEE J. Solid-State Circuits 52 (2017), 127-138. https://doi.org/10.1109/JSSC.2016.2616357
  10. Y. Chen et al., DaDianNao: A machine-learning supercomputer, in Proc. Int. Symp. Microarchitecture (Cambridge, UK), 2014, 609-622.
  11. N. Jouppi et al., In-datacenter performance analysis of a tensor processing unit, in Proc. Int. Symp. Comput. Archit. (Toronto, Canada), 2017, 1-12.
  12. Y. H. Chen et al., Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices, IEEE J. Emerg. Sel. Top. Circuits Syst. 9 (2019), 292-308. https://doi.org/10.1109/JETCAS.2019.2910232
  13. V. Sze et al., Efficient processing of deep neural networks: A tutorial and survey, Proc. IEEE 105 (2017), 2295-2329. https://doi.org/10.1109/JPROC.2017.2761740
  14. R. Andri et al., YodaNN: An architecture for ultra-low power binary-weight CNN acceleration, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 37 (2018), 48-60. https://doi.org/10.1109/TCAD.2017.2682138
  15. Y. C. Yoon et al., Image classification and captioning model considering a CAM-based disagreement loss, ETRI J. 42 (2020), 67-77. https://doi.org/10.4218/etrij.2018-0621
  16. J. Jung and J. Park, Improving visual relationship detection using linguistic and spatial cues, ETRI J. 42 (2020), 399-410. https://doi.org/10.4218/etrij.2019-0093
  17. J. A. B. Fortes and B. W. Wah, Systolic arrays - From concept to implementation, IEEE Comput. 20 (1987), 12-17.
  18. K. He et al., Deep residual learning for image recognition, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (Nevada, USA), 2016, 770-778.
  19. A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Proc. Adv. Neural Inf. Process. Syst. (Nevada, USA), 2012, 1106-1114.
  20. K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, in Proc. Int. Conf. Learn. Representations (San Diego, USA), 2015.
  21. J. Redmon and A. Farhadi, YOLO9000: Better, faster, stronger, arXiv preprint, arXiv:1612.08242, 2016.
  22. J. Redmon and A. Farhadi, YOLOv3: An incremental improvement, arXiv preprint, arXiv:1804.02767, 2018.
  23. F. N. Iandola et al., SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size, arXiv preprint, arXiv:1602.07360, 2016.
  24. Y. Kwon et al., Function-safe vehicular AI processor with nano core-in-memory architecture, in Proc. IEEE Int. Conf. Artif. Intell. Circuits Syst. (Hsinchu, Taiwan), 2019, 127-131.
  25. O. Russakovsky et al., ImageNet large scale visual recognition challenge, arXiv preprint, arXiv:1409.0575, 2014.

Cited by

  1. Trends in Compiler Development for Artificial Intelligence Processors vol.36, no.2, 2020, https://doi.org/10.22648/etri.2021.j.360204
  2. Accelerating On-Device Learning with Layer-Wise Processor Selection Method on Unified Memory vol.21, no.7, 2020, https://doi.org/10.3390/s21072364
  3. Memory Optimization Techniques in Neural Networks: A Review vol.10, no.6, 2020, https://doi.org/10.35940/ijeat.f2991.0810621