http://dx.doi.org/10.6109/jkiice.2022.26.6.842

Multi-DNN Acceleration Techniques for Embedded Systems with Tucker Decomposition and Hidden-layer-based Parallel Processing  

Kim, Ji-Min (Department of IT Convergence Engineering, Hansung University)
Kim, In-Mo (Department of IT Convergence Engineering, Hansung University)
Kim, Myung-Sun (Department of Applied Artificial Intelligence, Hansung University)
Abstract
With the development of deep learning technology, DNNs are increasingly used in embedded systems such as unmanned vehicles, drones, and robots. In an autonomous driving system, for example, it is crucial to run several DNNs simultaneously, each of which delivers highly accurate results at a large computational cost. However, running multiple DNNs at the same time on an embedded system with relatively low performance increases inference time, and the system may malfunction because the action required by an inference result is not taken in time. To solve this problem, the technique proposed in this paper first reduces the amount of computation by applying Tucker decomposition to the computation-heavy DNN models, and then runs the models in parallel on the GPU as much as possible at the granularity of individual hidden layers. Experimental results show that DNN inference time decreases by up to 75.6% compared to the case without the proposed technique.
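The first step the abstract describes, compressing computation-heavy models with Tucker decomposition, is commonly realized as a Tucker-2 decomposition of each convolution kernel along its input- and output-channel modes, splitting one KxK convolution into a 1x1 → KxK → 1x1 pipeline. The sketch below illustrates that standard scheme in PyTorch; the chosen ranks, the one-shot truncated-HOSVD initialization, the groups=1 restriction, and the absence of fine-tuning are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: Tucker-2 decomposition of a Conv2d kernel (assumption:
# this mirrors the standard scheme, not the paper's exact code; groups=1).
import torch
import torch.nn as nn

def tucker2_decompose_conv(conv: nn.Conv2d, rank_in: int, rank_out: int) -> nn.Sequential:
    """Replace one KxK conv (Cin -> Cout) with three smaller convs:
    1x1 (Cin -> rank_in), KxK (rank_in -> rank_out), 1x1 (rank_out -> Cout)."""
    W = conv.weight.data                         # kernel, shape (Cout, Cin, K, K)
    Cout, Cin, K, _ = W.shape

    # Truncated HOSVD: leading left singular vectors of the two unfoldings.
    U_out = torch.linalg.svd(W.reshape(Cout, -1), full_matrices=False).U[:, :rank_out]
    U_in = torch.linalg.svd(W.permute(1, 0, 2, 3).reshape(Cin, -1),
                            full_matrices=False).U[:, :rank_in]

    # Core tensor: project the kernel onto both channel factor matrices.
    core = torch.einsum('oikl,or,is->rskl', W, U_out, U_in)  # (rank_out, rank_in, K, K)

    first = nn.Conv2d(Cin, rank_in, kernel_size=1, bias=False)
    middle = nn.Conv2d(rank_in, rank_out, kernel_size=K,
                       stride=conv.stride, padding=conv.padding, bias=False)
    last = nn.Conv2d(rank_out, Cout, kernel_size=1, bias=True)

    first.weight.data = U_in.t().reshape(rank_in, Cin, 1, 1)   # apply U_in^T
    middle.weight.data = core                                   # compressed KxK conv
    last.weight.data = U_out.reshape(Cout, rank_out, 1, 1)      # apply U_out
    last.bias.data = conv.bias.data if conv.bias is not None else torch.zeros(Cout)
    return nn.Sequential(first, middle, last)
```

With rank_in ≪ Cin and rank_out ≪ Cout, the expensive KxK convolution now operates on far fewer channels, which is where the reduction in computation comes from.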
Keywords
Tucker Decomposition; Multi-DNN; Multi-Stream; Embedded GPU;
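The second step, running the (decomposed) models in parallel on the GPU at hidden-layer granularity, is typically expressed with one CUDA stream per model so that layer kernels from different networks can overlap whenever a single network cannot fill the device. The sketch below shows that idea with PyTorch CUDA streams; the nn.Sequential layout and the per-layer launch loop are assumptions for illustration, since the scheduling code itself is not reproduced here.

```python
# Minimal sketch: concurrent multi-DNN inference, one CUDA stream per model
# (assumption: illustrative use of torch.cuda.Stream, not the paper's code).
import torch
import torch.nn as nn

def infer_concurrently(models, inputs):
    """Run each model on its own CUDA stream, issuing one kernel launch per
    hidden layer, so the GPU may overlap layers of different networks."""
    streams = [torch.cuda.Stream() for _ in models]
    outputs = [None] * len(models)
    with torch.no_grad():
        for i, (model, x, stream) in enumerate(zip(models, inputs, streams)):
            stream.wait_stream(torch.cuda.current_stream())  # inputs are ready
            with torch.cuda.stream(stream):
                for layer in model:          # launch hidden layers one by one
                    x = layer(x)
                outputs[i] = x
    torch.cuda.synchronize()                 # join all streams before reading
    return outputs

if __name__ == "__main__":
    dev = torch.device("cuda")
    net_a = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(64, 64, 3, padding=1)).to(dev).eval()
    net_b = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(32, 32, 3, padding=1)).to(dev).eval()
    xs = [torch.randn(1, 3, 224, 224, device=dev) for _ in range(2)]
    ys = infer_concurrently([net_a, net_b], xs)
    print([y.shape for y in ys])
```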