http://dx.doi.org/10.6109/jkiice.2020.24.6.714

Improving Multi-DNN Computational Performance of Embedded Multicore Processors through a Global Queue  

Cho, Ho-jin (Department of Applied IT Engineering, Hansung University)
Kim, Myung-sun (Department of IT Convergence Engineering, Hansung University)
Abstract
Deep neural networks (DNNs) are increasingly used in embedded systems such as robots and autonomous vehicles. Achieving high recognition accuracy greatly increases computational complexity, and multiple DNNs often run aperiodically. The ability to process multiple DNNs in embedded environments is therefore a crucial issue, and multicore-based platforms are being released in response. However, most DNN models run as a batch process, and when multiple DNNs run together on a multicore processor, the deviation in execution time among the DNNs can be large and the end-to-end execution time of all the DNNs can be long, depending on how they are allocated to the cores. In this paper, we address these problems with a framework that decomposes each DNN into individual layers and distributes the layers to the cores through a global queue. In our experiments, the total DNN execution time was reduced by 31%, and when running multiple identical DNNs, the deviation in execution time was reduced by up to 95.1%.
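The layer-level dispatch described in the abstract can be illustrated with a minimal sketch. This is not the authors' framework. The function name `run_dnns`, the representation of a DNN as a non-empty list of callables, and the use of Python threads are all assumptions made for illustration. Each worker stands in for a core and pulls the next ready layer of any DNN from a single shared queue. Because a layer depends on the output of the previous one, only one layer per DNN is in the queue at a time. Python threads do not give true CPU parallelism for pure-Python work, so the sketch shows the scheduling structure only, not the reported speedups.

```python
import queue
import threading


def run_dnns(dnns, inputs, num_workers=4):
    """Run several DNNs layer by layer through one shared (global) queue.

    dnns:   list of DNNs, each a non-empty list of callables (hypothetical layers)
    inputs: one input per DNN
    Returns the final output of each DNN, in the same order as `dnns`.
    """
    ready = queue.Queue()   # global queue of (dnn_id, layer_index, activation)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = ready.get()
            if item is None:            # sentinel: no more work
                ready.task_done()
                return
            dnn_id, idx, x = item
            y = dnns[dnn_id][idx](x)    # execute exactly one layer
            if idx + 1 < len(dnns[dnn_id]):
                # Enqueue the next layer before marking this one done, so the
                # queue's unfinished-task count never reaches zero too early.
                ready.put((dnn_id, idx + 1, y))
            else:
                with lock:
                    results[dnn_id] = y
            ready.task_done()

    # Seed the queue with the first layer of every DNN.
    for dnn_id, x in enumerate(inputs):
        ready.put((dnn_id, 0, x))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    ready.join()                        # wait until every layer has run
    for _ in threads:
        ready.put(None)                 # stop the workers
    for t in threads:
        t.join()

    return [results[i] for i in range(len(dnns))]
```

For example, `run_dnns([[lambda x: x + 1, lambda x: x * 2], [lambda x: x - 3]], [1, 10], num_workers=2)` returns `[4, 7]`. Since any idle worker can take the next layer of any DNN, no DNN is pinned to a single core, which is the property the abstract credits for the lower deviation in execution time.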
Keywords
Autonomous Vehicles; Deep Neural Network (DNN); Embedded Distributed System; Multicore