TVM-based Performance Optimization for Image Classification in Embedded Systems

A Study on TVM-based Performance Optimization for Object Classification in Embedded Systems

  • Received : 2023.03.23
  • Accepted : 2023.05.18
  • Published : 2023.06.30

Abstract

Optimizing the performance of deep neural networks on embedded systems is a challenging task that requires efficient compilers and runtime systems. We propose a TVM-based approach that consists of three steps: quantization, auto-scheduling, and ahead-of-time compilation. Our approach reduces the computational complexity of models without significant loss of accuracy and generates optimized code for various hardware platforms. We evaluate the approach on three representative CNNs with the ImageNet dataset on an NVIDIA Jetson AGX Xavier board and show that it outperforms baseline methods in processing speed.
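To make the first of the three steps concrete, the following is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is a generic illustration of the technique, not the TVM quantization pass the paper uses. The function names (`quantize_int8`, `dequantize`) and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Map floats to int8 codes with one shared scale (symmetric, per-tensor).

    Assumption: this scheme is chosen for illustration only; the paper's
    actual quantization settings are not given in the abstract.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Choose the scale so the largest magnitude maps to +/-127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range used here.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, used only to exercise the functions.
weights = [0.52, -1.27, 0.003, 0.9, -0.31]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per value is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each value is stored as an 8-bit integer instead of a 32-bit float. That lowers memory traffic and lets the hardware use integer arithmetic, while the reconstruction error stays within half a quantization step.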

Acknowledgement

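This research was supported by the "Autonomous Driving Technology Development Innovation Program," funded by the Korean government (Ministry of Science and ICT) in 2021 (No. 2021-0-00905, (Subproject 3) Development of Cloud, Edge, Car 3-Tier Integrated Perception/Decision/Control SW and Common SW Platform Technology).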
