Acknowledgement
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (No. 2022R1F1A1060231).
References
- W. Ahmad, A. Rasool, A. R. Javed, T. Baker, Z. Jalil, "Cyber Security in IoT-based Cloud Computing: A Comprehensive Survey," Electronics, Vol. 11, No. 1, pp. 16, 2022.
- M. Ham, J. Moon, G. Lim, J. Jung, H. Ahn, W. Song, S. Woo, P. Kapoor, D. Chae, G. Jang, Y. Ahn, J. Lee, "NNStreamer: Efficient and Agile Development of On-Device AI Systems," Proc. of the IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pp. 198-207, 2021.
- S. Dhar, J. Guo, J. Liu, S. Tripathi, U. Kurup, M. Shah, "A Survey of On-device Machine Learning: An Algorithms and Learning Theory Perspective," ACM Transactions on Internet of Things, Vol. 2, No. 3, pp. 1-49, 2021. https://doi.org/10.1145/3450494
- D. Kong, "Science Driven Innovations Powering Mobile Product: Cloud AI vs. Device AI Solutions on Smart Device," arXiv preprint arXiv:1711.07580, 2017.
- "Nvidia Embedded Systems for Next-Gen Autonomous Machines," NVIDIA. [Online]. Available: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/. [Accessed: 30-Jan-2023].
- "Edge TPU - run Inference at the Edge | Google Cloud," Google. [Online]. Available: https://cloud.google.com/edge-tpu. [Accessed: 30-Jan-2023].
- W. Vijitkunsawat, P. Chantngarm, "Comparison of Machine Learning Algorithm's on Self-driving Car Navigation Using Nvidia Jetson Nano," Proc. of the International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), pp. 201-204, 2020.
- A. Basulto-Lantsova, J. A. Padilla-Medina, F. J. Perez-Pinal, A. I. Barranco-Gutierrez, "Performance comparative of OpenCV Template Matching method on Jetson TX2 and Jetson Nano Developer Kits," Proc. of the Annual Computing and Communication Workshop and Conference (CCWC), pp. 0812-0816, 2020.
- K. Alibabaei, E. Assuncao, P. D. Gaspar, V. N. Soares, J. M. Caldeira, "Real-Time Detection of Vine Trunk for Robot Localization Using Deep Learning Models Developed for Edge TPU Devices," Future Internet, Vol. 14, No. 7, pp. 199, 2022.
- Y. H. Tseng, S. S. Jan, "Combination of Computer Vision Detection and Segmentation for Autonomous Driving," Proc. of the IEEE/ION Position, Location and Navigation Symposium (PLANS), pp. 1047-1052, 2018.
- M. Bojarski, D. D. Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, K. Zieba, "End to End Learning for Self-Driving Cars," arXiv preprint arXiv:1604.07316, 2016.
- D. N. N. Tran, H. H. Nguyen, L. H. Pham, J. W. Jeon, "Object Detection with Deep Learning on Drive PX2," Proc. of the IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), pp. 1-4, 2020.
- K. He, X. Zhang, S. Ren, J. Sun, "Deep Residual Learning for Image Recognition," Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
- M. Tan, Q. Le, "Efficientnet: Rethinking Model Scaling for Convolutional Neural Networks," Proc. of the International Conference on Machine Learning, pp. 6105-6114, 2019.
- A. Howard, M. Sandler, G. Chu, L. C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, Q. V. Le, H. Adam, "Searching for Mobilenetv3," Proc. of the IEEE/CVF International Conference on Computer Vision, pp. 1314-1324, 2019.
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," arXiv preprint arXiv:2010.11929, 2020.
- M. Wortsman, G. Ilharco, S. Y. Gadre, R. Roelofs, R. Gontijo-Lopes, A. S. Morcos, H. Namkoong, A. Farhadi, Y. Carmon, S. Kornblith, L. Schmidt, "Model Soups: Averaging Weights of Multiple Fine-tuned Models Improves Accuracy Without Increasing Inference Time," Proc. of the International Conference on Machine Learning, pp. 23965-23998, 2022.
- X. Zhai, A. Kolesnikov, N. Houlsby, L. Beyer, "Scaling Vision Transformers," Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12104-12113, 2022.
- H. Bao, L. Dong, S. Piao, F. Wei, "Beit: Bert Pre-training of Image Transformers," arXiv preprint arXiv:2106.08254, 2021.
- Z. Liu, Y. Wang, K. Han, W. Zhang, S. Ma, W. Gao, "Post-training Quantization for Vision Transformer," Advances in Neural Information Processing Systems, Vol. 34, pp. 28092-28103, 2021.
- Y. Tang, K. Han, Y. Wang, C. Xu, J. Guo, C. Xu, D. Tao, "Patch Slimming for Efficient Vision Transformers," Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12165-12174, 2022.
- L. Song, S. Zhang, S. Liu, Z. Li, X. He, H. Sun, J. Sun, N. Zheng, "Dynamic Grained Encoder for Vision Transformers," Advances in Neural Information Processing Systems, Vol. 34, pp. 5770-5783, 2021.
- B. Chen, P. Li, B. Li, C. Li, L. Bai, C. Lin, M. Sun, J. Yan, W. Ouyang, "Psvit: Better Vision Transformer Via Token Pooling and Attention Sharing," arXiv preprint arXiv:2108.03428, 2021.
- B. Graham, A. El-Nouby, H. Touvron, P. Stock, A. Joulin, H. Jegou, M. Douze, "Levit: A Vision Transformer in Convnet's Clothing for Faster Inference," Proc. of the IEEE/CVF International Conference on Computer Vision, pp. 12259-12269, 2021.
- S. Mehta, M. Rastegari, "Mobilevit: Light-weight, General-purpose, and Mobile-friendly Vision Transformer," arXiv preprint arXiv:2110.02178, 2021.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, L. Fei-Fei, "Imagenet Large Scale Visual Recognition Challenge," International Journal of Computer Vision, Vol. 115, pp. 211-252, 2015. https://doi.org/10.1007/s11263-015-0816-y
- H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, H. Jegou, "Training Data-efficient Image Transformers & Distillation Through Attention," Proc. of the International Conference on Machine Learning, pp. 10347-10357, 2021.
- B. Pan, R. Panda, Y. Jiang, Z. Wang, R. Feris, A. Oliva, "IA-RED2: Interpretability-Aware Redundancy Reduction for Vision Transformers," Advances in Neural Information Processing Systems, Vol. 34, pp. 24898-24911, 2021.
- A. Ignatov, R. Timofte, W. Chou, K. Wang, M. Wu, T. Hartley, L. V. Gool, "Ai Benchmark: Running Deep Neural Networks on Android Smartphones," Proc. of the European Conference on Computer Vision (ECCV) Workshops, 2018.
- A. Ignatov, R. Timofte, A. Kulik, S. Yang, K. Wang, F. Baum, M. Wu, L. Xu, L. V. Gool, "Ai Benchmark: All About Deep Learning on Smartphones in 2019," Proc. of the IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 3617-3635, 2019.
- A. A. Suzen, B. Duman, B. Sen, "Benchmark Analysis of Jetson TX2, Jetson Nano and Raspberry Pi Using Deep-CNN," Proc. of the International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), pp. 1-5, 2020.
- P. Kang, J. Jo, "Benchmarking Modern Edge Devices for Ai Applications," IEICE Transactions on Information and Systems, Vol. 104, No. 3, pp. 394-403, 2021. https://doi.org/10.1587/transinf.2020EDP7160
- X. Wang, L. L. Zhang, Y. Wang, M. Yang, "Towards Efficient Vision Transformer Inference: A First Study of Transformers on Mobile Devices," Proc. of the Annual International Workshop on Mobile Computing Systems and Applications, pp. 1-7, 2022.
- T. Xiao, M. Singh, E. Mintun, T. Darrell, P. Dollar, R. Girshick, "Early Convolutions Help Transformers See Better," Advances in Neural Information Processing Systems, Vol. 34, pp. 30392-30400, 2021.
- M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, "Mobilenetv2: Inverted Residuals and Linear Bottlenecks," Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510-4520, 2018.
- "Higher Accuracy on Vision Models with EfficientNet-Lite," The TensorFlow Blog. [Online]. Available: https://blog.tensorflow.org/2020/03/higher-accuracy-on-vision-models-with-efficientnet-lite.html. [Accessed: 30-Jan-2023].
- R. Wightman, "PyTorch Image Models," GitHub, 2019. https://doi.org/10.5281/zenodo.4414861
- "Raspberry Pi 4 Model B Specifications," Raspberry Pi. [Online]. Available: https://www.raspberrypi.com/products/raspberry-pi-4-model-b/specifications/. [Accessed: 30-Jan-2023].
- "CUDA Toolkit Documentation," CUDA Toolkit Documentation v12.0 - landing 12.0 documentation, 09-Dec-2022. [Online]. Available: https://docs.nvidia.com/cuda/index.html. [Accessed: 30-Jan-2023].
- "Jetson Nano Developer Kit," NVIDIA Developer, 28-Sep-2022. [Online]. Available: https://developer.nvidia.com/embedded/jetson-nano-developer-kit. [Accessed: 30-Jan-2023].
- "EdgeTPU USB Accelerator," Coral. [Online]. Available: https://coral.ai/products/accelerator/. [Accessed: 30-Jan-2023].
- "Tensorflow," TensorFlow. [Online]. Available: https://www.tensorflow.org/. [Accessed: 30-Jan-2023].
- "Tensorflow Lite," TensorFlow. [Online]. Available: https://www.tensorflow.org/lite/guide. [Accessed: 30-Jan-2023].
- "TensorFlow-TensorRT (TF-TRT)," NVIDIA Documentation Center. [Online]. Available: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html. [Accessed: 30-Jan-2023].
- "Edge Tpu Compiler," Coral. [Online]. Available: https://coral.ai/docs/edgetpu/compiler/. [Accessed: 30-Jan-2023].
- T. Sheng, C. Feng, S. Zhuo, X. Zhang, L. Shen, M. Aleksic, "A Quantization-friendly Separable Convolution for Mobilenets," Proc. of the Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), pp. 14-18, 2018.