Neural Network Model Compression Algorithms for Image Classification in Embedded Systems

  • Received : 2022.03.07
  • Accepted : 2022.03.25
  • Published : 2022.05.31

Abstract

This paper introduces model compression algorithms that make a deep neural network smaller and faster for embedded systems. Model compression algorithms can be broadly categorized into pruning, quantization, and knowledge distillation. In this study, we integrate gradual pruning, quantization-aware training, and knowledge distillation that transfers the activation boundaries formed by the hidden neurons of the teacher network. Once a large deep neural network is compressed and accelerated by these algorithms, an embedded computing board can run it much faster with less memory while preserving reasonable accuracy. To evaluate the compressed networks, we measure the size, latency, and accuracy of DenseNet201 trained for image classification on the CIFAR-10 dataset, running on an NVIDIA Jetson Xavier.
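
For readers who want a concrete picture of one of these techniques, the sketch below illustrates the gradual pruning schedule of Zhu and Gupta [13] that this study adopts: the target sparsity rises from an initial to a final value along a cubic polynomial over training, and at each pruning step the smallest-magnitude weights are masked to zero. This is a minimal PyTorch sketch under our own assumptions; the names polynomial_sparsity and magnitude_mask and all hyperparameter values are illustrative, not taken from the paper.

import torch

def polynomial_sparsity(step, start_step, end_step, s_init=0.0, s_final=0.9):
    # Target sparsity s_t = s_f + (s_i - s_f) * (1 - progress)^3, as in [13].
    if step <= start_step:
        return s_init
    if step >= end_step:
        return s_final
    progress = (step - start_step) / (end_step - start_step)
    return s_final + (s_init - s_final) * (1.0 - progress) ** 3

def magnitude_mask(weight, sparsity):
    # Binary mask that zeroes out the smallest-magnitude fraction of weights.
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

# Usage: during fine-tuning, recompute the mask every few hundred steps and
# multiply it into the layer weights so pruned connections stay at zero.
weight = torch.randn(256, 256)
for step in range(0, 10001, 1000):
    s = polynomial_sparsity(step, start_step=0, end_step=10000)
    weight = weight * magnitude_mask(weight, s)

Because the sparsity target rises slowly, the network has time to recover accuracy between pruning steps, which is what makes high final sparsity attainable with this family of methods.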

Acknowledgement

This work was supported by the Theater Defense Research Center, funded by the Defense Acquisition Program Administration under Grant UD200043CD.

References

  1. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Communications of the ACM, vol. 60, no. 6, pp. 84-90, May, 2017, DOI: 10.1145/3065386.
  2. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A Large-scale Hierarchical Image Database," 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 2009, DOI: 10.1109/CVPR.2009.5206848.
  3. K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, DOI: 10.1109/CVPR.2016.90.
  4. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale," International Conference on Learning Representations, Vienna, Austria, 2021. [Online], https://openreview.net/forum?id=YicbFdNTTy.
  5. S. Oh, H. Kim, S. Cho, J. You, Y. Kwon, W.-S. Ra, and Y.-K. Kim, "Development of a Compressed Deep Neural Network for Detecting Defected Electrical Substation Insulators Using a Drone," Journal of Institute of Control, Robotics and Systems, vol. 26, no. 11, pp. 884-890, Nov., 2020, DOI: 10.5302/j.icros.2020.20.0117.
  6. A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv preprint arXiv:1704.04861, 2017, [Online], https://arxiv.org/abs/1704.04861.
  7. T. Uetsuki, Y. Okuyama, and J. Shin, "CNN-based End-to-end Autonomous Driving on FPGA Using TVM and VTA," 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), Singapore, 2021, DOI: 10.1109/MCSoC51149.2021.00028.
  8. S. Han, H. Mao, and W. J. Dally, "Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding," arXiv preprint arXiv:1510.00149, 2015, [Online], https://arxiv.org/abs/1510.00149.
  9. Y. LeCun, J. S. Denker, and S. A. Solla, "Optimal Brain Damage," Advances in Neural Information Processing Systems, 1989, [Online], https://papers.nips.cc/paper/1989/hash/6c9882bbac1c7093bd25041881277658-Abstract.html.
  10. B. Hassibi and D. Stork, "Second Order Derivatives for Network Pruning: Optimal Brain Surgeon," Advances in Neural Information Processing Systems, 1992, [Online], https://proceedings.neurips.cc/paper/1992/hash/303ed4c69846ab36c2904d3ba8573050-Abstract.html.
  11. H. Wu, P. Judd, X. Zhang, M. Isaev, and P. Micikevicius, "Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation," arXiv preprint arXiv:2004.09602, 2020, [Online], https://arxiv.org/abs/2004.09602.
  12. G. E. Hinton, O. Vinyals, and J. Dean, "Distilling the Knowledge in a Neural Network," arXiv preprint arXiv:1503.02531, 2015, [Online], https://arxiv.org/abs/1503.02531.
  13. M. Zhu and S. Gupta, "To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression," International Conference on Learning Representations Workshop, 2018, [Online], https://arxiv.org/abs/1710.01878.
  14. B. Heo, M. Lee, S. Yun, and J. Choi, "Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons," AAAI Conference on Artificial Intelligence, 2019, DOI: 10.1609/aaai.v33i01.33013779.
  15. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based Learning Applied to Document Recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov., 1998, DOI: 10.1109/5.726791.
  16. K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," International Conference on Learning Representations, 2015, [Online], https://arxiv.org/abs/1409.1556.
  17. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, DOI: 10.1109/CVPR.2017.243.
  18. S. Han, J. Pool, J. Tran, and W. J. Dally, "Learning Both Weights and Connections for Efficient Neural Networks," arXiv preprint arXiv:1506.02626, 2015, [Online], https://arxiv.org/abs/1506.02626.
  19. A. See, M.-T. Luong, and C. D. Manning, "Compression of Neural Machine Translation via Pruning," The 20th SIGNLL Conference on Computational Natural Language Learning, Aug., 2016, DOI: 10.18653/v1/K16-1029.
  20. S. Narang, E. Elsen, G. Diamos, and S. Sengupta, "Exploring Sparsity in Recurrent Neural Networks," arXiv preprint arXiv:1704.05119, 2017, [Online], https://arxiv.org/abs/1704.05119.
  21. A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio, "FitNets: Hints for Thin Deep Nets," arXiv preprint arXiv:1412.6550, 2015, [Online], https://arxiv.org/abs/1412.6550.
  22. S. Lim, I. Kim, T. Kim, C. Kim, and S. Kim, "Fast AutoAugment," Advances in Neural Information Processing Systems, vol. 32, 2019, [Online], https://papers.nips.cc/paper/2019/hash/6add07cf50424b14fdf649da87843d01-Abstract.html.