http://dx.doi.org/10.5909/JBE.2021.26.6.778

Compression of DNN Integer Weight using Video Encoder  

Kim, Seunghwan (Department of Computer Education, Sungkyunkwan University)
Ryu, Eun-Seok (Department of Computer Education, Sungkyunkwan University)
Publication Information
Journal of Broadcast Engineering, vol. 26, no. 6, 2021, pp. 778-789
Abstract
Recently, various lightweight methods for running Convolutional Neural Network (CNN) models on mobile devices have emerged. Weight quantization, which lowers the bit precision of weights, enables a model to run with integer arithmetic in mobile environments where GPU acceleration is unavailable, and it has already been applied to various models to reduce computational complexity and model size with only a small loss of accuracy. Considering the limited memory, computing speed, storage, and network bandwidth of such devices, this paper proposes compressing the quantized integer weights further with a video codec. To verify the proposed method, experiments were conducted on VGG16, ResNet50, and ResNet18 models trained on the ImageNet and Places365 datasets. Across these models, an accuracy loss of less than 2% and high compression efficiency were achieved, and a comparison with similar compression methods showed more than double the compression efficiency.
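The two-step pipeline the abstract describes can be sketched in a few lines of Python. The fragment below is a minimal illustration under stated assumptions, not the authors' implementation: it affine-quantizes a float weight tensor to 8-bit integers, tiles the integers into fixed-size grayscale frames, and pipes the frames to an external HEVC encoder (here ffmpeg with libx265; the frame size, packing order, and QP value are all illustrative choices).

import subprocess
import numpy as np

def quantize_uint8(w):
    """Affine (asymmetric) quantization of a float tensor to uint8."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 or 1.0    # avoid a zero scale for constant tensors
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min                    # needed later: w ~= q * scale + w_min

def weights_to_frames(q, height=256, width=256):
    """Pack a flattened uint8 weight array into fixed-size grayscale frames."""
    flat = q.ravel()
    pad = (-flat.size) % (height * width)     # zero-pad so the last frame is full
    flat = np.pad(flat, (0, pad))
    return flat.reshape(-1, height, width)    # (num_frames, H, W)

def encode_hevc(frames, out_path, qp=12):
    """Pipe raw 8-bit grayscale frames into an external HEVC encoder."""
    n, h, w = frames.shape
    cmd = [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "gray", "-s", f"{w}x{h}", "-r", "1",
        "-i", "-",                            # raw frames arrive on stdin
        "-c:v", "libx265", "-x265-params", f"qp={qp}",
        "-pix_fmt", "yuv420p",                # constant chroma; the weights live in luma
        out_path,
    ]
    subprocess.run(cmd, input=frames.tobytes(), check=True)

# Example: compress one synthetic convolutional weight tensor.
w = np.random.randn(512, 256, 3, 3).astype(np.float32)
q, scale, w_min = quantize_uint8(w)
encode_hevc(weights_to_frames(q), "weights.hevc", qp=12)

Decoding would reverse the steps: the HEVC bitstream is decoded back to frames, the padding is stripped, and the weights are recovered as q * scale + w_min. The QP parameter then trades reconstruction error (and hence accuracy loss) against compressed size.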
Keywords
Deep Learning Model Parameter Quantization; Weight Compression; Lightweight Model