http://dx.doi.org/10.5909/JBE.2022.27.3.283

Pyramid Feature Compression with Inter-Level Feature Restoration-Prediction Network  

Kim, Minsub (Department of Computer Engineering, Kwangwoon University)
Sim, Donggyu (Department of Computer Engineering, Kwangwoon University)
Publication Information
Journal of Broadcast Engineering / v.27, no.3, 2022, pp. 283-294
Abstract
Feature maps used in deep learning networks generally contain more data than the input image, so transmitting them requires a higher compression rate than image compression does. This paper proposes a method for transmitting, at a high compression rate, the pyramid feature maps used in networks with a feature pyramid network (FPN) structure, which is robust to object size in deep learning-based image processing. To compress the pyramid feature maps efficiently, only some pyramid levels are transmitted: the proposed prediction network predicts the feature maps of the levels that are not transmitted from those that are, and the proposed reconstruction network restores the damage caused by compression. In terms of mAP, the object detection performance measured on COCO 2017 Train images, the proposed method improved the BD-rate on the rate-precision graph by 31.25% compared with compressing the feature maps with VTM12.0, and by 57.79% compared with compression using PCA and DeepCABAC.
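The BD-rate figures quoted in the abstract compare two rate-precision curves. The abstract does not say exactly how the BD-rate was computed, so the following is only a minimal sketch of the commonly used Bjontegaard cubic-fit variant, with mAP substituted for PSNR as the quality axis. The function name `bd_rate`, its parameters, and the sample numbers are illustrative assumptions, not values or code from the paper.

```python
import numpy as np


def bd_rate(rate_anchor, map_anchor, rate_test, map_test):
    """Bjontegaard delta rate in percent (hypothetical helper).

    Negative values mean the test codec needs fewer bits than the
    anchor for the same mAP. Assumes at least four rate/mAP points
    per curve so that a cubic fit is well defined.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(map_anchor, dtype=float)
    qt = np.asarray(map_test, dtype=float)

    # Fit log-rate as a cubic polynomial of quality (mAP) for each curve.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Integrate both fits over the overlapping mAP interval only.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)

    # Average log-rate difference, converted back to a percentage.
    avg_log_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


if __name__ == "__main__":
    # Made-up illustration data: the test curve reaches the same mAP
    # at half the bitrate of the anchor curve.
    anchor_rate = [100.0, 200.0, 400.0, 800.0]
    test_rate = [50.0, 100.0, 200.0, 400.0]
    map_points = [20.0, 25.0, 30.0, 35.0]
    print(bd_rate(anchor_rate, map_points, test_rate, map_points))
```

Under this sign convention, a reported improvement such as 31.25% would appear as a negative BD-rate, that is, fewer bits for the same detection accuracy.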
Keywords
Feature map compression; Feature pyramid network; Video coding for machine; Deep learning network; Principal component analysis;