Pyramid Feature Compression with Inter-Level Feature Restoration-Prediction Network

  • Kim, Minsub (Department of Computer Engineering, Kwangwoon University)
  • Sim, Donggyu (Department of Computer Engineering, Kwangwoon University)
  • Received : 2022.04.12
  • Accepted : 2022.05.03
  • Published : 2022.05.30

Abstract

Feature maps used in deep learning networks are generally larger than the input image, so transmitting them requires a higher compression ratio than image compression does. This paper proposes a method for transmitting, at a high compression ratio, the pyramid feature maps used in networks with an FPN structure, which are robust to object size in deep learning-based image processing. To compress the pyramid feature maps efficiently, only some levels are transmitted: the proposed prediction network predicts the untransmitted levels from the transmitted ones, and the proposed restoration network repairs the damage caused by compression. Measured by mAP for object detection on the COCO dataset 2017 Train images, the proposed method improved BD-rate on the rate-precision curve by 31.25% compared to compressing the feature maps with VTM12.0, and by 57.79% compared to the method that compresses them with PCA and DeepCABAC.
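
The BD-rate figures above summarize the gap between two rate-precision curves as a single average bitrate difference. The following is a minimal sketch of the commonly used Bjontegaard calculation, assuming a cubic fit of log-bitrate against mAP; the paper's exact fitting procedure is not stated here, and the function name `bd_rate` and all sample values are hypothetical. Under this sign convention a negative result means the test method needs less bitrate than the anchor at equal mAP, which presumably corresponds to the improvements reported above.

```python
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Average bitrate change (%) of the test curve relative to the anchor
    at equal quality. Negative values mean bitrate savings.
    Illustrative sketch only; not the authors' evaluation code."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(quality_anchor, dtype=float)
    qt = np.asarray(quality_test, dtype=float)

    # Fit log-rate as a cubic polynomial of quality (here: mAP) per curve.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Integrate both fits over the quality interval the two curves share.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate difference over the interval, converted to percent.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Hypothetical rate-precision points (bitrate in arbitrary units, mAP in %).
anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
anchor_map = [25.0, 30.0, 34.0, 37.0]
test_rate = [r / 2 for r in anchor_rate]  # same mAP at half the bitrate
test_map = list(anchor_map)

result = bd_rate(anchor_rate, anchor_map, test_rate, test_map)
print(f"BD-rate: {result:.2f}%")
```

Each curve needs at least four rate-precision points for the cubic fit, and the comparison is only meaningful over the mAP range that both curves cover.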

Acknowledgement

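This paper was written with support from the 2022 Kwangwoon University intramural research grant and from the Basic Research Program of the National Research Foundation of Korea, funded by the Korean government (Ministry of Science and ICT) (NRF-2021R1A2C2092848).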
