http://dx.doi.org/10.5909/JBE.2022.27.3.273

Evaluation of Video Codec AI-based Multiple tasks  

Kim, Shin (Dept. of Computer Science and Engineering, Konkuk University)
Lee, Yegi (Dept. of Computer Science and Engineering, Konkuk University)
Yoon, Kyoungro (Dept. of Computer Science and Engineering, Konkuk University)
Choo, Hyon-Gon (Immersive Media Research Laboratory, Electronics and Telecommunications Research Institute)
Lim, Hanshin (Immersive Media Research Laboratory, Electronics and Telecommunications Research Institute)
Seo, Jeongil (Immersive Media Research Laboratory, Electronics and Telecommunications Research Institute)
Publication Information
Journal of Broadcast Engineering / v.27, no.3, 2022, pp. 273-282
Abstract
MPEG-VCM (Video Coding for Machines) aims to standardize a video codec for machines. VCM provides datasets and anchors, which serve as reference data for comparison, for several machine vision tasks, including object detection, object segmentation, and object tracking. The evaluation template can be used to compare the compression and machine vision task performance of the anchor data against that of various proposed video codecs. However, performance is currently compared separately for each machine vision task, and no information is provided on evaluating multiple machine vision tasks on a single bitstream. In this paper, we propose a method for evaluating the performance of a video codec for AI-based multi-tasks. Based on bits per pixel (BPP), which measures the size of a single bitstream, and mean average precision (mAP), which measures the accuracy of each task, we define three criteria for multi-task performance evaluation, namely the arithmetic average, the weighted average, and the harmonic average, and calculate the multi-task performance results from the mAP values. In addition, because the dynamic range of mAP may differ greatly from task to task, the multi-task performance results are calculated and evaluated on the normalized mAP to avoid problems caused by the differing dynamic ranges.
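The three averaging criteria named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not state how mAP is normalized, so the sketch assumes each task's mAP is divided by a per-task reference mAP (for example, one measured on an anchor). The task names, mAP values, reference values, weights, and function names are all hypothetical and chosen only for illustration.

```python
# Minimal sketch of multi-task scoring for one bitstream at a single BPP point.
# ASSUMPTION: normalization divides each task's mAP by a per-task reference mAP;
# the abstract does not specify the normalization actually used in the paper.

def normalize(task_map, reference_map):
    """Scale each task's mAP by its (assumed) per-task reference mAP."""
    return {task: task_map[task] / reference_map[task] for task in task_map}

def arithmetic_average(scores):
    """Plain mean of the per-task scores."""
    return sum(scores.values()) / len(scores)

def weighted_average(scores, weights):
    """Mean of the per-task scores weighted by task importance."""
    total_weight = sum(weights[task] for task in scores)
    return sum(weights[task] * scores[task] for task in scores) / total_weight

def harmonic_average(scores):
    """Harmonic mean; it is pulled toward the worst-performing task."""
    if any(value <= 0 for value in scores.values()):
        raise ValueError("harmonic average requires strictly positive scores")
    return len(scores) / sum(1.0 / value for value in scores.values())

# Hypothetical values, not taken from the paper.
task_map = {"detection": 0.40, "segmentation": 0.30, "tracking": 0.60}
reference_map = {"detection": 0.50, "segmentation": 0.40, "tracking": 0.80}
weights = {"detection": 2.0, "segmentation": 1.0, "tracking": 1.0}

normalized = normalize(task_map, reference_map)
arithmetic = arithmetic_average(normalized)
weighted = weighted_average(normalized, weights)
harmonic = harmonic_average(normalized)

print(f"arithmetic={arithmetic:.4f} weighted={weighted:.4f} harmonic={harmonic:.4f}")
```

Under this assumed normalization every task score is expressed as a fraction of its reference, so a task whose raw mAP range is wide does not dominate the combined score. The harmonic average is never larger than the arithmetic average, which makes it the stricter of the two when one task degrades sharply.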
Keywords
Video Coding for Machines; Machine Vision; Multiple Machine Vision Tasks; Multi-task Performance Evaluation