http://dx.doi.org/10.33851/JMIS.2021.8.3.147

CNN-based Fast Split Mode Decision Algorithm for Versatile Video Coding (VVC) Inter Prediction  

Yeo, Woon-Ha (Dept. of IT Engineering, Sookmyung Women's University)
Kim, Byung-Gyu (Dept. of IT Engineering, Sookmyung Women's University)
Publication Information
Journal of Multimedia Information System / v.8, no.3, 2021, pp. 147-158
Abstract
Versatile Video Coding (VVC) is the latest video coding standard developed by the Joint Video Experts Team (JVET). VVC adopts the quadtree plus multi-type tree (QT+MTT) structure for coding unit (CU) partitioning, whose computational complexity is considerably high because of the brute-force search required by recursive rate-distortion (RD) optimization. In this paper, we aim to reduce the time complexity of inter-picture prediction, since inter prediction accounts for a large portion of the total encoding time. The problem can be defined as classifying the split mode of each CU. To classify the split mode effectively, we introduce a novel convolutional neural network (CNN) architecture called the multi-level tree CNN (MLT-CNN). To boost classification performance, we use additional information, including inter-picture information, while training the CNN. The overall algorithm, including the MLT-CNN inference process, is implemented on VVC Test Model (VTM) 11.0. CUs of size 128×128 can be the inputs of the CNN. The sequences are encoded in the random access (RA) configuration with five quantization parameter (QP) values {22, 27, 32, 37, 42}. The experimental results show that the proposed algorithm reduces the computational complexity by 11.53% on average and by up to 26.14%, with an average increase of 1.01% in Bjøntegaard delta bit rate (BDBR). In particular, the proposed method performs better on the class A and B sequences, reducing encoding time by 9.81% to 26.14% with a BDBR increase of 0.95% to 3.28%.
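To illustrate the idea of treating CU partitioning as split-mode classification, the following is a minimal Python sketch, not the authors' implementation. It lists the six split options of the QT+MTT structure and shows one way a classifier's output could prune the RD search. The names `modes_to_search` and `time_saving`, the probability dictionary, and the threshold value are all hypothetical and chosen for illustration only. The abstract does not state how MLT-CNN outputs are turned into encoder decisions, so the thresholding rule here is an assumption.

```python
from enum import Enum


class SplitMode(Enum):
    """The six CU split options available in VVC's QT+MTT partitioning."""
    NO_SPLIT = 0
    QT = 1      # quadtree split
    BT_H = 2    # binary split, horizontal
    BT_V = 3    # binary split, vertical
    TT_H = 4    # ternary split, horizontal
    TT_V = 5    # ternary split, vertical


def modes_to_search(probs, threshold=0.1):
    """Return the split modes whose predicted probability reaches `threshold`.

    `probs` maps SplitMode -> probability (a hypothetical classifier output).
    The most probable mode is always kept, so the candidate set handed to
    the RD search is never empty. The threshold rule is an assumption.
    """
    best = max(probs, key=probs.get)
    return [m for m in SplitMode if m is best or probs.get(m, 0.0) >= threshold]


def time_saving(t_anchor, t_proposed):
    """Encoding-time reduction in percent relative to the anchor encoder."""
    return (t_anchor - t_proposed) / t_anchor * 100.0


# Made-up probabilities for a single CU, for illustration only.
example_probs = {
    SplitMode.NO_SPLIT: 0.62,
    SplitMode.QT: 0.25,
    SplitMode.BT_H: 0.06,
    SplitMode.BT_V: 0.04,
    SplitMode.TT_H: 0.02,
    SplitMode.TT_V: 0.01,
}
candidates = modes_to_search(example_probs)
```

With these made-up probabilities, only `NO_SPLIT` and `QT` remain as candidates, so the encoder would skip RD evaluation of the four binary and ternary splits for that CU. A lower threshold keeps more candidates, which trades time saving for a smaller BDBR penalty.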
Keywords
Versatile Video Coding (VVC); Inter Prediction; Fast algorithm; Convolutional Neural Network (CNN); Deep learning;