
Coding Tools for Enhancing Coding Efficiency of MPEG Internet Video Coding (IVC)


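  • Anna Yang (School of Aerospace Electronics Engineering, Korea Aerospace University)
  • Jae-Young Lee (School of Information and Communication Engineering, Sejong University)
  • Jong-Ki Han (School of Information and Communication Engineering, Sejong University)
  • Jae-Gon Kim (School of Aerospace Electronics Engineering, Korea Aerospace University)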
  • Received : 2015.03.15
  • Accepted : 2016.04.06
  • Published : 2016.05.30

Abstract

Internet Video Coding (IVC) is a royalty-free codec currently being developed in MPEG. The coding efficiency of the IVC codec has been steadily enhanced, and the performance of the Committee Draft (CD) version has been reported to be comparable to that of H.264/AVC High Profile (HP) in terms of both objective and subjective quality. In this paper, we present several coding tools that were proposed to enhance the coding efficiency of IVC during its development in MPEG, along with a brief overview of the IVC codec architecture and coding algorithms. The coding tools include both normative and informative tools: non-reference P frame coding, DC mode intra prediction, Lagrange multiplier selection, and extension of chroma intra prediction modes. For each tool, the algorithm and the coding gain measured in experiments are presented. Experimental results show that these tools give average bit savings of 8.8%, 0.4%, 0.4%, and 0.0%, respectively, in the low-delay coding mode.



Ⅰ. Introduction

High-quality, high-resolution video is becoming increasingly popular in a wide range of applications such as digital broadcasting, video streaming, and mobile video. For next-generation video services offering such video, the Joint Collaborative Team on Video Coding (JCT-VC), formed jointly by ITU-T VCEG and ISO/IEC MPEG, has developed a new international video coding standard called High Efficiency Video Coding (HEVC).

However, judging from the H.264/AVC case, the use of HEVC is expected to require high licensing costs, and such licensing issues may delay the proliferation of a royalty-bearing new codec. Moreover, several media market segments have requested royalty-free video coding standards. For example, the World Wide Web Consortium (W3C), a well-known royalty-free standardization group, is trying to include a codec specification in the upcoming HTML5. A royalty-free codec is also likely to be adopted for new convergence video services such as Media-centric Internet of Things (IoT) and Wearables [1]. This market demand led MPEG to investigate the feasibility of royalty-free codec standardization, and MPEG started a new activity on royalty-free coding standards [2]. There are three tracks with similar goals: Internet Video Coding (IVC), Web Video Coding (WVC), and Video Coding for Browsers (VCB) [3].

IVC is a royalty-free codec currently being developed in MPEG [4]; it is based on royalty-free standards such as MPEG-2 and on some royalty-free technologies. MPEG-2 is becoming a royalty-free codec as its essential patents have expired or are expiring. In terms of coding performance, IVC originally aimed at compression performance comparable to H.264/AVC Constrained Baseline Profile (CBP) [2]. After steady enhancement since the beginning of its development, the performance of the IVC Committee Draft (CD) version has been reported to be comparable to H.264/AVC HP in terms of objective and subjective quality [5]. Therefore, IVC significantly outperforms the other royalty-free codecs developed in MPEG, namely WVC and VCB, as well as AVC CBP, the anchor codec in the development of IVC.

In this paper, we present several coding tools adopted into IVC to enhance its coding efficiency during its development, along with a brief overview of the IVC codec architecture and coding algorithms. The coding tools include both normative and informative tools: non-reference P frame coding, DC mode intra prediction, Lagrange multiplier selection, and extension of chroma intra prediction modes. For each tool, the algorithm and the coding gain measured in experiments are presented. In addition, other possible extensions, namely 4×4 block-based inter predictive coding and 4×4 block-based chroma intra prediction, are briefly presented.

The rest of the paper is organized as follows. Section II gives a brief overview of the IVC codec. Section III presents the coding tools we proposed over several MPEG meetings during the development of IVC. Experimental results are shown in Section IV, and Section V concludes the paper.

 

Ⅱ. Overview of IVC CODEC

Fig. 1 shows the coding process of the current IVC Test Model (ITM) [6]. It is basically similar to the MPEG-2 video encoder but differs in a few aspects, such as intra prediction and entropy coding, which uses arithmetic coding instead of variable-length coding. The key technologies used in the current test model are summarized as follows:

Fig. 1. Functional block diagram of the IVC encoder

 

Ⅲ. Proposed Coding Tools

In this section, we present the coding tools we proposed over several MPEG meetings to improve the coding efficiency of IVC, together with details of their algorithms and performance.

1. Non-reference P frame coding

Non-reference P frame coding [7] is a useful tool for enhancing the coding efficiency of IVC in the low-delay encoding configuration, where the performance gap relative to H.264/AVC HP is larger and further improvement is therefore most needed. Coding efficiency is enhanced by assigning three different QP values according to the importance of each frame, that is, whether or not it is referenced. This non-reference P frame coding structure with 3-level QP values is enabled by the P frame type, which can be a P frame, a non-reference P frame, or a non-reference P frame with DPB (Decoded Picture Buffer) swapping, as shown in Table 1.

Table 1. Frame types of non-reference P frame coding in IVC

The coding structure of non-reference P frame coding is shown in Fig. 2. Whether non-reference P frame coding is used is determined adaptively for every group of four frames (e.g., P5, P6, P7, and P8 in Fig. 2). The decision is based on the temporal correlation, which is measured adaptively by the amount of motion [8]. Fig. 3 shows the details of the adaptive non-reference P frame coding algorithm, which consists of the following steps [8]:

Fig. 2. Coding structure of non-reference P frame coding

Fig. 3. Overall procedure of the non-reference P frame coding algorithm

In a typical QP setting, the lowest QP is assigned to the reference P frame P8, a higher QP to the reference P frame P6, and the highest QP to the non-reference frames P5 and P7. Non-reference P frame coding thus uses a 3-level coding structure in terms of the assigned QP values.
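The 3-level QP assignment for one group of four P frames can be sketched as follows. This is a minimal illustration, not ITM code: the function name `group_qps`, the offset values, and the use of a single flat QP when the group is not coded with non-reference P frames are assumptions. The motion-based adaptive decision of [8] is not modeled; it is passed in as the flag `use_nonref`.

```python
def group_qps(base_qp, use_nonref, dqp1, dqp2):
    """QPs for one group of four P frames in coding order (e.g. P5, P6, P7, P8).

    base_qp    -- QP of a 1st-level P frame (P8 in Fig. 2)
    use_nonref -- whether the group uses non-reference P frame coding; the
                  adaptive motion-based decision is assumed to happen elsewhere
    dqp1, dqp2 -- hypothetical 2nd- and 3rd-level QP offsets, dqp2 > dqp1 > 0
    """
    if not use_nonref:
        return [base_qp] * 4  # assumption: an ordinary group uses one QP
    if not dqp2 > dqp1 > 0:
        raise ValueError("expected dqp2 > dqp1 > 0")
    offsets = {1: 0, 2: dqp1, 3: dqp2}
    # P5, P7: non-reference (3rd level); P6: reference (2nd); P8: reference (1st)
    levels = [3, 2, 3, 1]
    return [base_qp + offsets[level] for level in levels]


print(group_qps(30, True, 1, 2))   # [32, 31, 32, 30]
print(group_qps(30, False, 1, 2))  # [30, 30, 30, 30]
```

Keeping the offsets as parameters reflects that the source fixes only their ordering (ΔQP2 > ΔQP1), not their values.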

In addition, P7 is assigned the type 'non-reference P frame with DPB swapping', so that P8 references P4 instead of P6, as shown in Fig. 2. After the swap signaled by the frame type of P7, the decoded pictures are stored in the DPB in the order P4, P6, P3, P2, P1 (nearest first) instead of P6, P4, P3, P2, P1.
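To make the DPB reordering concrete, the sketch below models the DPB as a nearest-first list in which index 0 corresponds to ref_idx 0. It rests on assumptions not stated in the source: a five-picture sliding-window DPB, and swapping realized as exchanging the two nearest entries. `PFrameType`, `update_dpb`, and `decode_group` are hypothetical names, not ITM identifiers.

```python
from enum import Enum


class PFrameType(Enum):
    REFERENCE = "P frame"
    NON_REFERENCE = "non-reference P frame"
    NON_REFERENCE_DPB_SWAP = "non-reference P frame with DPB swapping"


def update_dpb(dpb, frame, ftype, capacity=5):
    """Return the DPB after decoding `frame`; dpb[0] corresponds to ref_idx 0."""
    dpb = list(dpb)
    if ftype is PFrameType.REFERENCE:
        dpb.insert(0, frame)  # the newest reference picture is the nearest
        del dpb[capacity:]    # sliding window drops the oldest picture
    elif ftype is PFrameType.NON_REFERENCE_DPB_SWAP and len(dpb) >= 2:
        dpb[0], dpb[1] = dpb[1], dpb[0]  # exchange the two nearest pictures
    # Non-reference frames themselves are never stored.
    return dpb


def decode_group(dpb, group):
    """Apply update_dpb to each (frame, type) pair in decoding order."""
    for frame, ftype in group:
        dpb = update_dpb(dpb, frame, ftype)
    return dpb


before = ["P4", "P3", "P2", "P1"]
single_ref = decode_group(before, [("P5", PFrameType.NON_REFERENCE),
                                   ("P6", PFrameType.REFERENCE),
                                   ("P7", PFrameType.NON_REFERENCE_DPB_SWAP)])
with_mrf = decode_group(before, [("P5", PFrameType.NON_REFERENCE),
                                 ("P6", PFrameType.REFERENCE),
                                 ("P7", PFrameType.NON_REFERENCE)])
print(single_ref)  # ['P4', 'P6', 'P3', 'P2', 'P1'] -> P8 predicts from P4
print(with_mrf)    # ['P6', 'P4', 'P3', 'P2', 'P1'] -> P6 keeps ref_idx 0
```

With a single reference frame, P8 predicts from `single_ref[0]`, which is P4 after the swap.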

However, when Multiple Reference Frames (MRF), which is enabled by default, is used, we observed that P6 is more likely than P4 to be selected as the reference frame. Swapping the DPB may therefore hurt coding performance with MRF, because the more frequently selected reference frame should be placed first in the DPB so that it is signaled with a lower reference frame index (ref_idx) and hence fewer bits, as shown in Fig. 4. Consequently, when MRF is used, P7 is assigned the type 'non-reference P frame' instead of 'non-reference P frame with DPB swapping' [7].
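The signaling argument can be checked with a toy count. The sketch assumes a truncated-unary binarization of ref_idx and counts one bin per bit, ignoring the context adaptation of the arithmetic coder, so the numbers only illustrate the direction of the effect. The selection counts and the helper names are hypothetical.

```python
def tu_bins(ref_idx, max_idx):
    """Length of a truncated-unary code for ref_idx in [0, max_idx] (assumed binarization)."""
    return ref_idx + 1 if ref_idx < max_idx else max_idx


def total_ref_idx_bins(dpb, selections):
    """Total bins spent on ref_idx; `selections` maps frame -> times it was chosen."""
    max_idx = len(dpb) - 1
    return sum(count * tu_bins(dpb.index(frame), max_idx)
               for frame, count in selections.items())


# Hypothetical counts: P6 is chosen for 70 blocks, P4 for 30.
selections = {"P6": 70, "P4": 30}
no_swap = ["P6", "P4", "P3", "P2", "P1"]
swapped = ["P4", "P6", "P3", "P2", "P1"]
print(total_ref_idx_bins(no_swap, selections))  # 130
print(total_ref_idx_bins(swapped, selections))  # 170
```

When P6 is chosen more often, keeping it at ref_idx 0 (no swap) costs fewer bins, which is the reasoning behind using the plain 'non-reference P frame' type for P7 under MRF.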

2. DC mode intra prediction

In ITM, five prediction modes (vertical, horizontal, DC, down-left, and down-right) are used for intra prediction of each 8×8 luma block. In the DC mode, the reference samples are first smoothed by a smoothing filter, and the final prediction value is obtained by averaging the smoothed reference samples. A five-tap smoothing filter was proposed and adopted to replace the existing three-tap filter for DC intra prediction, improving coding performance [9].

As shown in Fig. 5, the prediction values of the current block are derived from previously decoded blocks in the same picture as follows. If both the upper and left blocks of the current block are available, the smoothed reference samples are obtained with the five-tap smoothing filter ([1 4 6 4 1]/16) instead of the existing three-tap filter ([1 2 1]/4). The final prediction value of each sample is then obtained by averaging the smoothed reference samples nearest to it in the upper and left blocks [9].
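The DC derivation can be sketched in integer arithmetic. The following are assumptions not taken from the source: the reference arrays hold only the samples directly above and to the left of the block, the filter replicates edge samples at both ends, and divisions round half up. ITM's exact boundary handling and rounding may differ, and `smooth` and `dc_predict` are hypothetical names.

```python
def smooth(ref, taps):
    """Filter a 1-D list of reference samples with integer taps.

    Edge samples are replicated (an assumption); results are rounded integers.
    """
    half = len(taps) // 2
    norm = sum(taps)
    n = len(ref)
    out = []
    for i in range(n):
        acc = 0
        for k, t in enumerate(taps):
            j = min(max(i + k - half, 0), n - 1)  # clamp index: replicate edges
            acc += t * ref[j]
        out.append((acc + norm // 2) // norm)  # round half up
    return out


def dc_predict(top, left, taps=(1, 4, 6, 4, 1)):
    """DC prediction when both the upper and left neighbours are available.

    Each predicted sample is the rounded average of the smoothed upper sample
    in its column and the smoothed left sample in its row.
    """
    top_s = smooth(top, taps)
    left_s = smooth(left, taps)
    return [[(top_s[x] + left_s[y] + 1) >> 1 for x in range(len(top))]
            for y in range(len(left))]


# Compare the adopted 5-tap filter with the previous 3-tap filter on one edge.
edge = [100, 100, 100, 140, 100, 100, 100, 100]
print(smooth(edge, (1, 4, 6, 4, 1)))
print(smooth(edge, (1, 2, 1)))
```

On this edge the five-tap filter spreads the outlier over more neighbours and lowers its peak, which is the kind of smoother reference the DC mode averages.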

Fig. 4. Reference frames stored in the DPB

3. Lagrangian multiplier selection

As in current hybrid video coding standards such as H.264/AVC and HEVC, Lagrange multiplier-based rate-distortion optimization (RDO) is employed in IVC. A predefined value of the Lagrange multiplier λ is selected according to the picture type and the encoding mode (random access or low delay) as

where SHIFT_QP is set to 11 in ITM [10].

In the low-delay encoding mode with non-reference P frame coding enabled, a non-reference P frame generates considerably fewer bits than a reference P frame, since a larger QP is typically assigned to it. The existing Lagrange multiplier selection given by (1) was extended to account for this bitrate characteristic of non-reference P frames [10]. In this extension, a non-reference P frame is regarded as a B picture when selecting the Lagrange multiplier, so a larger Lagrange multiplier is selected for it.

This extension was further improved to fully reflect the bitrate characteristics of non-reference P frames [10]. In non-reference P frame coding there are three types of P frames in terms of the QP values used, and each may have different R-D characteristics resulting from its QP. As experimentally observed in Fig. 6, the R-D behavior differs significantly among the types of P frames in non-reference P frame coding.

Fig. 5. DC intra prediction using the 5-tap smoothing filter

Different QP values are used according to the coding structure of non-reference P frame coding: QP (the QP for P frames), QP+ΔQP1, and QP+ΔQP2 (ΔQP2 > ΔQP1) for the 1st-level, 2nd-level, and 3rd-level P frames, respectively. As illustrated in Fig. 6, the higher the level of a P frame, the steeper its rate-MSE curve, so a larger Lagrange multiplier should be selected. In other words, a larger QP generates fewer bits; in that case, RDO should give a mode that generates fewer bits higher priority than a mode that reduces distortion, which is achieved by using a larger Lagrange multiplier.
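The effect of a larger λ on RDO can be illustrated with the Lagrangian cost J = D + λR. The two candidate modes and their distortion and rate values below are hypothetical, and `best_mode` is an illustrative helper; the point is only that increasing λ shifts the decision toward the mode that spends fewer bits.

```python
def best_mode(candidates, lam):
    """Pick the mode minimising the Lagrangian cost J = D + lam * R.

    candidates: list of (name, distortion, rate_in_bits) tuples.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]


# Hypothetical modes: A is accurate but costly, B is cheap but coarse.
modes = [("A", 100.0, 40), ("B", 300.0, 10)]
print(best_mode(modes, 2.0))   # A  (J_A = 180, J_B = 320)
print(best_mode(modes, 10.0))  # B  (J_A = 500, J_B = 400)
```

Here the decision flips at λ = (300 - 100) / (40 - 10) ≈ 6.7, so a P frame level assigned a larger λ favors the low-rate mode.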

A Lagrange multiplier selection method that fully considers these three types of R-D characteristics, given by (2), has been proposed and adopted in IVC [11].

Fig. 6. R-D characteristics of P frames in non-reference P frame coding

4. Extension of chroma intra prediction modes

As shown in Fig. 7, Table 2, and Table 3, the existing ITM [6] provides five prediction modes (vertical, horizontal, DC, down_left, and down_right) for luma intra prediction and four prediction modes (DC, horizontal, vertical, and plane) for chroma intra prediction. To improve chroma intra prediction, the candidate chroma modes are extended with down_left and down_right, which are already used in luma intra prediction, as shown in Table 3 [12].

Fig. 7. Luma intra prediction modes

Table 2. Luma intra prediction modes

Table 3. Extended chroma intra prediction modes

To measure the overall performance across all color components, we computed the average PSNR, PSNRavg, from the Y, U, and V components using the following equation [13], and then computed the BD-rate of all color components from PSNRavg.
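Because the PSNRavg equation is not reproduced here, the sketch below assumes a weighted average with 6:1:1 weights for Y, U, and V; this is not necessarily the weighting used in [13], and the weights are exposed as a parameter. It then computes the Bjøntegaard delta rate in its usual form: fit a cubic to log10(rate) as a function of PSNR for each curve, integrate both over the overlapping PSNR range, and convert the average log-rate difference into a percentage. Four RD points per curve are assumed, so each cubic passes through them exactly. All function names are hypothetical.

```python
import math


def psnr_avg(psnr_y, psnr_u, psnr_v, weights=(6, 1, 1)):
    """Weighted average PSNR over Y, U, V (the 6:1:1 weights are an assumption)."""
    wy, wu, wv = weights
    return (wy * psnr_y + wu * psnr_u + wv * psnr_v) / (wy + wu + wv)


def _cubic_through(xs, ys):
    """Coefficients [c0, c1, c2, c3] of the cubic through four points (Gauss-Jordan)."""
    a = [[x ** k for k in range(4)] + [y] for x, y in zip(xs, ys)]
    for col in range(4):
        piv = max(range(col, 4), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        for r in range(4):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[k][4] / a[k][k] for k in range(4)]


def _integral(c, lo, hi):
    """Definite integral of sum(c[k] * x**k) from lo to hi."""
    return sum(ck * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1) for k, ck in enumerate(c))


def bd_rate(rates_a, psnrs_a, rates_t, psnrs_t):
    """Bjontegaard delta rate (%) of a test curve against an anchor curve.

    Each curve has exactly four (rate, PSNR) points; negative means bit saving.
    """
    assert len(rates_a) == len(psnrs_a) == len(rates_t) == len(psnrs_t) == 4
    lo = max(min(psnrs_a), min(psnrs_t))
    hi = min(max(psnrs_a), max(psnrs_t))
    # Shift PSNR by `lo` so the 4x4 Vandermonde system stays well conditioned.
    ca = _cubic_through([p - lo for p in psnrs_a], [math.log10(r) for r in rates_a])
    ct = _cubic_through([p - lo for p in psnrs_t], [math.log10(r) for r in rates_t])
    width = hi - lo
    avg_diff = (_integral(ct, 0.0, width) - _integral(ca, 0.0, width)) / width
    return (10 ** avg_diff - 1) * 100


anchor_rates = [1000, 1500, 2300, 3500]
anchor_psnr = [32.0, 34.0, 36.0, 38.0]
test_rates = [0.9 * r for r in anchor_rates]  # hypothetical: 10% fewer bits at equal PSNR
print(round(bd_rate(anchor_rates, anchor_psnr, test_rates, anchor_psnr), 3))  # -10.0
```

A negative value corresponds to the "bit saving" figures reported in Section IV.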

5. Other extensions

Work on improving the coding efficiency of IVC is still ongoing in MPEG even though the DIS (Draft International Standard) has been released, and a second-phase IVC standard may follow. As part of this work, we have proposed two extensions, 4×4 block-based inter predictive coding [14] and chroma intra prediction for the 4×4 block size [15], which are under consideration for final adoption into IVC.

In the current IVC [4], the 4×4 block size is available only in intra predictive coding, and block sizes of 16×16, 16×8, 8×16, and 8×8 are allowed in inter prediction. However, 4×4 blocks may also be effective in inter prediction, especially in complex regions. Therefore, 4×4 block-based inter predictive coding has been proposed [14], which extends the variable block sizes for inter prediction to include 4×4, 8×4, and 4×8.
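The trade-off behind smaller inter partitions can be illustrated by counting motion vectors: a finer split can follow complex motion more closely, but every extra partition carries its own motion vector. The sketch below enumerates the partition sizes named above. For illustration only, it assumes that a 16×16 macroblock is split uniformly into blocks of one size; it does not model the actual partition syntax of [14].

```python
# Inter partition sizes before and after the 4x4 extension (from the text above).
BASE_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8)]
EXTENDED_SIZES = BASE_SIZES + [(8, 4), (4, 8), (4, 4)]


def mvs_per_macroblock(w, h, mb=16):
    """Motion vectors needed if an mb x mb macroblock is split uniformly into w x h blocks."""
    return (mb // w) * (mb // h)


for w, h in EXTENDED_SIZES:
    print(f"{w}x{h}: {mvs_per_macroblock(w, h)} MV(s)")
```

The 4×4 split needs 16 motion vectors per macroblock versus 4 for 8×8, which is why it pays off mainly in regions where the more accurate prediction saves more residual bits than the extra motion data costs.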

For intra prediction in the current IVC [4], only one block size (8×8) is available for chroma, whereas block sizes of 16×16, 8×8, and 4×4 are available for luma. However, 4×4 block-based intra prediction of chroma blocks may be effective, especially in reducing the residual signal. Therefore, we are working on 4×4 block-based intra predictive coding for the chroma components.

For both tools, meaningful coding-efficiency improvements have been observed in preliminary implementations and experiments. Further improvements and refinements are needed before they can be adopted as normative tools.

 

Ⅳ. Experimental Results

In this section, we present experimental results for the coding tools described in the previous section. The test conditions and encoder configurations described in [16] were used in the experiments. The ITM encoder settings for Constraint Set 1 (CS1), Constraint Set 2 (CS2), and All-Intra are shown in Table 4. For each sequence, a fixed set of QPs specified in [17] is used.

Table 4. Test conditions (All-Intra, CS1 (IBBP), CS2 (IPPP))

1. Non-reference P frame coding

Tables 5 and 6 show that adaptive and non-adaptive non-reference P frame coding yield significant average bit savings of 8.1% and 8.8%, respectively, over ITM 13.0. These results show that the non-reference P frame coding adopted in IVC is a useful tool for enhancing the coding efficiency of IVC in the low-delay encoding configuration.

Table 5. BD-rate results of non-reference P frame coding (Anchor: ITM 13.0, adaptive case)

Table 6. BD-rate results of non-reference P frame coding (Anchor: ITM 13.0, non-adaptive case)

2. DC mode intra prediction

As shown in Tables 7, 8, and 9, the proposed five-tap smoothing filter gives average bit savings of 0.1%, 0.4%, and 0.0% over the existing three-tap filter in the CS1, CS2, and All-Intra configurations, respectively, without increasing complexity [9].

Table 7. BD-rate results of the 5-tap DC mode over the 3-tap DC mode (Anchor: ITM 7.0, CS1)

3. Lagrangian multiplier selection

As shown in Tables 10 and 11, the proposed Lagrange multiplier selection, which considers the R-D characteristics of non-reference P frames, gives average bit savings of 1.2% and 1.3% over ITM 9.0 with the non-adaptive and adaptive methods of non-reference P frame coding, respectively. Based on these results, it was adopted in ITM 10.0. The finally adopted method given by (2) gives an additional average bit saving of 0.4% over ITM 10.0, as shown in Table 12.

4. Extension of chroma intra prediction modes

As shown in Tables 13, 14, and 15, extending the 8×8 block-based chroma intra prediction modes yields average bit savings of 2.0%, 1.0%, and 0.0% in the All-Intra, CS1, and CS2 configurations, respectively.

Table 8. BD-rate results of the 5-tap DC mode over the 3-tap DC mode (Anchor: ITM 7.0, CS2)

Table 9. BD-rate results of the 5-tap DC mode over the 3-tap DC mode (Anchor: ITM 7.0, All-Intra)

Table 10. BD-rate results of Lagrange multiplier selection in non-reference P frame coding (non-adaptive case) (Anchor: ITM 9.0)

Table 11. BD-rate results of Lagrange multiplier selection in non-reference P frame coding (adaptive case) (Anchor: ITM 9.0)

Table 12. BD-rate results of Lagrange multiplier selection in non-reference P frame coding (Anchor: ITM 10.0)

Table 13. BD-rate results of the extension of chroma intra prediction modes (All-Intra) (Anchor: ITM 14.0)

Table 14. BD-rate results of the extension of chroma intra prediction modes (CS1) (Anchor: ITM 14.0)

Table 15. BD-rate results of the extension of chroma intra prediction modes (CS2) (Anchor: ITM 14.0)

 

Ⅴ. Conclusion

In this paper, we presented a set of coding tools for improving the coding efficiency of Internet Video Coding (IVC), proposed during the development of the IVC codec in MPEG. The proposed tools have been verified both as royalty-free technologies and as improving coding efficiency. The current version of IVC is comparable to H.264/AVC HP in terms of objective performance and subjective quality. The proposed tools have been adopted in the IVC DIS and contribute to enhancing the performance of IVC. IVC is expected to be a practical Type-1 standard video codec for diverse video applications and services.

References

  1. "Overview, context and objectives of Media-centric IoTs and Wearables," ISO/IEC JTC 1/SC 29/WG 11 N15727, Oct. 2015.
  2. "Call for Proposals (CfP) for Internet Video Coding Technologies," ISO/IEC JTC 1/SC 29/WG 11 N12204, Jul. 2011.
  3. K. Choi and E. S. Jang, "Royalty-Free Video Coding Standards in MPEG," IEEE Signal Process. Mag., vol. 31, no. 1, pp. 145-155, Jan. 2014. https://doi.org/10.1109/MSP.2013.2282413
  4. "Text of CD 14496-33 Internet Video Coding," ISO/IEC JTC 1/SC 29/WG 11 N15427, Warsaw, Poland, Jun. 2015.
  5. R. Wang, "Report of visual tests of Internet Video Coding," ISO/IEC JTC 1/SC 29/WG 11 N15428, Warsaw, Poland, Jun. 2015.
  6. S.-h. Park, R. Wang, and J.-G. Kim, "Internet Video Coding Test Model (ITM) v13.0," ISO/IEC JTC 1/SC 29/WG 11 N15429, Warsaw, Poland, Jun. 2015.
  7. D. Kim, H. Choi, and J.-G. Kim, "Non-Reference P Frame Coding in Multiple Reference Frames of Internet Video Coding," ISO/IEC JTC 1/SC 29/WG 11 M34108, Sapporo, Jul. 2014.
  8. D. Kim, J.-s. Kim, and J.-G. Kim, "Non-Reference P Frame Coding for Low-Delay Encoding in Internet Video Coding," Journal of Broadcast Engineering, vol. 19, no. 2, pp. 250-256, Mar. 2014. https://doi.org/10.5909/JBE.2014.19.2.250
  9. D. Kim, H. Choi, and J.-G. Kim, "Performance evaluation of DC intra prediction mode for Internet Video Coding," ISO/IEC JTC 1/SC 29/WG 11 M32144, San Jose, Jan. 2014.
  10. D.-H. Kim, B. T. Oh, and J.-G. Kim, "Lagrange Multiplier Selection for Non-Reference P Frames in Internet Video Coding," ISO/IEC JTC 1/SC 29/WG 11 M34109, Sapporo, Jul. 2014.
  11. S.-C. Oh, A. Yang, D. Kim, H. Choi, and J.-G. Kim, "Improvement on Lagrange Multiplier Selection for Internet Video Coding," ISO/IEC JTC 1/SC 29/WG 11 M34973, Strasbourg, Oct. 2014.
  12. S.-h. Lee, S.-h. Park, and E. S. Jang, "Chroma enhancement technique on the intra predicted block for IVC encoding," Warsaw, Poland, Jun. 2015.
  13. J.-Y. Lee, A. Yang, J.-K. Han, and J.-G. Kim, "Extension of Prediction Modes in Chroma Intra Coding for Internet Video Coding," ISO/IEC JTC 1/SC 29/WG 11 M37799, San Diego, Feb. 2016.
  14. J.-Y. Lee, A. Yang, J.-K. Han, and J.-G. Kim, "4x4 Blocksize Inter Prediction for Internet Video Coding," ISO/IEC JTC 1/SC 29/WG 11 M36361, Warsaw, Jun. 2015.
  15. A. Yang, J.-Y. Lee, J.-K. Han, and J.-G. Kim, "Extension of Chroma Intra Prediction Modes in Internet Video Coding," ISO/IEC JTC 1/SC 29/WG 11 M37462, Geneva, Oct. 2015.
  16. "IVC Core Experiment CE1: Overall Codec Testing," ISO/IEC JTC 1/SC 29/WG 11 N13354, Geneva, Jan. 2013.
  17. R. Wang, X. Zhang, Q. Yu, M. Gao, and L. Bivolarsky, "Description of Core Experiments in Internet Video Coding," ISO/IEC JTC 1/SC 29/WG 11 N13164, Shanghai, Oct. 2012.