Fractal Depth Map Sequence Coding Algorithm with Motion-vector-field-based Motion Estimation

  • Zhu, Shiping (Department of Measurement Control and Information Technology, School of Instrumentation Science and Optoelectronics Engineering, Beihang University) ;
  • Zhao, Dongyu (Department of Measurement Control and Information Technology, School of Instrumentation Science and Optoelectronics Engineering, Beihang University)
  • Received : 2014.08.29
  • Accepted : 2014.12.30
  • Published : 2015.01.31

Abstract

Three-dimensional video coding is one of the main challenges restricting the widespread application of 3D video and free viewpoint video. In this paper, a novel fractal coding algorithm with motion-vector-field-based motion estimation is proposed for depth map sequences. We first add a pre-search restriction that rules improper domain blocks out of the matching search, so that the number of blocks involved in the search is restricted to a smaller size. Several improvements to motion estimation, including initial search point prediction, a threshold transition condition and an early termination condition, are made based on the features of fractal coding. A motion-vector-field-based adaptive hexagon search algorithm, built on the center-biased distribution of depth motion vectors, is proposed to accelerate the search. Experimental results show that the proposed algorithm reaches optimum levels of quality while saving coding time. The PSNR of the synthesized view is increased by 0.56 dB with a 36.97% bit rate decrease on average compared with H.264 Full Search, and the depth encoding time is reduced by up to 66.47%. Moreover, the proposed fractal depth map sequence codec outperforms recent alternative codecs built on H.264/AVC, especially in bit rate saving and encoding time reduction.

1. Introduction

With the development of multimedia technology, traditional two-dimensional video has become unable to meet people's needs, while three-dimensional (3D) immersive and interactive video gives users an unprecedented experience owing to its unique depth-of-field effect. 3D video is composed of texture videos and the corresponding depth map sequences. Based on the principle of binocular parallax, users can perceive a three-dimensional image, which enhances the sense of visual reality and verisimilitude. Compared with monocular video, the raw data volume of 3D video is much larger, while the transmission bandwidth is limited; this restricts the widespread application of 3D video, and it is therefore necessary to study high-efficiency 3D video compression technology.

Current 3D video compression methods mainly include multi-view video coding [1] and depth map-based compression [2]. Multi-view video uses multiple cameras to shoot the same scene from different angles and is characterized by high coding complexity and a large amount of data: multi-view video coding needs to exploit the redundancy within each viewpoint as well as that between adjacent viewpoints. A depth map represents the distance from the scene to the camera imaging plane; the actual depth values are quantized to [0, 255] to obtain a grayscale map representing the depth information. In the Multi-view Video plus Depth (MVD) [3] structure, each color image has a corresponding depth map. This paper studies the coding of depth map sequences.

The depth map can be represented by a 3D mesh, but it can also be seen as a two-dimensional image, so that depth map compression can be treated as two-dimensional image compression and existing compression standards become applicable. In [4], an early SKIP-mode decision scheme and a selective prediction-mode decision scheme were developed based on the relationship between the coding modes of collocated macroblocks in texture and depth frames. Depth map sequence coding aims to maximize the perceived visual quality of the synthesized virtual view rather than of the depth map sequence itself, so a structural similarity-based synthesized view distortion (SS-SVD) model relating the distortion of the coded depth map to the perceptual distortion of the synthesized view was proposed in [5]; it obtains better rate-distortion performance and perceptual quality of synthesized views than the JM reference software. In [6], an efficient intra prediction algorithm for smooth regions in depth coding was proposed that considers a single prediction direction instead of multiple prediction directions. Context-based adaptive binary arithmetic coding (CABAC) was originally designed for lossy texture coding and cannot provide the best coding performance for lossless depth map coding; in [7], an enhanced CABAC mechanism for lossless depth map coding based on the statistics of residual data was proposed.

All the above depth map sequence encoding methods are based on the H.264/AVC standard [8]. Its basic principle is to form a prediction of the macroblock from previously coded data, either from the current frame (intra prediction) or from other frames that have already been coded and transmitted (inter prediction); the residual is then transformed, quantized and entropy coded. In contrast, fractal image coding is a technique in the image compression field that exploits the partial self-similarity of an image [9]. Its theoretical bases are the Iterated Function System and the Collage theorem [10], and it is recognized as one of the most promising new-generation image coding technologies, characterized by novel concepts, a high compression ratio and great potential, but also high coding complexity. The depth map sequence shows a strong similarity between adjacent frames, so good compression performance can be obtained by use of fractal theory.

To obtain high encoding efficiency and to reduce the complexity of the depth map sequence codec, we propose a novel fractal depth map sequence coding algorithm with motion-vector-field-based motion estimation. This paper is organized as follows: the theory of fractal coding is summarized in section 2; the proposed fractal depth map sequence coding algorithm is presented in section 3; the experimental results are presented in section 4; and finally the conclusions are outlined in section 5.

 

2. Background of Fractal Compression

Jacquin described the first practical fractal image compression algorithm based on a Partitioned Iterated Function System (PIFS). The image I to be coded is divided into two sizes of blocks: range blocks Ri of B×B pixels, which are non-overlapping and cover the entire image, and domain blocks Di of 2B×2B pixels, which may overlap. All the domain blocks form a domain pool. The image I can be represented by

$$I = \bigcup_{i=1}^{N} R_i, \qquad R_i \cap R_j = \varnothing \ (i \neq j)$$

where N is the total number of range blocks.

Each range block Ri is approximated by a certain domain block Dm(i) through the transformation ωi : Dm(i) → Ri, which is composed of a spatial contraction transformation γi and a grayscale modification transformation λi.

ωi is often a contractive affine transformation, whose general form is

$$\omega_i\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} a_i & b_i & 0 \\ c_i & d_i & 0 \\ 0 & 0 & s_i \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} e_i \\ f_i \\ o_i \end{pmatrix}$$

where (x, y) denotes the pixel position and z the pixel value.
γi uses the average value of four adjacent pixels of Dm(i) to obtain the B×B contracted block D̂m(i). The average value of four adjacent pixels is

$$\hat{d}_{h,v} = \frac{1}{4}\left(d_{2h,2v} + d_{2h+1,2v} + d_{2h,2v+1} + d_{2h+1,2v+1}\right)$$

where d_{h,v} and d̂_{h,v} represent the pixel values of Dm(i) and D̂m(i) at the pixel position (h, v), respectively.
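As an illustration of the spatial contraction γi, the following minimal Python/NumPy sketch (the function name is ours, not the paper's) shrinks a 2B×2B domain block to a B×B block by averaging each group of four adjacent pixels, as in the formula above.

```python
import numpy as np

def contract_domain_block(domain_block: np.ndarray) -> np.ndarray:
    """Shrink a 2Bx2B domain block to BxB by averaging each non-overlapping 2x2 pixel group."""
    h, w = domain_block.shape
    assert h % 2 == 0 and w % 2 == 0, "domain block dimensions must be even"
    # Group pixels into 2x2 neighbourhoods and average them.
    return domain_block.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Example: a 16x16 domain block contracted to match an 8x8 range block.
D = np.random.randint(0, 256, size=(16, 16)).astype(np.float64)
D_hat = contract_domain_block(D)  # shape (8, 8)
```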

λi introduces a grayscale adjustment parameter si, a grayscale offset parameter oi, and an isometric transformation tk in order to obtain a better approximation of Ri by further modifying the grayscale of D̂m(i). The general form of λi is

$$\lambda_i\big(\hat{D}_{m(i)}\big) = s_i \cdot t_k\big(\hat{D}_{m(i)}\big) + o_i$$

where the isometry tk is determined by the parameters a, b, c and d, which give 8 kinds of pixel rearrangement (4 rotations and 4 flips); si and oi represent the brightness adjustment and brightness offset, respectively.

Fractal coding aims to find, for each Ri, the best matching block Dm(i) and block transformation ωi such that R'i = ωi(Dm(i)) and Ri are as close as possible under the given distortion metric, namely

$$d\big(R_i, R'_i\big) = \min_{D_{m(i)},\,\omega_i} \big\lVert R_i - \omega_i\big(D_{m(i)}\big)\big\rVert^2$$
The global transformation of the image onto a close approximation of itself is then constructed from a combination of these local block transformations. Fig. 1 shows some mappings from domain to range blocks.

Fig. 1. Mappings from domain to range blocks
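To make the matching criterion concrete, here is a small Python/NumPy sketch (illustrative only, not the paper's implementation) that fits the grayscale parameters s and o of one range/domain pair by least squares and scans a pool of already-contracted domain blocks for the minimum-MSE match.

```python
import numpy as np

def fit_scale_offset(r: np.ndarray, d: np.ndarray):
    """Least-squares fit of r ~ s*d + o for a range block r and a contracted domain block d."""
    r, d = r.ravel().astype(np.float64), d.ravel().astype(np.float64)
    n = r.size
    sd, sr = d.sum(), r.sum()
    sdd, sdr = (d * d).sum(), (d * r).sum()
    denom = n * sdd - sd * sd
    s = (n * sdr - sd * sr) / denom if denom != 0 else 0.0  # flat domain block: scale is irrelevant
    o = (sr - s * sd) / n
    mse = np.mean((s * d + o - r) ** 2)
    return s, o, mse

def find_best_match(range_block, contracted_domain_pool):
    """Exhaustively search the (already contracted) domain pool for the minimum-MSE match."""
    best = None
    for idx, d_hat in enumerate(contracted_domain_pool):
        s, o, mse = fit_scale_offset(range_block, d_hat)
        if best is None or mse < best[3]:
            best = (idx, s, o, mse)
    return best  # (domain block index, scale s, offset o, matching error)
```

For brevity the eight isometries tk are not enumerated here; a full search would repeat the fit for each rearranged version of the domain block.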

 

3. The Proposed Fractal Depth Map Sequence Coding Scheme

This paper proposes a fractal depth map sequence coding scheme, as shown in Fig. 2. It is in fact a hybrid codec with intra prediction and inter fractal coding. When a depth map Fn is input to the encoder, it is processed with the range block as the basic unit, and the encoding mode is selected between intra prediction and inter fractal. Intra prediction uses previously decoded adjacent pixels within the same frame to predict the current block, based on spatial correlation. Compared with previous encoders, the inter fractal mode adds a pre-search restriction and improves motion estimation; a motion-vector-field-based adaptive hexagon search method is designed. The best matching block of the current range block is located in the reference frame according to the motion vector (x, y), and the best matching block is then iterated to produce the predicted block PRED according to the recorded scale factor s and offset factor o. The iterative formula is

$$r_i = s \cdot d_i + o$$

where ri represents the pixel value of the predicted block and di represents the pixel value of the corresponding matching block.

Fig. 2. The proposed fractal depth map sequence encoder

In the figure, the reference frame is the reconstructed image F'n-1 of the previous depth map. A residual block Dn is generated by subtracting the predicted block PRED from the current range block; it is transformed and quantized into a coefficient set X, and these coefficients are reordered for entropy coding. Because the reconstruction process requires iterating the fractal parameters to produce the reference frame, the entropy-coded fractal parameters also need to be written into the code stream. The entropy-coded coefficients and the side information (prediction mode, quantization step size, etc.) form the compressed bit stream.
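The sketch below (Python/NumPy, with hypothetical function and variable names) illustrates the inter fractal step just described: the matching block located by the motion vector in the reference frame is passed through the iterative formula r = s·d + o to form PRED, and the residual Dn is what goes on to transform and quantization (not shown).

```python
import numpy as np

def predict_range_block(ref_frame, block_pos, block_size, mv, s, o):
    """Form PRED by applying r = s*d + o to the matching block addressed by motion vector mv."""
    y, x = block_pos          # top-left corner of the current range block
    dy, dx = mv               # displacement of the best matching block in the reference frame
    d = ref_frame[y + dy : y + dy + block_size, x + dx : x + dx + block_size]
    return s * d.astype(np.float64) + o

def residual_block(current_block, pred):
    """Residual Dn = current range block - PRED (transform and quantization not shown)."""
    return current_block.astype(np.float64) - pred
```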

3.1 Pre-search Restriction

In the process of fractal coding, the most important step is to search the domain pool to find the best matching block for each range block, i.e. the block that minimizes the matching error measured by the mean square error (MSE):

$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(s \cdot d_i + o - r_i\right)^2 \qquad (8)$$

where N represents the number of pixels in the current block.

The fractal coding process is therefore very time-consuming, because the best matching block must be searched for every range block in the codebook A, and in theory the number of candidate domain blocks is very large. For example, if exhaustive search is used for each range block of the original image, the search time is O(|A|) (linear in |A|), where |A| denotes the number of blocks in A. If |A| can be reduced, the fractal encoding time decreases accordingly. Some efforts have been made to reduce the number of blocks that need to be searched, for example by using nearest-neighbor search techniques [11], a local search operation [12], searching the area centered on the last best matching domain block [13], or classification of image blocks [14].

In our scheme, before the matching search for a range block, we add a pre-search restriction that rules improper domain blocks out of the matching search based on the variance matching condition of each block, so that the number of blocks involved in the matching search is restricted to a smaller size and the coding is greatly accelerated. The derivation of the pre-search restriction condition is as follows.

In the fractal coding process, an appropriate scale factor s and offset factor o are chosen so that the value of Di after the affine transformation has the minimum squared distance to the value of Ri. Setting the partial derivatives of the MSE in Formula (8) with respect to s and o to zero gives the minimum matching error, with the corresponding s and o shown in Formulas (9) and (10):

$$s = \frac{N\sum_{i=1}^{N} d_i r_i - \sum_{i=1}^{N} d_i \sum_{i=1}^{N} r_i}{N\sum_{i=1}^{N} d_i^2 - \left(\sum_{i=1}^{N} d_i\right)^2} \qquad (9)$$

$$o = \frac{1}{N}\left(\sum_{i=1}^{N} r_i - s\sum_{i=1}^{N} d_i\right) \qquad (10)$$

Substituting Equations (9) and (10) back into Formula (8) gives the following MSE form, different from the basic one:

$$MSE = \sigma_R^2 - \frac{\sigma_{DR}^2}{\sigma_D^2} \qquad (11)$$

where σ_R² and σ_D² denote the variances of the range block and of the contracted domain block, and σ_DR denotes their covariance. Let ρ = σ_DR / (σ_D σ_R); it is obvious that 0 ≤ ρ² ≤ 1. Then Formula (11) can be further derived as follows:

$$MSE = \sigma_R^2\left(1 - \rho^2\right) \qquad (12)$$

Let

$$m = 1 - \rho^2$$

For each range block, σ_R² is already known. In order to obtain the minimum matching error MSE, the value of m should be as small as possible, which is the starting point of our pre-search restriction scheme.

In our pre-search restriction scheme, for each range block in the matching process, if Th < m < 1, then we need to search the next block in the codebook to find the minimum matching error MSE; otherwise the current MSE is already small enough, no further comparison is required, the result is saved directly, and we proceed to the next range block. Th is a threshold determined experimentally in advance; it is set to 0.9 in our scheme.

The pre-search restriction algorithm is outlined in the pseudo-code as follows.

Pre-search Restriction Algorithm:
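The original pseudo-code listing is not reproduced here; the Python sketch below shows one possible realisation of the pre-search restriction as described above, computing m = 1 − ρ² (with ρ the correlation coefficient between the range block and the contracted domain block) and using the threshold Th = 0.9 from the text. All names are illustrative assumptions.

```python
import numpy as np

TH = 0.9  # experimentally determined threshold (Section 3.1)

def match_quality_m(r: np.ndarray, d: np.ndarray) -> float:
    """m = 1 - rho^2; a small m means the minimum achievable MSE (= var(r) * m) is already small."""
    r, d = r.ravel().astype(np.float64), d.ravel().astype(np.float64)
    var_r, var_d = r.var(), d.var()
    if var_r == 0 or var_d == 0:
        return 0.0  # a flat block is represented exactly by the offset term alone
    cov = np.mean(r * d) - r.mean() * d.mean()
    return 1.0 - (cov * cov) / (var_r * var_d)

def search_with_presearch_restriction(range_block, contracted_domain_pool):
    """Scan the codebook but stop as soon as a domain block with m <= TH is found."""
    best = None
    for idx, d_hat in enumerate(contracted_domain_pool):
        m = match_quality_m(range_block, d_hat)
        if best is None or m < best[1]:
            best = (idx, m)
        if m <= TH:  # current MSE already small enough: save the result, move to the next range block
            break
    return best  # (index of the chosen domain block, its m value)
```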

3.2 Improvements of Motion Estimation

Currently the speed of fractal coding is very low, mainly because the search for the best matching block is time-consuming; to improve the coding speed, it is necessary to improve the search strategy. Unsymmetrical-cross Multi-Hexagon-grid Search (UMHexagonS [15]) is a fast motion estimation algorithm that has been adopted in the JM reference software of H.264/AVC. It greatly outperforms full search, reducing computation by up to 90% while still maintaining good rate-distortion performance. However, it was designed around the multiple prediction modes and multiple reference frames of H.264, which differ significantly from the fractal depth map sequence coding method in this paper. Therefore, the following improvements have been made.

1. Initial search point prediction

Use the following three methods for initial search point prediction:

(1) Spatial median prediction: according to spatial correlation, take the median of the motion vectors of the left, upper and upper-right adjacent blocks in the current frame as the predicted motion vector, as shown in Fig. 3;

Fig. 3. Spatial median prediction

(2) Origin prediction: Take (0,0) as the motion vector value;

(3) Adjacent reference frame prediction: according to temporal correlation, take the motion vector of the co-located block in the previous reference frame as the prediction.

Then calculate the MSE of each candidate using Formula (12) and compare them; the point with the minimum MSE, called the MMD (minimum matching distortion) point, is used as the initial search point.
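A minimal Python sketch of this selection (illustrative: `mse_at` stands for a caller-supplied function that evaluates the matching error of Formula (12) for a candidate motion vector, and the argument names are our own):

```python
def predict_initial_search_point(mv_left, mv_up, mv_upright, mv_colocated, mse_at):
    """Pick the initial search point among the three predictors by minimum matching distortion (MMD)."""
    neighbours = (mv_left, mv_up, mv_upright)
    # (1) component-wise spatial median of the left, upper and upper-right neighbours' motion vectors
    median_mv = (sorted(v[0] for v in neighbours)[1],
                 sorted(v[1] for v in neighbours)[1])
    # (2) origin prediction and (3) the co-located MV of the previous reference frame
    candidates = [median_mv, (0, 0), mv_colocated]
    return min(candidates, key=mse_at)  # the MMD point becomes the initial search point
```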

2. Threshold transition condition at the asymmetrical cross template search

The similarity criterion between the range block and the transformed domain block in fractal coding is the MSE given in Formula (12). We use different thresholds for different block sizes; after the asymmetrical cross template search is completed, the best matching point is selected as the new start point for the subsequent template matching.

3. Early termination condition

Based on the characteristics of the fractal coding algorithm in this paper, early termination is applied in two cases:

Firstly, in the multi-level non-uniform hexagonal grid integer-pixel motion search, in addition to the early termination conditions of the algorithm itself, the search is stopped as soon as the optimal matching point lies at the center of the hexagon, which reduces the search complexity;

Secondly, the inter fractal coding algorithm in this paper uses the hierarchical tree-structure partition shown in Fig. 4. Mode 1 is searched first; if the threshold condition is satisfied, the coding of the current block terminates and the next block is processed. Otherwise, mode 2 divides the current range block into two smaller sub-blocks and each sub-block is searched. The partition proceeds to mode 3 and mode 4 until the threshold condition is satisfied. When the range block has been partitioned into 4 equal sub-blocks as in mode 4, each sub-block may be further partitioned recursively until it reaches the allowed minimum size (see the sketch after Fig. 4).

Fig. 4. Hierarchical tree-structure partition
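The hierarchical partition can be sketched recursively as follows (Python, illustrative only: `search_best_match` stands for the block matching of Section 3.1, the minimum block size is an assumed value, and modes 2 and 3 of Fig. 4 are collapsed into a direct four-way split for brevity).

```python
MIN_BLOCK_SIZE = 4  # assumed smallest allowed sub-block size

def encode_with_partition(block, threshold, search_best_match):
    """Match the whole block first (mode 1); if the error exceeds the threshold, split and recurse."""
    s, o, mse, mv = search_best_match(block)
    if mse <= threshold or min(block.shape) <= MIN_BLOCK_SIZE:
        return [(block.shape, mv, s, o)]  # threshold met, or the block cannot be split further
    h, w = block.shape
    params = []
    # Simplified view of modes 2-4: split into four equal sub-blocks, each of which may split again.
    for sub in (block[:h // 2, :w // 2], block[:h // 2, w // 2:],
                block[h // 2:, :w // 2], block[h // 2:, w // 2:]):
        params.extend(encode_with_partition(sub, threshold, search_best_match))
    return params
```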

3.3 Motion-vector-field-based Adaptive Hexagon Search Algorithm

In actual video sequences, most depth motion vectors are enclosed in the central area [16]. We applied the full search algorithm in the fractal depth map sequence codec and performed a statistical analysis of the motion vectors. As shown in Fig. 5(a) and Fig. 5(b), about 81.80% of the motion vectors are located in the central 5×5 area, i.e. p = ±2 pixels, using the search window W = ±7. The large cross pattern (A+B+C) accounts for 74.71% of the motion vector distribution, and the small cross pattern (A+B) accounts for 68.98%. A large number of experimental results show that more than 80% of the blocks can be regarded as stationary or quasi-stationary, suggesting that the motion vectors have a center-biased distribution [16]. Most of the best matching points of inter fractal encoding are located in a small scope around the initial point.

Fig. 5. Average distributive probability of depth motion vectors for the search window W = ±7

Based on this observation, we propose our motion-vector-field-based adaptive hexagon search algorithm. Firstly, an unsymmetrical cross-shaped pattern (UCSP) is used to search points around the initial point, as shown in Fig. 6(a); then a small cross-shaped pattern (SCSP, see Fig. 6(b)) and a large hexagon-shaped pattern (LHSP) are used. In addition to the six surrounding points and the central point of the traditional LHSP, the new LHSP also contains the midpoints of the two edges located above and below the search center point, as shown in Fig. 6(c), so that it contains a total of 9 search points, which improves the search accuracy with a negligible increase in complexity. Furthermore, we employ a halfway-stop technique so that small motion vectors can be found with fewer search points, which accelerates the motion estimation. The specific steps are as follows:

Fig. 6. Search patterns

Step (i): Search with a UCSP centered at the initial point. The search stops if the MMD point occurs at the center of the UCSP, and the motion vector is obtained, as shown in Fig. 7(a), where the best matching point is filled with red color. Otherwise, go to Step (ii).

Fig. 7. The proposed motion-vector-field-based adaptive hexagon search algorithm

Step (ii): Take the MMD point of Step (i) as the center of the SCSP; two or three new points need to be searched in this step. The search stops if the MMD point occurs at the center of the SCSP; Fig. 7(b) shows the case where two more points (filled with yellow color) are searched. Otherwise, go to Step (iii).

Step (iii): Four additional points on a traditional LHSP centered at the initial point of Step (i) are checked. If the MMD point is not on the LHSP, meaning that the MMD point of Step (ii) is still the best matching point, the search stops, as shown in Fig. 7(c). Otherwise, if the MMD point is on the LHSP, go to Step (iv).

Step (iv): Search with the new LHSP centered at the MMD point of Step (iii). If the new MMD point is at the center of the new LHSP, go to Step (v), as shown in Fig. 7(d). Otherwise, this step is repeated.

Step (v): Take the MMD point of the previous step as the center and apply an SCSP. The new MMD point found in this step is the final solution for the motion vector, as shown in Fig. 7(e).

The first two steps of the proposed motion estimation scheme, which use the two cross-shaped patterns, have a high probability of finding the best matching point thanks to the center-biased distribution of motion vectors, so the number of points that need to be searched is greatly reduced. In the following steps, the large hexagon-shaped pattern is used to avoid falling into a local optimum.
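To summarise the five steps in code form, here is a compact Python sketch (illustrative only: `cost` is a caller-supplied distortion function for a candidate motion vector, the exact span of the UCSP and the tie-breaking rule are our assumptions, and the iteration cap is added purely as a safeguard).

```python
# Search patterns as offsets (dx, dy) relative to the pattern centre.
UCSP = [(dx, 0) for dx in range(-4, 5) if dx] + [(0, dy) for dy in (-2, -1, 1, 2)]  # assumed span
SCSP = [(1, 0), (-1, 0), (0, 1), (0, -1)]
LHSP = [(2, 0), (-2, 0), (1, 2), (-1, 2), (1, -2), (-1, -2)]        # traditional hexagon
NEW_LHSP = LHSP + [(0, 2), (0, -2)]                                 # plus the two edge midpoints

def adaptive_hexagon_search(cost, start, max_iters=32):
    """Motion-vector-field-based adaptive hexagon search (sketch of Steps (i)-(v))."""
    evaluated = {start: cost(start)}

    def probe(center, pattern):
        """Evaluate the not-yet-visited points of `pattern` around `center`; return the current MMD point."""
        for dx, dy in pattern:
            p = (center[0] + dx, center[1] + dy)
            if p not in evaluated:
                evaluated[p] = cost(p)
        return min(evaluated, key=evaluated.get)

    mmd = probe(start, UCSP)                       # Step (i): unsymmetrical cross around the initial point
    if mmd == start:
        return mmd
    center = mmd
    mmd = probe(center, SCSP)                      # Step (ii): small cross around the current MMD point
    if mmd == center:
        return mmd
    mmd = probe(start, LHSP)                       # Step (iii): remaining points of the traditional LHSP
    if mmd not in {(start[0] + dx, start[1] + dy) for dx, dy in LHSP}:
        return mmd                                 # best point is not on the hexagon: stop here
    for _ in range(max_iters):                     # Step (iv): iterate the 9-point new LHSP
        center = mmd
        mmd = probe(center, NEW_LHSP)
        if mmd == center:
            break
    return probe(mmd, SCSP)                        # Step (v): final small-cross refinement
```

With center-biased depth motion, most blocks terminate at Step (i) or Step (ii) after only a handful of cost evaluations, which is the halfway-stop behaviour described above.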

 

4. Experimental Results and Analysis

To verify the performance of the proposed method, we carried out several experiments with the Kendo, Newspaper and Book Arrival test sequences. The results obtained with our fractal coding algorithm with motion-vector-field-based motion estimation are compared with those of H.264 Full Search, H.264 EPZS (enhanced predictive zonal search), LCMDME proposed by Gianluca et al. [17], and the fast MD/ME with Sobel edge detector by Zhu et al. [18]. Since depth maps are not viewed directly but are used to generate synthetic views, we compare the quality of the synthetic views generated from the encoded depth maps. The synthetic views are generated using the View Synthesis Reference Software (VSRS) 3.0 provided by MPEG [19]. The views used in our experiments are listed in Table 1.

Table 1. Color and depth sequences used for the experiments

For consistency with the compared methods, the color videos used for the generation of the synthetic views were encoded with our fractal codec. The RD curves are shown in Fig. 8, where the PSNR is evaluated between the synthetic views generated from the original depth maps and those generated from the compressed ones [17]. The bitrate is the sum of the bitrates of the two depth sequences.

Fig. 8. RD curves for different methods

It can be seen from Fig. 8 that the proposed fractal depth map sequence coding algorithm with motion-vector-field-based motion estimation outperforms all the compared methods. Adjacent frames in the depth map sequence have a strong similarity, so good compression performance can be obtained with the inter fractal method.

The RD performance comparisons evaluated with the Bjontegaard metric [20] are summarized in Table 2. The proposed method is consistently better than all the compared methods, with a 33.82% bit rate decrease and a 0.45 dB PSNR increase on average compared with H.264 Full Search.

Table 2. BDBR and BDPSNR between the proposed method and the five compared methods

Fig. 9 shows the 10th frame of the original depth map sequence Newspaper view 2 and the decoded ones by Full Search using JM18.1 [21] and the proposed method. Fig. 10 shows the 10th frame of the synthesized Newspaper view 3 resulted from Full Search using JM18.1 and the proposed method.

Fig. 9. The 10th frame of the original depth map sequence Newspaper view 2 and the decoded ones

Fig. 10. The 10th frame of the synthesized Newspaper view 3

Fig. 11 shows the encoding time for two depth views and Table 3 shows the percentage of average depth encoding time saved by our method compared with Full Search and LCMDME.

Fig. 11. Comparison of depth sequences (two depth views) encoding time

Table 3. Average depth encoding time saved (%) by the proposed method compared with Full Search and LCMDME

The encoding time comparisons show that the proposed method reduces the depth compression time by up to 66.47% and 42.26% compared with H.264 Full Search and LCMDME, respectively. Although the encoding time of our method is longer than that of the other three fast motion estimation methods, the difference is not prominent for Kendo.

 

5. Conclusion

In this paper, an efficient fractal depth map sequence coding algorithm with fast motion estimation is proposed. We first add a pre-search restriction that rules improper domain blocks out of the matching search, so that the number of blocks involved in the matching search is restricted to a smaller size. Several improvements to motion estimation, including initial search point prediction, a threshold transition condition and an early termination condition, are made based on the features of fractal coding. The probability distribution characteristics of depth motion vectors are also analyzed, and a motion-vector-field-based adaptive hexagon search algorithm is proposed to accelerate the search, especially for stationary or quasi-stationary blocks.

Experimental results show that the proposed algorithm reaches optimum levels of quality while saving coding time. Evaluated with the Bjontegaard metric, the PSNR of the synthesized view is increased by 0.44 dB with a 33.82% bit rate decrease on average compared with H.264 Full Search, and the depth encoding time is reduced by up to 66.47%. The method makes full use of the features of fractal coding and depth motion, achieves a notable improvement with considerably good results, and lays a good foundation for further research on 3D video fractal coding. Our method does not consider the relationship between the depth map and the corresponding color video, which may be a direction for future research.

References

  1. Anthony Vetro, Thomas Wiegand and Gary J. Sullivan, "Overview of the stereo and multiview video coding extensions of the H.264/MPEG-4 AVC standard," in Proc. of the IEEE, vol. 99, no. 4, pp. 626-642, April, 2011. https://doi.org/10.1109/JPROC.2010.2098830
  2. Byung Tae Oh, Jaejoon Lee and Du-sik Park, "Depth map coding based on synthesized view distortion function," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 7, pp. 1344-1352, November, 2011. https://doi.org/10.1109/JSTSP.2011.2164893
  3. Qiuwen Zhang, Ping An, Yan Zhang, Liquan Shen and Zhaoyang Zhang, "Low complexity multiview video plus depth coding," IEEE Transactions on Consumer Electronics, vol. 57, no. 4, pp. 1857-1865, November, 2011. https://doi.org/10.1109/TCE.2011.6131164
  4. Ling-Jiao Pan, Byung-Tak Lee and Nac-Woo Kim. "H.264-based depth map-sequence coding algorithm using mode mapping for fast 3-D video compression," Optical Engineering, vol. 50, no. 1, pp. 017401, January, 2011. https://doi.org/10.1117/1.3533026
  5. Hui-Ping Deng, Li Yu, Bin Feng and Qiong Liu, "Structural similarity-based synthesized view distortion estimation for depth map coding," IEEE Transactions on Consumer Electronics, vol. 58, no.4, pp. 1338-1344, November, 2012. https://doi.org/10.1109/TCE.2012.6415004
  6. S.-H. Tsang, Y.-L. Chan and W.-C. Siu, "Efficient intra prediction algorithm for smooth regions in depth coding," Electronics Letters, vol. 48, no. 18, pp. 1117-1119, August, 2012. https://doi.org/10.1049/el.2012.1768
  7. Jin Heo and Yo-Sung Ho, "Improved context-based adaptive binary arithmetic coding over H.264/AVC for lossless depth map coding," IEEE Signal Processing Letters, vol. 17, no.10, pp. 835-838, October, 2010. https://doi.org/10.1109/LSP.2010.2059014
  8. JVT of ISO/IEC MPEG and ITU-T VCEG, "Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264 l ISO/IEC 14496-10 AVC)," Doc. G050r1, March, 2003.
  9. Erjun Zhao and Dan Liu, "Fractal image compression methods: a review," in Proc. of 3rd International Conference on Information Technology and Applications, pp. 756-759, July 4-7, 2005.
  10. Yuval Fisher, "Fractal image compression," Fractals, vol. 2, no. 3, pp. 347-361, September, 1994. https://doi.org/10.1142/S0218348X94000442
  11. Hannes Hartenstein, Matthias Ruhl and Dietmar Saupe, "Region-based fractal image compression," IEEE Transactions on Image Processing, vol. 9, no. 7, pp. 1171-1184, July, 2000. https://doi.org/10.1109/83.847831
  12. H. Mohamadi, S. Nodehi and M. Tayarani, "A local search operator in quantum evolutionary algorithm and its application in fractal image compression," in Proc. of The 2nd International Conference on Computer and Automation Engineering, pp. 710-715, February 26-28, 2010.
  13. Hui Yu, Li Li , Dan Liu, Hongyu Zhai and Xiaoming Dong, "Based on quadtree fratal image compression improved algorithm for research," in Proc. of International Conference on E-Product E-Service and E-Entertainment, pp. 1-3, November 7-9, 2010.
  14. Tamas Kovacs, "A fast classification based method for fractal image encoding," Image and Vision Computing, vol. 26, no. 8, pp. 1129-1136, August, 2008. https://doi.org/10.1016/j.imavis.2007.12.008
  15. Z. B. Chen, P. Zhou and Y. He, "Fast integer pixel and fractional pixel motion estimation for H.264/AVC," Journal of Visual Communication and Image Representation, vol. 17, no. 2, pp. 264-290, April, 2006. https://doi.org/10.1016/j.jvcir.2004.12.002
  16. Chun-Ho Cheung and Lai-Man Po, "A novel cross-diamond search algorithm for fast block motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 12, pp. 1168-1177, December, 2002. https://doi.org/10.1109/TCSVT.2002.806815
  17. Gianluca Cernigliaro, Fernando Jaureguizar, Julian Cabrera and Narciso García, "Low complexity mode decision and motion estimation for H.264/AVC based depth maps encoding in free viewpoint video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 5, pp. 769-783, May, 2013. https://doi.org/10.1109/TCSVT.2012.2223632
  18. Bo Zhu, Gangyi Jiang, Yun Zhang, Zongju Peng and Mei Yu, "View synthesis oriented depth map coding algorithm," in Proc. of Asia-Pacific Conference on Information Processing, pp. 104-107, July 18-19, 2009.
  19. M. Tanimoto, T. Fujii and K. Suzuki, "View synthesis algorithm in view synthesis reference software 3.0 (VSRS 3.0)," Doc. M16090, February, 2009.
  20. G. Bjontegaard, "Calculation of average PSNR differences between RD curves," Doc. VCEG-M33, April, 2001.
  21. http://iphome.hhi.de/suehring/tml/download/old_jm/