1. Introduction
In recent years, 3D image compression has attracted significant research attention, particularly the compression of depth images in 3D image data formats. A depth image represents the distance between a camera and the objects in the scene. Depth images are often treated as gray-scale image sequences, which are similar to the luminance component of texture videos. However, unlike the texture image, the depth image has its own special characteristics. First, the depth image signal is much sparser than that of the texture video. It contains no texture but has sharp object boundaries, because the gray levels are nearly the same in most regions within an object but change abruptly across boundaries. Furthermore, the depth image is not directly used for display, but it plays an important role in virtual view synthesis. Distortion of depth data, especially around object boundaries, seriously degrades the quality of the rendered virtual views [1]. Therefore, exploiting depth image characteristics for efficient compression is an essential part of 3D systems.
Multiple description (MD) coding has emerged as a promising approach to enhance the fault tolerance of a video delivery system [2]. The first work on MD coding was introduced in 1993 [3]. MD coding can also be used in watermarking; in [4], image watermarking is used to identify the ownership of an image with little perceptual distortion. In many applications, an MD coder generates multiple descriptions, and the packets of each description are routed over the same or multiple partial paths. Any description can be used to decode the media stream. If only one description is received, a low-quality image can be reconstructed, and its distortion is called the side distortion. The more descriptions are received, the better the quality of the reconstructed image. In a simple two-channel architecture, the distortion with both descriptions received is called the central distortion. Quantization-based MD coding schemes mainly include multiple description scalar quantization (MDSQ) [5] and multiple description lattice vector quantization (MDLVQ) [6]. MDSQ combined with the wavelet transform is first presented in [5], in which the input stream is split into two descriptions by two scalar quantizers. In contrast, owing to the symmetrical structure of the lattice, the absence of any need to design a codebook, and the low storage requirement, MDLVQ simplifies the computation of conventional vector quantization. In [7], an effective MD image coding scheme based on MDLVQ is introduced for wavelet-transformed images; that work also presents the definition of lattice vector quantization (LVQ), the specific algorithm, and an optimized scheme that improves the performance of the MDLVQ image encoder.
Another LVQ-based MD coding work is presented in [8], in which the authors apply an asymmetric MDLVQ scheme to a wide range of distortion profiles. In [9], a new MD coinciding lattice vector quantizer (MDCLVQ) is presented. The design of the quantizer is based on coinciding 2D hexagonal sub-lattices, which are geometrically similar sub-lattices with the same index but generated by different generator matrices. However, no MDLVQ scheme designed specifically for depth images has been reported yet.
In this paper, considering the special characteristics of depth information, an optimized MDLVQ on depth map is proposed. Compared with the basic MDLVQ, the proposed scheme for depth image has some improvements. Given that the depth image contains no texture but only object edges, the blocks can be classified into two classes, edge blocks and smooth blocks, which can then be compressed using different modes. Furthermore, instead of the fixed step size of LVQ over the whole map, the step size can be adaptively assigned to different blocks according to the boundary contents of the edge blocks.
The rest of this paper is organized as follows. In Section 2, an overview of the conventional MDLVQ coding scheme is presented for general natural images. In Section 3, the optimization of MDLVQ encoding and decoding is proposed for the depth image in detail. The performance of the proposed scheme is examined against the other coders in Section 4. Conclusions are presented in Section 5.
2. Basic MDLVQ Image Coding
Fig. 1 illustrates the framework of the MDLVQ scheme for image coding. Here, two balanced channels are considered; that is, the bit rate and side distortion produced by the two side decoders are approximately the same for the two channels. The scheme is described step by step below.
Fig. 1. Block diagram of the basic MDLVQ scheme.
Step 1: Block splitting and transformation
First, a given input image is decomposed into blocks of the same size. In [7], the whole image is decomposed by the DWT into subbands (subband 1, subband 2, ..., subband m, denoted by Si, i = 1, 2, ..., m). In this paper, the DCT is applied to each block instead. Similar to [5], small high-frequency DCT coefficients are set to zero, because the high-frequency information in a natural image is not particularly important after the DCT.
Step 2: Vector organization
In the basic MDLVQ, the LVQ is based on the A2 lattice, which is a 2D lattice. To achieve high compression efficiency, the correlations between the 2D vectors should be exploited appropriately. In [7], the wavelet coefficients in each subband have different directional correlations, so organizing the coefficients of every subband according to their directional correlations is more efficient. For example, HL is scanned along the vertical direction to form vectors, LH is scanned in the horizontal direction, and HH is scanned in a zigzag way. In addition, a spiral scan is applied in the LL subband because of the strong correlation among neighboring coefficients. However, the correlations among DCT coefficients differ from those of DWT coefficients, because most non-zero coefficients are concentrated in the upper left corner of the 2D matrix after DCT and quantization. Each block can thus be scanned in a zigzag manner.
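As a minimal sketch of the zigzag vector organization (not the authors' implementation), the following Python code scans a square block of quantized DCT coefficients in zigzag order and groups consecutive coefficients into 2D vectors. The function names `zigzag_order` and `block_to_vectors` are hypothetical, and the JPEG-style scan direction is an assumption, since the text only specifies a zigzag scan.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in JPEG-style zigzag order."""
    def key(pos):
        r, c = pos
        d = r + c  # index of the anti-diagonal
        # Odd diagonals run top-right to bottom-left, even ones the other way.
        return (d, r if d % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def block_to_vectors(block):
    """Scan a square block in zigzag order and pair coefficients into 2D vectors."""
    n = len(block)
    scanned = [block[r][c] for r, c in zigzag_order(n)]
    # n * n is even for the usual even block sizes; an odd leftover is zero-padded.
    if len(scanned) % 2:
        scanned.append(0)
    return [(scanned[k], scanned[k + 1]) for k in range(0, len(scanned), 2)]


# Toy 4 x 4 block whose non-zero coefficients sit in the upper left corner.
block = [[9, 8, 5, 0],
         [7, 6, 0, 0],
         [4, 0, 0, 0],
         [0, 0, 0, 0]]
vectors = block_to_vectors(block)
```

Because the non-zero coefficients cluster in the upper left corner, the zigzag scan places them in the first few vectors and leaves long runs of zero vectors at the end.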
Step 3: Lattice vector quantization (LVQ)
Here, we utilize the LVQ based on an A2 lattice, which is equivalent to the hexagonal lattice [10]. The hexagonal lattice can be spanned by the vectors (1, 0) and $(-1/2, \sqrt{3}/2)$, and the generator matrix is
$$U = \begin{bmatrix} 1 & 0 \\ -1/2 & \sqrt{3}/2 \end{bmatrix}. \qquad (1)$$
In each block, every two coefficients form a 2D vector according to the scanning order. A lattice vector quantizer is then applied to each 2D vector, producing a quantized symbol λ, λ ∈ A2. The process is similar to scalar quantization in that the lattice vector quantizer can also quantize with different accuracy by adjusting the “step size.” Unlike conventional vector quantization, LVQ does not require the computation-intensive nearest-neighbor search based on squared distance calculation. Therefore, the complexity of LVQ on A2 is considered very low.
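The following Python sketch illustrates one way to quantize a 2D vector to the hexagonal lattice with a step size. It assumes the usual hexagonal basis (1, 0) and (-1/2, √3/2), scaled by a factor `delta`; the function name `quantize_a2` is hypothetical, and this is not claimed to be the fast algorithm used in [7]. The vector is expressed in lattice coordinates, and only the four corners of the enclosing lattice cell are compared, so the cost is constant and no codebook search is needed.

```python
import math

SQRT3_2 = math.sqrt(3) / 2  # second coordinate of the basis vector (-1/2, sqrt(3)/2)


def quantize_a2(x, y, delta=1.0):
    """Map (x, y) to the nearest point of the hexagonal lattice scaled by delta.

    Returns the integer lattice coordinates (i, j) and the lattice point itself,
    where the point equals delta * (i * (1, 0) + j * (-1/2, sqrt(3)/2)).
    """
    # Real-valued coordinates of (x, y) in the scaled basis.
    v = y / (delta * SQRT3_2)
    u = x / delta + 0.5 * v
    u0, v0 = math.floor(u), math.floor(v)

    best = None
    # The nearest lattice point is one of the four corners of the enclosing cell.
    for i in (u0, u0 + 1):
        for j in (v0, v0 + 1):
            px = delta * (i - 0.5 * j)
            py = delta * (j * SQRT3_2)
            dist = (x - px) ** 2 + (y - py) ** 2
            if best is None or dist < best[0]:
                best = (dist, i, j, px, py)
    _, i, j, px, py = best
    return (i, j), (px, py)


coords, point = quantize_a2(0.9, 0.1)
```

A larger `delta` enlarges every lattice cell, so more input vectors collapse onto the same lattice point, which is the vector analogue of a coarser scalar step size.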
Step 4: Index assignment
After LVQ, each quantized point λ is mapped to two sub-lattice points that form the two descriptions; this mapping is called index assignment. Here, a labeling function [11] maps λ ∈ Λ to a pair (λ1', λ2') ∈ Λ' × Λ', where Λ' is a sub-lattice of Λ with index N. The index N determines the coarseness of the sub-lattice and thereby controls the amount of redundancy in the MDLVQ encoder [6].
Fig. 2 is an example of an A2 sub-lattice with index N =13 . In the case of N =13 , we can obtain a labeling function as in Table 1, where each fine lattice point λ is mapped to a unique label (λ1', λ2'), with λ1' and λ2' being the two sub-lattice points as close to λ as possible. Note that the proposed mapping scheme shown in the table is slightly different from the index assignment developed by Servetto, Vaishampayan, and Sloane [6] (known as SVS technique). In our proposed scheme, λ1' is always closer to λ , and thus λ1' is denoted as the near sub-lattice point and λ2' is the far sub-lattice point. To strike a balance of reconstruction quality with any single description sequence, λ1' and λ2' are alternately transmitted over two channels.
Fig. 2. Example of A2 with index 13: Fine lattice points are labeled by small letters, and sub-lattice points are labeled by capital letters.
Table 1. Index assignment for V0(0) in the hexagonal lattice with N = 13
As a simple example, if we have a quantized sequence of fine lattice points {λ(1), λ(2), ..., λ(8)} = {a, a, a, b, b, b, i, i}, then the two sequences of sub-lattice points obtained with the labeling function in Table 1 are {λ1'(1), λ1'(2), ..., λ1'(8)} = {O, O, O, O, O, O, D, D} and {λ2'(1), λ2'(2), ..., λ2'(8)} = {A, A, A, B, B, B, B, B}. Based on the alternating transmission scheme, the sequence
{λ1'(1), λ2'(2), λ1'(3), λ2'(4), ..., λ1'(7), λ2'(8)} = {O, A, O, B, O, B, D, B}
is transmitted over channel 1 and
{λ2'(1), λ1'(2), λ2'(3), λ1'(4), ..., λ2'(7), λ1'(8)} = {A, O, A, O, B, O, B, D}
over channel 2.
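The alternating transmission in this example can be sketched in Python as follows. The dictionary `LABELS` contains only the three label pairs that the example itself states (a, b, and i); the remaining entries of Table 1 are not reproduced. The function name `split_descriptions` and the choice that channel 1 carries the near point at odd positions are assumptions made for illustration.

```python
# (near, far) sub-lattice labels for the fine lattice points used in the example.
LABELS = {"a": ("O", "A"), "b": ("O", "B"), "i": ("D", "B")}


def split_descriptions(points):
    """Alternate the near and far sub-lattice points between two channels."""
    channel1, channel2 = [], []
    for k, point in enumerate(points):
        near, far = LABELS[point]
        if k % 2 == 0:
            # 1st, 3rd, 5th, ... vector: near point on channel 1.
            channel1.append(near)
            channel2.append(far)
        else:
            # 2nd, 4th, 6th, ... vector: near point on channel 2.
            channel1.append(far)
            channel2.append(near)
    return channel1, channel2


ch1, ch2 = split_descriptions(["a", "a", "a", "b", "b", "b", "i", "i"])
```

Each channel therefore carries near and far points in equal shares, which keeps the two side distortions approximately balanced.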
Step 5: Center decoder and side decoder
At the receiver, if both channels work properly, the two descriptions are processed by the central decoder after arithmetic decoding, and the sequence of fine lattice points {λ} can be reconstructed with the central distortion. If either description is lost, the side decoder reconstructs the image from the remaining description, predicting the lost information where necessary from the neighboring inter-vector correlation and the alternating transmission scheme described above. However, side decoding results in a larger distortion than central decoding.
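A minimal Python sketch of the central decoder for the example of Step 4 is given below. It assumes that channel 1 carries the near sub-lattice point at odd positions and the far point at even positions, and it uses only the three label pairs stated in that example; `central_decode` is a hypothetical name. Side decoding with prediction of the lost information is not sketched, because the text does not specify the predictor.

```python
# (near, far) labels of the fine lattice points a, b, and i from the example.
LABELS = {"a": ("O", "A"), "b": ("O", "B"), "i": ("D", "B")}
# Each label pair is unique, so the labeling function can be inverted.
INVERSE = {pair: point for point, pair in LABELS.items()}


def central_decode(channel1, channel2):
    """Recover the fine lattice points when both descriptions are received."""
    points = []
    for k, (s1, s2) in enumerate(zip(channel1, channel2)):
        # Undo the alternation to restore the (near, far) order.
        near, far = (s1, s2) if k % 2 == 0 else (s2, s1)
        points.append(INVERSE[(near, far)])
    return points


decoded = central_decode(list("OAOBOBDB"), list("AOAOBOBD"))
```

Because every fine lattice point has a unique label, the central decoder recovers the quantized sequence exactly, and the central distortion is the LVQ quantization error alone.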
3. Improved MDLVQ for depth image
3.1 Overview
Based on the basic MDLVQ image coding in Fig. 1, an optimized MDLVQ scheme is proposed in view of the characteristics of the 3D depth image shown in Fig. 3. First, the depth image is sparse, and each gray value represents the distance between the corresponding pixel point and the camera. Therefore, the edge regions in color image tend to be smooth in the depth map. Moreover, the distortion of the smooth area in the depth map has a low impact on the quality of the synthesized virtual viewpoint image. Therefore, the depth image is classified into edge blocks and smooth blocks, which are encoded in different modes. The specific classification results and the two coding modes will be presented in more detail below. Second, because the depth image contains no texture, only object edges, edge information encoding tends to be particularly important. Here, we regulate the step size of MDLVQ adaptively for each block according to the boundary contents.
Fig. 3. Block diagram of the improved scheme.
3.2 Smooth block encoding and decoding
After block splitting, the mean value of each block is calculated first. If the value of every pixel in the block equals the mean value, the block is called a smooth block; otherwise, it is called an edge block.
After classification, the smooth blocks are marked with the flag bit “0.” Only the flag bit and the DC component of each smooth block are then compressed with arithmetic coding. The information of the smooth blocks is transmitted together with the two descriptions produced by index assignment. At the decoder, if the flag bit “0” is received, the block is decoded as all zeros and then reconstructed by adding its DC component. As a result, block classification can largely reduce the bit rate and save coding time at the same time. In addition, the more smooth blocks an image contains, the greater the reduction in bit rate.
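The classification and the smooth-block shortcut can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the block mean stands in for the DC component (for an orthonormal N x N DCT the two differ only by the factor N), the flag value 1 for edge blocks is assumed (the text only defines the flag “0”), and the function names are hypothetical.

```python
def classify_block(block):
    """Return (flag, payload): flag 0 with the mean for a smooth block,
    flag 1 with the untouched block for an edge block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    if all(p == mean for p in pixels):
        return 0, mean        # smooth block: only flag and DC-like value are coded
    return 1, block           # edge block: goes on to DCT, LVQ, index assignment


def decode_smooth_block(mean, rows, cols):
    """Rebuild a smooth block: start from all zeros, then add the DC-like value."""
    zeros = [[0] * cols for _ in range(rows)]
    return [[z + mean for z in row] for row in zeros]


flag_s, payload_s = classify_block([[128, 128], [128, 128]])
flag_e, payload_e = classify_block([[128, 128], [128, 40]])
rebuilt = decode_smooth_block(payload_s, 2, 2)
```

A smooth block is reconstructed without loss from a single value, which is why a high share of smooth blocks translates directly into bit-rate savings.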
The results of block classification are shown in Table 2. For the sequence Pantomime, 77.3% of the blocks in the left view and 78.8% in the right view are smooth blocks. For the Kendo sequence, smooth blocks account for 69.1% in the left view and 61.2% in the right view. Although the proportion of smooth blocks for the Balloons sequence is relatively smaller, there are still 20.3% and 32.1% in the left and the right views respectively.
Table 2. Block classification results
Fig. 4 shows the subjective block classification of the Pantomime sequence. Here, the smooth blocks are tagged in green. The figure clearly shows that most areas in both the left and right views of the depth image are smooth.
Fig. 4. Subjective Pantomime sequence: (a) left view, (b) right view.
3.3 Edge block encoding and decoding
In the basic MDLVQ image encoding [7], two important factors will affect the reconstruction image quality and the bit rate. The first one is the area of the hexagonal lattice (in Step 3), i.e., the quantization “volume-size” used in LVQ, and the other is the choice of sub-lattice index (in Step 4).
As the depth image contains no texture but only object edges, the quantizer should be designed according to the boundary contents. Edges in a depth image are sharper than those in an ordinary image, so they are easy to extract with an edge detection operator. The Canny operator detects weak and strong edges using two different thresholds; it suppresses noise and yields accurate edges. Therefore, the Canny operator is used to extract the edges of the depth image before LVQ, as shown in Fig. 3.
The lattice A2 is spanned by the two vectors (1, 0) and $(-1/2, \sqrt{3}/2)$, which determine the area of the hexagonal lattice cell. We can retain the shape of the hexagonal lattice and change its area by multiplying the generator matrix U by a factor δ (δ ∈ R, δ > 0). The more edge information a block contains, the smaller the value of δ assigned to it. The parameter δ in LVQ is similar to the step size in scalar quantization (SQ). The central distortion D0 and its associated bit rate can be adjusted by changing δ.
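The Python sketch below illustrates the role of δ. It assumes the usual hexagonal basis (1, 0) and (-1/2, √3/2), for which the fundamental cell of δU has area δ²·√3/2. The linear rule in `block_delta` that maps the share of edge pixels to δ is purely hypothetical: the text only states that blocks with more edge information receive a smaller δ, and the binary edge mask is assumed to come from an edge detector such as the Canny operator.

```python
import math


def scaled_generator(delta):
    """Generator matrix delta * U of the hexagonal lattice (rows are basis vectors)."""
    return [[delta * 1.0, 0.0],
            [delta * -0.5, delta * math.sqrt(3) / 2]]


def cell_area(delta):
    """Area of the fundamental cell, |det(delta * U)| = delta**2 * sqrt(3) / 2."""
    (a, b), (c, d) = scaled_generator(delta)
    return abs(a * d - b * c)


def block_delta(edge_mask, delta_min, delta_max):
    """Hypothetical rule: shrink delta linearly as the share of edge pixels grows.

    edge_mask is a 2D list of 0/1 values marking the edge pixels of one block.
    """
    total = sum(len(row) for row in edge_mask)
    edge_ratio = sum(map(sum, edge_mask)) / total
    return delta_max - edge_ratio * (delta_max - delta_min)


delta_few_edges = block_delta([[1, 0], [0, 0]], delta_min=1.0, delta_max=4.0)
delta_many_edges = block_delta([[1, 1], [1, 0]], delta_min=1.0, delta_max=4.0)
```

Halving δ reduces the cell area by a factor of four, so blocks rich in edges are quantized much more finely at the cost of a higher bit rate.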
Finding the optimal parameters δ and N is necessary to strike the best trade-off among central distortion, side distortion, and their associated bit rates. With the analysis of analogies between MDLVQ and MDSQ, we can perform the optimization of parameters δ and N in MDLVQ encoding similar to the optimization method for MDSQ encoding in [5]. Therefore, the MD design problem can be formulated as yielding optimal performance in the presence of the constraints of side distortion and its bit rate. To facilitate the description, consider the following definitions.
Let I denote an image, and let M = {m1, m2, ..., mi} denote the edge blocks after block classification.
Let δmj refer to the scaling factor of the lattice area (i.e., the quantization “volume-size”) used for edge block mj, with δm = {δmj | j = 1, 2, ..., i}, and let Nm = {Nmj | j = 1, 2, ..., i} represent the set of index numbers used in the labeling function for the different edge blocks.
Let D0(M, δm, Nm), D1(M, δm, Nm), and D2(M, δm, Nm) denote the mean squared errors (MSE) of the central decoder and the two side decoders for the input image.
Let R1(M, δm, Nm) and R2(M, δm, Nm) denote the number of bits required to encode each description of I using the given central quantizer and index assignments.
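Then, our goal is to find a pair (δm, Nm) that solves
$$\min_{\delta_m, N_m} D_0(M, \delta_m, N_m) \qquad (2)$$
subject to
$$\text{Condition 1: } R_1(M, \delta_m, N_m) = R_2(M, \delta_m, N_m) \le R_{budget} \qquad (3)$$
$$\text{Condition 2: } D_1(M, \delta_m, N_m) = D_2(M, \delta_m, N_m) \le D_{budget} \qquad (4)$$
where the user-specified parameters are Rbudget (the available bit rate to encode each description) and Dbudget (the maximum distortion acceptable for single-channel reconstructions).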
Next, we present an algorithm to find parameters that can solve (2)–(4) with the constraints on the bit rate per channel and the side distortion. Here, δm and Nm are adjusted accordingly to minimize the central distortion.
The basic idea is to take advantage of the monotonicity of both R and D as functions of δm. First, after initialization a relatively minimal δm is searched to minimize D0 subject to Condition 1. Second, according to condition 2, Nm can be updated sequentially from a low index to high ones. The updated Nm then affects R1(M, δm, Nm) and R2(M, δm, Nm) in Condition 1 and, in turn, δm will be updated to minimize D0 further. Thus, the two steps will be iterated to update δm and Nm until D0 has little change. A pseudocode description of the proposed algorithm is presented below.
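The two alternating steps can be illustrated with the following Python sketch. It is not the authors' pseudocode: it uses a single (δ, N) pair instead of one pair per edge block, it searches over finite candidate lists, and the callback `evaluate(delta, n_index)`, which returns (D0, D1, R) for one description, is a hypothetical stand-in for actually encoding and decoding the image. The sketch assumes that the rate falls as δ or N grows and that the side distortion rises with N. The function `toy_model` is an invented model used only to exercise the loop.

```python
def smallest_feasible_delta(evaluate, deltas, n_index, r_budget):
    """Smallest delta (finest quantizer) whose per-channel rate fits the budget."""
    for delta in deltas:                      # deltas must be sorted ascending
        _, _, rate = evaluate(delta, n_index)
        if rate <= r_budget:                  # Condition 1
            return delta
    return None


def optimize(evaluate, deltas, indices, r_budget, d_budget, tol=1e-3, max_iter=20):
    """Alternate between updating delta and the sub-lattice index."""
    n_index = indices[0]                      # start from the lowest index
    prev_d0 = float("inf")
    for _ in range(max_iter):
        # Step 1: minimize D0 under Condition 1 for the current index.
        delta = smallest_feasible_delta(evaluate, deltas, n_index, r_budget)
        if delta is None:
            raise ValueError("rate budget cannot be met")
        # Step 2: raise the index while Condition 2 (side distortion) holds.
        for cand in indices:                  # indices must be sorted ascending
            if cand > n_index:
                if evaluate(delta, cand)[1] > d_budget:
                    break
                n_index = cand
        d0, d1, _ = evaluate(delta, n_index)
        if d1 > d_budget:
            raise ValueError("side-distortion budget cannot be met")
        if abs(prev_d0 - d0) < tol:           # D0 has little change: stop
            break
        prev_d0 = d0
    return delta, n_index, d0


def toy_model(delta, n_index):
    """Invented monotone model, for illustration only."""
    d0 = delta ** 2
    d1 = d0 * (1 + n_index)
    rate = 3.0 / delta + 4.0 / n_index
    return d0, d1, rate


result = optimize(toy_model,
                  deltas=[1.0 + 0.25 * k for k in range(13)],
                  indices=[7, 13, 19],
                  r_budget=2.0, d_budget=75.0)
```

In this toy run, raising the index lowers the rate, which frees enough budget for a smaller δ in the next pass, and the loop stops once D0 no longer changes.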
4. Experimental Results and Analysis
To evaluate the performance of the proposed scheme, experiments are carried out on three standard depth image sequences: Balloons (1024 x 768), Kendo (1024 x 768), and Pantomime (1280 x 960). The proposed optimized scheme is compared not only against the conventional MDLVQ, but also, under the same experimental setup, against the wavelet-domain MDLVQ scheme in [7]. To show the generality of the results, four groups of data are selected in each sequence for comparison. According to the MDC quality assessment, we compare both the rate-central distortion performance when two descriptions are received correctly and the rate-side distortion performance when only one description is received. The comparison for depth images is presented in Fig. 5, where the horizontal axis represents the bit rate and the vertical axis represents the PSNR values. Here, one view is chosen from each of the three sequences: the 1st view of Balloons and Kendo, and the 37th view of Pantomime.
Fig. 5. Objective quality comparison for the depth image sequences Balloons, Kendo, and Pantomime. (a), (b), and (c): rate side distortion performance; (d), (e), and (f): rate central distortion performance.
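For reference, the PSNR used on the vertical axis follows from the MSE as PSNR = 10·log10(peak² / MSE). The short Python sketch below computes it for two equally sized gray-scale images given as 2D lists; the peak value of 255 assumes 8-bit depth maps, which the text does not state explicitly, and the function name `psnr` is chosen for illustration.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """PSNR in dB between two equally sized gray-scale images (2D lists)."""
    count = 0
    squared_error = 0.0
    for ref_row, rec_row in zip(reference, reconstructed):
        for ref_pixel, rec_pixel in zip(ref_row, rec_row):
            squared_error += (ref_pixel - rec_pixel) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf               # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


original = [[10, 10], [10, 10]]
decoded = [[11, 11], [11, 11]]
value = psnr(original, decoded)       # MSE is 1 for this pair
```

Since PSNR is logarithmic in the MSE, a gain of 3 dB corresponds to roughly halving the mean squared error.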
Given that the depth map is not directly used for display, the objective and subjective qualities of the rendered virtual views should be taken into account. In the objective aspect, the synthesized virtual viewpoint image can be achieved by two original camera images. For example, for the tested sequences Balloons and Kendo, the depth and texture from the 1st and 3rd views can be used to synthesize the texture of the 2nd view, while for the sequence Pantomime, the depth and texture from the 37th and 39th views can generate the texture of the 38th view. A comparison of the synthesized images is presented in Fig. 6.
Fig. 6. Objective quality comparison for the synthesized virtual viewpoint sequences of Balloons, Kendo, and Pantomime. (a), (b), and (c): rate side distortion performance; (d), (e), and (f): rate central distortion performance.
The figures show that the proposed scheme improves the PSNR of the reconstructed images significantly for both side and central decoding. Fig. 5 shows the objective quality comparison for the three tested sequences. In the single channel, compared with the basic MDLVQ, the proposed scheme achieves improvements of 3.8-6.4 dB for “Balloons”, 3.8-5.5 dB for “Kendo”, and 3.3-5.8 dB for “Pantomime”. Compared with the reference scheme in [7], the proposed scheme obtains gains of 0.6-0.8 dB for “Balloons”, 0.9-1.4 dB for “Kendo”, and 6.0-6.5 dB for “Pantomime”. In the central channel, compared with the basic MDLVQ, the proposed scheme achieves improvements of 4.5-7.8 dB for “Balloons”, 4.9-5.7 dB for “Kendo”, and 5.0-8.0 dB for “Pantomime”. Compared with the reference scheme in [7], the proposed scheme obtains gains of 3.6-4.5 dB for “Balloons”, 3.3-4.4 dB for “Kendo”, and 4.7-5.0 dB for “Pantomime”.
For the synthesized virtual viewpoint sequences, Fig. 6 shows that the proposed scheme outperforms both schemes chosen for comparison. In the single channel, the proposed scheme gains around 1.4-3.9 dB for “Balloons”, 3.2-7.1 dB for “Kendo”, and 2.1-3.4 dB for “Pantomime” over the basic MDLVQ, and around 2.6-2.8 dB for “Balloons”, 2.7-2.8 dB for “Kendo”, and 7.6-7.8 dB for “Pantomime” over the reference scheme in [7]. In the central channel, the proposed scheme gains around 0.8-2.8 dB for “Balloons”, 4.5-7.0 dB for “Kendo”, and 5.2-7.9 dB for “Pantomime” over the basic MDLVQ, and around 3.5-4.9 dB for “Balloons”, 3.2-4.5 dB for “Kendo”, and 4.5-4.9 dB for “Pantomime” over the reference scheme in [7].
Furthermore, the advantages of the proposed scheme can be seen more clearly in Fig. 7, which presents the subjective quality of the synthesized virtual viewpoint of Balloons, especially in the parts marked by red rectangles.
Fig. 7. Subjective quality comparison of synthesized virtual viewpoint for the Balloons sequences. (a), (c), and (e): the proposed scheme; (b), (d), and (f): basic MDLVQ.
5. Conclusion
An LVQ-based MD depth image coding scheme was developed in this study. An effective optimization scheme in LVQ encoding was incorporated into the proposed system to achieve better rate and central/side distortion performance. By treating smooth blocks separately and assigning an appropriate quantization step size to each edge block, the proposed MDLVQ achieves better rate-distortion performance and a considerably lower bit rate than the compared schemes. At the same bit rate, the PSNR of the reconstructed image improves significantly. Thus, the proposed scheme is a worthy choice for depth map coding.
References
- C. Zhu, Y. Zhao, L. Yu, and M. Tanimoto, 3D-TV System with Depth-Image-Based Rendering: Architectures, Techniques and Challenges, 2012.
- Y. Wang, A. R. Reibman, and S. Lin, "Multiple description coding for video delivery," Proceedings of the IEEE, vol. 93, no. 1, pp. 57-69, 2005. https://doi.org/10.1109/JPROC.2004.839618
- V. Vaishampayan, "Design of multiple description scalar quantizers," IEEE Trans. on Information Theory, vol. 39, no. 3, pp. 821-834, 1993. https://doi.org/10.1109/18.256491
- Y. Hsia and J. Liao, "Multiple-description iterative coding image watermarking," Digital Signal Processing, vol. 20, no. 4, pp. 1183-1195, 2010. https://doi.org/10.1016/j.dsp.2009.12.011
- S. Servetto, K. Ramchandran, V. Vaishampayan, and K. Nahrstedt, "Multiple description wavelet based image coding," IEEE Trans. on Image Processing, vol. 9, no. 5, pp. 813-826, 2000. https://doi.org/10.1109/83.841528
- S. Servetto, V. Vaishampayan, and N. Sloane, "Multiple description lattice vector quantization," in Proc. of IEEE Data Compression Conf., Snowbird, UT, pp. 13-22, 1999.
- H. Bai, C. Zhu, and Y. Zhao, "Optimized multiple description lattice vector quantization for wavelet image coding," IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 7, pp. 912-917, 2007. https://doi.org/10.1109/TCSVT.2007.898646
- S. Diggavi, N. Sloane, and V. Vaishampayan, "Asymmetric multiple description lattice vector quantizers," IEEE Trans. on Information Theory, vol. 48, no. 1, pp. 174-191, 2002. https://doi.org/10.1109/18.971747
- E. Akhtarkavan and M. Salleh, "Multiple descriptions coinciding lattice vector quantizer for wavelet image coding," IEEE Trans. on Image Processing, vol. 21, no. 2, pp. 653-661, 2012. https://doi.org/10.1109/TIP.2011.2164419
- J. Conway and N. Sloane, Sphere Packings, Lattices and Groups, 3rd ed. New York: Springer-Verlag, pp. 108-117, 1998.
- V. Vaishampayan, N. Sloane, S. Servetto, "Multiple description vector quantization with lattice codebooks: design and analysis," IEEE Trans. on Information Theory, vol. 47, no. 5, pp. 1718-1734, 2001. https://doi.org/10.1109/18.930913