Hole Filling Algorithm for a Virtual-viewpoint Image by Using a Modified Exemplar Based In-painting

  • Ko, Min Soo (Dept. of Electronic Engineering, Kwangwoon University)
  • Yoo, Jisang (Dept. of Electronic Engineering, Kwangwoon University)
  • Received : 2014.08.17
  • Accepted : 2016.02.03
  • Published : 2016.07.01

Abstract

In this paper, a new algorithm is proposed that uses the 3D warping technique to effectively fill the holes that are produced when a virtual-viewpoint image is created. A hole is defined as a region that cannot be seen in the reference view when a virtual view is created. In the proposed algorithm, an exemplar-based in-painting algorithm is used to reduce the blurring effect that occurs in hole regions filled by conventional algorithms and to enhance the texture quality of the generated virtual view. The boundary noise that occurs in the initial virtual view obtained by 3D warping is also removed. After 3D warping, we estimate the location of the background relative to the holes, and pixels adjacent to the background are then filled with priority, so that the holes are not filled using only the adjacent object's information. The temporal inconsistency between frames can also be reduced by expanding the search region to include the previous frame when searching for the most similar patch. Experimental results demonstrate the superiority of the proposed algorithm over the existing algorithms.

1. Introduction

Through the development of broadcasting technology, 3DTV, UHDTV and other next-generation services have been able to provide viewers with unprecedented realism and a sense of immersion. The majority of current 3D movies and 3DTV broadcasts are stereoscopic; in other words, they use two images, one for the left eye and one for the right eye. In this case, because only the left and right stereo images are used, only one viewpoint can be provided. Another drawback of the stereoscopic method is the need for viewers to wear 3D glasses. To solve these problems, research on multi-view is in progress. A multi-view display provides two or more views, which means that the image can be seen from different viewpoints depending on the location of the viewer. Not only does multi-view broaden the field of view, it also provides convenience by not requiring 3D glasses. However, as the number of viewpoints increases, the number of cameras also increases, which hampers smooth synchronization and calibration between the cameras. Additionally, storing or transmitting the data is difficult because the amount of data increases proportionally to the number of cameras. Therefore, virtual view synthesis algorithms are being researched that use the minimum number of cameras to obtain reference views and then produce the needed viewpoints from the obtained reference views.

Depth-based 3D warping is used as a virtual view generation algorithm. 3D warping uses the depth information and camera parameters of the reference view to calculate the 3D world coordinates of the view. The calculated 3D world coordinates are then re-projected onto the image plane using the camera parameters at the virtual viewpoint. This algorithm allows the generation of intermediate views in between the reference views and also the generation of arbitrary virtual views using a defined virtual viewpoint and its camera parameters [1].

However, the virtual view generated with 3D warping shows occlusion regions that could not be seen in the reference view. Because such a region cannot be referenced, it remains as a common-hole. Therefore, the common-holes that occur as a result of 3D warping need to be filled in order to generate a high-quality image. However, in most cases it is very difficult to fill the common-holes perfectly because there is not enough information about them. To fill the common-holes, linear interpolation and in-painting algorithms have been proposed [2].

Linear interpolation on a set of common-hole pixels is defined as the concatenation of linear interpolants between each pair of opposite end pixels of the common-hole region [3]. This method has a fast execution speed but fills the common-holes with low quality. In-painting is an algorithm originally used to restore damaged regions of an original image [4]. It uses the color information around the damaged region to estimate the values of the regions that need to be restored as naturally as possible. The VSRS (view synthesis reference software) provided by MPEG uses a pixel-based in-painting algorithm [5, 6].

However, if a large region is restored using a pixel-based in-painting algorithm, the restored region becomes blurred. To solve this problem, in-painting algorithms using texture synthesis have been proposed. Among them, Criminisi proposed an exemplar-based in-painting algorithm that operates on patch units and can improve the texture of the image [7]. This algorithm restores the image by prioritizing pixels on the boundary of the hole region and using the patch that includes the highest-priority pixel to find the most similar patch within the image. Y.-J. Kim [8] used this algorithm directly to fill common-holes in the virtual view, while Daribo [9] proposed that the depth value be used in addition. However, most of the existing algorithms have problems because the object's information is used in the hole-filling process, and, in the case of video, the algorithm is applied independently to each frame, so the filled common-hole regions are not matched between frames, which may cause a flickering effect.

To solve these problems, a new common-hole filling algorithm using exemplar-based in-painting is proposed in this paper. In the proposed algorithm, the depth image that corresponds to the color image is used together with the calculated camera parameters to find the world coordinates, which are then re-projected into the virtual viewpoint image. In the virtual view generated by re-projection, boundary noise is created because of the mismatch between the color and depth image boundaries. Therefore, we first remove this boundary noise.

After 3D warping and removing the boundary noise, we estimate the location of the background relative to the holes, and pixels adjacent to the background are then filled with priority to get a better result that does not rely only on the adjacent object's information. When searching for the region most similar to the defined patch, the source region of the previous frame is included in the search range in addition to the current frame in order to minimize the temporal inconsistency between frames. Many experiments have been performed to confirm that the proposed algorithm is more effective than the existing algorithms.

This paper is organized into five parts. First, in section 2, the fundamental concepts of 3D warping and the exemplar-based in-painting algorithm are explained. Section 3 discusses the proposed common-hole filling algorithm for the virtual viewpoint image using a modified exemplar-based in-painting algorithm. Section 4 presents various experimental results that confirm the performance of the proposed algorithm, and section 5 concludes the paper.


2. 3D Warping & Exemplar Based In-painting

2.1 Virtual view generation by 3D warping

In 3D warping, the world coordinates of a given image are first determined by using the intrinsic and extrinsic camera parameters, and a desired viewpoint image is then obtained through re-projection into 2D space using the camera parameters at the virtual viewpoint. The world coordinate and camera coordinate systems are both 3D coordinate systems, and the conversion between these two systems can be achieved through rotational and translational operations, as shown in Fig. 1. These two operations are defined as the camera's extrinsic parameters.

Fig. 1. Transformation of the world coordinate to the camera coordinate

The conversion of the camera coordinate system to the 2D image coordinate system can be explained through the camera's geometrical structure, as shown in Fig. 2. Fig. 2(a) represents the 3D model of the pin-hole camera, and Fig. 2(b) represents its 2D model. In a pin-hole camera, light passes through the pin-hole in a straight line and forms a reversed image at the −f location on the Z-axis, where f is the focal length of the camera. For convenience of analysis, however, the image plane is moved by the focal length in the positive direction of the Z-axis.

Fig. 2. Geometrical structure of the pin-hole camera; (a) 3D and (b) 2D

The projection of the object's 3D camera coordinates onto the image plane can be explained using the similar triangles formed by the focal length and the coordinates of the object, as shown in Fig. 2(b). This conversion process is defined by the intrinsic parameters.

Using the extrinsic and intrinsic parameters of the camera, 3D coordinates in the world coordinate system can be converted to 2D coordinates on the image plane, as shown in Eq. (1).
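Reconstructed from the definitions that follow, Eq. (1) takes the standard projective form; the homogeneous scale factor s is an assumption of this reconstruction:

$$ s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K\,[R \mid T] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \qquad (1) $$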

where x and y represent the 2D coordinates on the image plane, K represents the intrinsic parameters of the camera, R is the rotation matrix of the camera, T is the translation vector of the camera, and X, Y, Z represent the 3D coordinates in the world coordinate system. K[R|T] is defined as the projection matrix.

Through the inverse operation of the matrix in Eq. (1), the 2D coordinates can be converted back to the world coordinate system, as shown in Eq. (2). At this point, the disparity information D from Eq. (3) is needed to find the real depth value Z.
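Plausible reconstructions of Eqs. (2) and (3) from the surrounding definitions; Eq. (3) is assumed here to be the MPEG depth-range conversion commonly used with these test sequences, with the scale s fixed by the real depth Z:

$$ \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = R^{-1}\left( s\,K^{-1} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} - T \right) \qquad (2) $$

$$ Z(i,j) = \left[ \frac{D(i,j)}{255}\left(\frac{1}{MinZ} - \frac{1}{MaxZ}\right) + \frac{1}{MaxZ} \right]^{-1} \qquad (3) $$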

where Z(i, j) and D(i, j) are the depth and disparity values at coordinate (i, j) within the image, and MinZ and MaxZ represent the minimum and maximum values of Z, respectively.

To produce a virtual viewpoint image, the intrinsic and extrinsic parameters of a camera located at the virtual viewpoint need to be defined. In general, the intrinsic parameters are determined by the camera's internal structure, so the intrinsic parameters of the reference-viewpoint camera can be used for the virtual-viewpoint camera. The extrinsic parameters can be used after converting them to the location of the virtual-viewpoint camera. The converted 3D world coordinates and the parameters of the virtual-viewpoint camera are applied to Eq. (1) to obtain Eq. (4), which re-projects them to the image coordinates of the virtual viewpoint.
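Substituting the virtual camera's parameters into the form of Eq. (1) gives a plausible reconstruction of Eq. (4):

$$ s_v \begin{bmatrix} x_v \\ y_v \\ 1 \end{bmatrix} = K_V\,[R_V \mid T_V] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \qquad (4) $$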

where xv and yv represent the 2D coordinates of the virtual viewpoint image to be formed, and KV, RV, and TV represent the intrinsic parameters, rotation matrix and translation vector of the virtual-viewpoint camera, respectively [1].
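To make Eqs. (1)-(4) concrete, the following is a minimal per-pixel sketch of depth-based 3D warping in Python with NumPy; the function and variable names are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def warp_to_virtual_view(color, depth, K, R, T, Kv, Rv, Tv):
    """Depth-based 3D warping sketch following Eqs. (1)-(4).

    color: HxWx3 reference image; depth: HxW real depth Z;
    (K, R, T): reference camera; (Kv, Rv, Tv): virtual camera.
    Returns the warped view and a mask marking the common-holes.
    """
    h, w = depth.shape
    virtual = np.zeros_like(color)
    hole_mask = np.ones((h, w), dtype=bool)   # True = common-hole
    z_buffer = np.full((h, w), np.inf)
    K_inv, R_inv = np.linalg.inv(K), np.linalg.inv(R)

    for y in range(h):
        for x in range(w):
            # Back-project pixel (x, y) to world coordinates (Eq. (2)),
            # using the real depth Z as the scale factor s
            cam = depth[y, x] * (K_inv @ np.array([x, y, 1.0]))
            world = R_inv @ (cam - T)
            # Re-project into the virtual camera (Eq. (4))
            proj = Kv @ (Rv @ world + Tv)
            xv = int(round(proj[0] / proj[2]))
            yv = int(round(proj[1] / proj[2]))
            # Z-buffering keeps the nearest surface on collisions
            if 0 <= xv < w and 0 <= yv < h and proj[2] < z_buffer[yv, xv]:
                z_buffer[yv, xv] = proj[2]
                virtual[yv, xv] = color[y, x]
                hole_mask[yv, xv] = False
    return virtual, hole_mask
```

The z-buffer resolves the case where several reference pixels map to the same virtual pixel; every pixel that receives no projection remains marked as a common-hole.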

Fig. 3 shows an example of a virtual view produced by the 3D warping algorithm. As shown in Fig. 3(b), occlusion regions can be found that cannot be seen from the reference view. These regions are the common-hole regions.

Fig. 3. Generation of a virtual viewpoint image; (a) reference color and depth images and (b) generated virtual viewpoint image

2.2 Exemplar based In-painting

In-painting is an algorithm generally used for restoring old images or images with damaged regions. In this algorithm, the region that needs to be restored is filled with estimates obtained from the information around it, because no information is available for the region itself. The earliest in-painting algorithms operated on pixel units. However, because a pixel-based in-painting algorithm fills the region pixel by pixel, the texture quality of the image decreases and a blurring effect may take place as the region grows. Therefore, an exemplar-based in-painting algorithm using texture synthesis has been proposed. The structure of the exemplar-based in-painting algorithm can be seen in Fig. 4.

Fig. 4. Structure of exemplar-based texture synthesis [7]

The original image is divided into two parts: the target region Ω, with contour δΩ, and the source region Φ. First, the pixels on the boundary δΩ of the target region are prioritized in order of restoration. Patches of fixed size centered on each of these pixels are defined and restored in priority order. The region most similar to the patch containing the pertinent pixel is found within the source image (Ψp′ and Ψp′′). The most similar patch found is then copied into the patch region (Ψp), and the above process is repeated until the target region is completely filled.

2.2.1 Importance of restoration priority

Because the exemplar-based in-painting algorithm restores the highest-priority patch first, the order of restoration is crucial. Fig. 5 shows different restoration results based on the order of priority.

Fig. 5. Importance of restoration priority; (a) original image; (b) filling by counterclockwise order and (c) filling by an edge-driven filling order [7]

In Fig. 5(a), the white region of the original image is the target region and the rest is the source region. Fig. 5(b) shows the result of filling in counterclockwise order, and Fig. 5(c) shows the result of the edge-driven filling order. These two figures show that the results differ based on the filling order.

2.2.2 Calculation of restoration priority

The order of restoration is determined by the product of the confidence term and the data term, as shown in Eq. (5).
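From the definitions that follow, Eq. (5) is the priority product of [7]:

$$ P(p) = C(p)\,D(p) \qquad (5) $$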

where p is the location of the center pixel of the patch, C(p) is the confidence term, and D(p) is the data term. P(p) is the final value used to determine the order of priority.

The confidence term represents how much of the source region exists in a selected patch, as in Eq. (6).
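Reconstructed from the definitions that follow, matching the form used in [7]:

$$ C(p) = \frac{\sum_{q \in \Psi_p \cap (I - \Omega)} C(q)}{\lvert \Psi_p \rvert} \qquad (6) $$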

where Ψp represents the patch centered at pixel p and |Ψp| is the area of the patch. q represents the location of a pixel within the patch, I is the whole image and Ω represents the restoration region. If q is in the source region, then C(q) is 1; if q is in the restoration region, then C(q) is initialized to 0. C(p) is computed by accumulating C(q) over all pixels within the patch.

The data term is the absolute value of the inner product between the isophote of the image (the gradient rotated by 90°) and the unit vector orthogonal to the boundary at the center pixel of the patch that needs to be restored. Fig. 6 shows how to find the data term using Eq. (7).

Fig. 6. Calculation of the data term [7]
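Matching the definitions that follow, Eq. (7) from [7] is:

$$ D(p) = \frac{\lvert \nabla I_p^{\perp} \cdot n_p \rvert}{\alpha} \qquad (7) $$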

where ∇Ip⊥ is the isophote at pixel p, i.e., the gradient rotated by 90°, and np is the unit vector orthogonal to the boundary at point p. α is a normalization parameter, which is 255 for a gray-level image.

Fig. 7 shows an example of the calculation of filling priority. In Fig. 7, pixel 'a' has a low confidence value because there is little reliable information around it. However, it has a high data term because the isophote direction is well aligned with the boundary normal. The confidence value of pixel 'b' is greater than that of pixel 'a'. Pixel 'b' has a smaller data term than pixel 'a' because it has the same gradient magnitude but a boundary normal pointing in a different direction. Pixel 'c' has the largest confidence term but a data term of 0 because the gradient value is 0. Consequently, pixel 'b' has the highest filling priority and pixel 'c' the lowest [7].

Fig. 7. Example of priority calculation [7]

2.2.3 Finding similar region

Once the patch with the highest filling priority is determined, the most similar patch is found through a full search of the source region. Eq. (8) shows the process of finding the patch most similar to the patch with the highest filling priority.
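From the definitions that follow, Eq. (8) can be written as an SSE-minimizing full search:

$$ \Psi_Q = \arg\min_{\Psi_q \in \Phi} SSE(\Psi_p, \Psi_q) \qquad (8) $$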

where Ψp represents the patch with the highest filling priority, Ψq is a candidate patch centered at pixel q, and ΨQ represents the selected patch. SSE(Ψp, Ψq) is the sum of squared errors, i.e., the sum of the squared differences between the corresponding pixels of the two patches. Based on Eq. (8), the patch with the lowest error value is selected as the most similar patch, its pixel information is copied into the hole pixels of Ψp, and the information of the restored region is updated. This process is repeated until the region that needs to be restored is completely filled.
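As an illustration of this full-search step, the following is a minimal sketch assuming NumPy; the masking convention (accumulating the SSE only over the known pixels of the target patch) and all names are illustrative assumptions.

```python
import numpy as np

def find_most_similar_patch(image, source_mask, patch, known_mask):
    """Full-search sketch for Eq. (8).

    image: HxWx3 frame; source_mask: True where pixels belong to the
    source region; patch: the highest-priority patch (may contain hole
    pixels); known_mask: True where the patch pixels are known.
    """
    ph, pw = patch.shape[:2]
    best_sse, best_xy = np.inf, None
    for y in range(image.shape[0] - ph + 1):
        for x in range(image.shape[1] - pw + 1):
            # A candidate patch must lie entirely inside the source region
            if not source_mask[y:y + ph, x:x + pw].all():
                continue
            cand = image[y:y + ph, x:x + pw].astype(np.float64)
            diff = (cand - patch.astype(np.float64))[known_mask]
            sse = np.sum(diff ** 2)          # SSE(Psi_p, Psi_q)
            if sse < best_sse:
                best_sse, best_xy = sse, (y, x)
    return best_xy, best_sse
```

Once the best candidate is found, its pixels are copied into the hole pixels of the target patch and the confidence values of those pixels are updated before the next iteration.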


3. Proposed Algorithm

Fig. 8 is the block diagram of the proposed algorithm. First, the color image and the corresponding depth image are used to produce the color and depth images of the desired virtual view with the 3D warping algorithm. Then, the boundary noise in the initially produced virtual viewpoint image is removed using local background mean values. Next, using the locations of the reference viewpoint and the virtual viewpoint, the restoration priority is calculated only over the common-hole boundary pixels adjacent to the background. When the patch with the highest priority is found, the patch most similar to it is found by searching the source regions of the current frame and the previous frame. The patch found by this search is used to fill the common-hole, and the information is updated. The above process is repeated until all the common-holes are filled.

Fig. 8. Block diagram of the proposed algorithm

3.1 Boundary noise removal

Boundary noise occurs because the object boundary in the reference view does not exactly match the corresponding object boundary in the depth image. Boundary noise appears as an afterimage-like trace of the object, as shown in Fig. 9.

Fig. 9. Examples of boundary noise

In the proposed algorithm, we first find the background region around the common-hole by considering the locations of the reference and virtual views. Then we take the average of a number of pixels in the horizontal direction, starting from the background pixel closest to the hole. Next, we compare this average to the value of the pixel closest to the hole and regard that pixel as boundary noise when the absolute difference is bigger than a threshold. If the first pixel is determined to be boundary noise, we apply the same test to the next pixel to determine whether the boundary noise continues. The detected boundary noise is then merged into the common-hole region and removed together with the common-holes. Fig. 10 shows the result of boundary noise removal using the proposed algorithm.

Fig. 10. Result of boundary noise removal using the proposed algorithm: (a) boundary noise image; (b) result of the proposed algorithm

Eq. (9) shows the process of finding the boundary noise pixels.
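A plausible reconstruction of Eq. (9) from the definitions that follow; the number N of averaged background pixels is an assumption of this reconstruction:

$$ \lvert b(x,y) - AvgBackground \rvert > th, \qquad AvgBackground = \frac{1}{N}\sum_{k=1}^{N} p(x_k, y_k) \qquad (9) $$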

where b(x, y) represents the candidate boundary-noise pixel at coordinate (x, y), AvgBackground is the average of the pixels from the background, p(xk, yk) denotes the background pixels closest to the hole in the horizontal direction, and th is the threshold value for determining a boundary noise pixel.
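A minimal sketch of this test, assuming a horizontal scanline; the parameter values n_avg and th are illustrative, as the text does not specify them numerically.

```python
import numpy as np

def is_boundary_noise(scanline, x_cand, direction, n_avg=5, th=30):
    """Boundary-noise test sketch following Eq. (9).

    scanline: one image row; x_cand: x-coordinate of the candidate
    pixel next to the hole; direction: +1 or -1, pointing away from
    the hole into the background.
    """
    # AvgBackground: average of n_avg background pixels beyond the candidate
    xs = [x_cand + direction * k for k in range(1, n_avg + 1)]
    avg_background = float(np.mean([float(scanline[x]) for x in xs]))
    # The candidate is boundary noise if it deviates from the background
    return abs(float(scanline[x_cand]) - avg_background) > th
```

If the test succeeds, the same test is applied to the next pixel toward the background until it fails, and all flagged pixels are merged into the common-hole mask.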

If the exemplar-based in-painting algorithm is used without removing this noise, the quality of the filled hole degrades because the patches used to fill the hole include the boundary noise, as shown in Fig. 11(b). Fig. 11(c) shows the image after removing the boundary noise and filling the common-hole with the proposed algorithm. Compared to the image filled without removing the boundary noise, the image produced by the proposed algorithm shows better quality [10].

Fig. 11. (a) Boundary noise image; (b) hole filling with boundary noise, and (c) hole filling after removing the boundary noise with the proposed algorithm

3.2 Common-hole filling algorithm using modified exemplar based in-painting

As explained earlier, the quality of the generated image may differ greatly with the restoration order when the exemplar-based in-painting algorithm is used. In existing methods, every pixel on the boundary of the region to be restored is a candidate in the restoration priority order. In this case, however, if a pixel adjacent to the object is given the highest priority, the most similar patch may erroneously be found in the object region, which affects the filling process and consequently decreases the quality of the generated virtual view. Therefore, in the proposed algorithm, only the boundary pixels adjacent to the background are used to determine the restoration priority. Fig. 12 shows the candidate pixels for the restoration priority order under the proposed algorithm.

Fig. 12. Candidate pixels for the priority order of the restoration: (a) common-hole image; (b) candidate pixels by the existing algorithm, and (c) by the proposed algorithm

With this method, filling starts from the region close to the background, which prevents the common-hole region from being erroneously filled with the object region.

Fig. 13(b) and (c) show the results of prioritizing the restoration order using the existing and the proposed algorithms, respectively. It can be seen that the proposed algorithm generates an image of higher quality than the existing algorithm.

Fig. 13. (a) Uncovered hole; (b) hole filling by the existing algorithm and (c) by the proposed algorithm

3.3 Consideration of temporal consistency

Most existing algorithms use only single-image information to fill the holes, so the filled region may differ between successive frames. To remove this temporal inconsistency, holes that appear in the same area in successive frames have to be filled consistently across the frames. In the proposed algorithm, the source region of the previous frame is therefore included in the search range when searching for the most similar patch.

Eq. (10) shows the process of finding the most similar patch.
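A plausible reconstruction of Eq. (10), extending the search range of Eq. (8) to the source region Φ(t−1) of the synthesized previous frame; the where-clause below suggests that the co-located previous-frame patch ΨQ(t−1) is among the candidates:

$$ \Psi_Q = \arg\min_{\Psi_q \in \Phi(t)\,\cup\,\Phi(t-1)} SSE(\Psi_p, \Psi_q) \qquad (10) $$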

where ΨQ(t-1) represents the patch at the same location in the synthesized previous frame. If a common-hole is found at the same location in consecutive frames, the identical region in the previous frame is the most likely candidate for the most similar patch, so the region from the previous frame is given priority in order to maintain temporal consistency. Fig. 14 compares the existing algorithm and the proposed algorithm and shows that the proposed algorithm has better temporal consistency.

Fig. 14. Proposed temporal inconsistency compensation algorithm; (a) existing algorithm (0th frame); (b) the proposed algorithm (0th frame); (c) existing algorithm (1st frame) and (d) the proposed algorithm (1st frame)


4. Experimental Results

To test the efficiency of the proposed algorithm, multi-view sequences provided by MPEG have been used as test sequences. For performance comparison, we used the pixel-based in-painting algorithm included in VSRS 3.5 [12], the spiral weighted average algorithm included in VSRS 3.5 alpha [11], and bilateral filtering [6].

4.1 Extrapolation case

As shown in Fig. 15, the proposed algorithm can generate a virtual viewpoint image positioned outside the reference views (the extrapolation view synthesis case).

Fig. 15. Extrapolation view synthesis case

Note that all the data for the test sequences in the tables that follow are average values over all temporal frames of each sequence. The specifications of the test sequences are shown in Table 1.

Table 1. Specifications of the test sequences

Fig. 16 shows the resulting virtual views using the respective algorithms to fill the common-holes. Fig. 17 shows enlarged details of Fig. 16.

Fig. 16. Generated virtual view (“Newspaper” 7th view, 0th frame): (a) 3D warping result; (b) pixel-based in-painting algorithm; (c) VSRS 1; (d) VSRS 2, and (e) the proposed algorithm

Fig. 17. Detail of Fig. 16 (“Newspaper” 7th view, 0th frame); (a) 3D warping result; (b) pixel-based in-painting algorithm; (c) VSRS 1; (d) VSRS 2, and (e) the proposed algorithm

As shown in Figs. 16 and 17, the proposed algorithm reduces the blurring effect and preserves the texture of the image more effectively than the existing algorithms.

An objective quality evaluation has been performed as well. The PSNR value was calculated by comparing the original (N+1)th view with the virtual (N+1)th view generated using the Nth view as the reference.

As shown in Table 2, the results of the proposed algorithm generally show an improvement in PSNR compared to the results of the existing algorithms. The proposed algorithm performs better because the background information is used as much as possible to fill the holes, which improves the texture of the image.

Table 2. Performance comparison for extrapolation

4.2 Interpolation case

Fig. 18 shows an interpolation view synthesis case. We generated a virtual viewpoint image positioned between the reference views (the interpolation view synthesis case) with the proposed algorithm. All the data for the test sequences in the tables are again average values over all temporal frames of each sequence; the specifications are shown in Table 3.

Fig. 18. Interpolation view synthesis case

Table 3. Specifications of the test sequences

Table 4 compares the PSNR performance of the proposed algorithm for the interpolation case.

Table 4. Performance comparison for interpolation

The proposed algorithm performed better than the existing algorithms for the interpolation case as well, and Table 4 shows that it gave better results in general.


5. Conclusion

In this paper, an algorithm to effectively fill the holes that are produced when creating a virtual-viewpoint image has been proposed. In the proposed algorithm, after 3D warping, the boundary noise in the generated initial virtual view is removed; we then estimate the location of the background relative to the holes, and pixels adjacent to the background are filled with priority to get a better result that does not rely only on the adjacent object's information. Also, the inconsistency between two consecutive frames can be reduced by expanding the search region to include the previous frame when searching for the most similar patch. The superiority of the proposed algorithm over the existing algorithms has been shown through the experiments performed.

References

  1. Y. Mori, N. Fukushima, T. Yendo, T. Fujii and M. Tanimoto, “View generation with 3D warping using depth information for FTV”, Signal Processing: Image Communication, vol. 24, issue 1-2, pp. 65-72, Jan. 2009. https://doi.org/10.1016/j.image.2008.10.013
  2. Z. Tauber, Ze-Nian Li, and M.S. Drew, “Review and preview : disocclusion by inpainting for image-based rendering,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 37, no. 4, pp. 527-540, Jul. 2007. https://doi.org/10.1109/TSMCC.2006.886967
  3. S.-H. Park, H. Song, E.-Y. Jang, N. Hur, J. Kim, J.-S. Kim, S.-H. Lee, and J. Yoo, “Compensation method for occluded-region of arbitrary-view image synthesized from multi-view video”, The Journal of Korea Information and Communications Society, vol. 33, no. 12, pp. 1029-1038, Dec. 2008.
  4. M. Bertalmio, G. Sapiro, V. Caselles and C. Ballester, “Image inpainting”, Proc. of 27th Conference Computer Graphics and Interactive Algorithms (ACM SIGGRAPH 2000), pp. 417-424, July 2000.
  5. ISO/IEC JTC1/SC29/WG11, “Introduction to 3D video”, M9784, May 2008.
  6. A. Telea, “An Image In-painting Algorithm Based on the Fast Marching Method”, Journal of Graphics Tools, vol. 9, no. 1, pp. 25-36, 2004.
  7. A. Criminisi, P. Perez and K. Toyama, “Region filling and Object Removal by Exemplar-Based Image Inpainting”, IEEE Trans. on Image Processing, vol. 13, no. 9, pp. 1200-1212, Sept. 2004. https://doi.org/10.1109/TIP.2004.833105
  8. Y.-J. Kim, S. H. Lee and J.-I. Park, "A High-quality Occlusion Filling Method Using Image Inpainting", Journal of Broadcast Engineering, vol. 15, no. 1, pp. 3-13, Jan. 2010. https://doi.org/10.5909/JBE.2010.15.1.003
  9. I. Daribo and H. Saito, “A Novel Inpainting-Based Layered Depth Video for 3DTV”, IEEE Transactions on Broadcasting, vol. 57, no. 2, pp. 533-541, June 2011. https://doi.org/10.1109/TBC.2011.2125110
  10. M. S. Ko and J. Yoo, “Virtual View Generation by a New Hole Filling Algorithm”, The Journal of Electrical Engineering & Technology, vol. 9, no. 3, pp. 1023-1033, March 2014. https://doi.org/10.5370/JEET.2014.9.3.1023
  11. ISO/IEC JTC1/SC29/WG11, “Boundary noise removal and hole filling for VSRS 3.5 alpha”, M19992, March 2011.
  12. ISO/IEC JTC1/SC29/WG11, “Implementation of Hole Filling Methods for VSRS 3.5 alpha”, M20005, March 2011.