
High-Quality Depth Map Generation of Humans in Monocular Videos  

Lee, Jungjin (KAIST)
Lee, Sangwoo (KAIST)
Park, Jongjin (Spheregram Co.)
Noh, Junyong (KAIST)
Abstract
The quality of 2D-to-3D conversion depends on the accuracy of the depth assigned to scene objects. Manual depth painting is labor intensive because every frame must be painted. In particular, a human is one of the most challenging objects for a high-quality conversion, as the human body is an articulated figure with many degrees of freedom (DOF). In addition, various styles of clothes, accessories, and hair create a very complex silhouette around the 2D human object. We propose an efficient method to estimate visually pleasing depths of a human at every frame in a monocular video. First, a 3D template model is matched to a person in a monocular video using a small number of user-specified correspondences. Our pose estimation with sequential joint angular constraints reproduces a wide range of human motions (e.g., spine bending) by allowing the use of a fully skinned 3D model with a large number of joints and DOFs. The initial depth of the 2D object in the video is assigned from the matched results and then propagated toward areas where the depth is missing to produce a complete depth map. To handle the complex silhouettes and appearances effectively, we introduce a partial depth propagation method based on color segmentation that preserves detail in the results. We compared our results with depth maps painted by experienced artists. The comparison shows that our method efficiently produces viable depth maps of humans in monocular videos.
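The idea of propagating depth within color segments can be illustrated with a minimal sketch. The abstract does not specify the propagation rule, so everything here is an assumption for illustration only: the function name `propagate_depth_by_segment`, the use of a precomputed segment label map, and the choice to fill missing pixels with the mean of the known depths in the same segment are hypothetical and not taken from the paper.

```python
import numpy as np

def propagate_depth_by_segment(depth, known, labels):
    """Fill missing depths per color segment (illustrative assumption).

    depth  : 2D float array of initial depths (values at unknown pixels are ignored)
    known  : 2D bool array, True where the initial depth is valid
    labels : 2D int array of color-segment ids (from any segmentation step)
    """
    filled = depth.astype(float).copy()
    for seg in np.unique(labels):
        in_seg = labels == seg
        seeds = in_seg & known
        # Segments without any known depth are left unchanged.
        if seeds.any():
            filled[in_seg & ~known] = depth[seeds].mean()
    return filled

# Toy 2x3 image with two segments (ids 0 and 1).
depth = np.array([[1.0, 0.0, 5.0],
                  [3.0, 0.0, 0.0]])
known = np.array([[True, False, True],
                  [True, False, False]])
labels = np.array([[0, 0, 1],
                   [0, 0, 1]])

filled = propagate_depth_by_segment(depth, known, labels)
```

In this toy case the two unknown pixels of segment 0 receive the mean of 1.0 and 3.0, and the unknown pixel of segment 1 receives 5.0. Restricting the fill to a segment keeps depth from bleeding across color boundaries, which is the intuition behind segment-based propagation.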
Keywords
Depth Estimation; Pose Estimation; Depth Propagation; 2D-to-3D Conversion