[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3837/tiis.2017.10.015

SuperDepthTransfer: Depth Extraction from Image Using Instance-Based Learning with Superpixels

Zhu, Yuesheng (Shenzhen Key Lab of Information Theory & Future Network Arch, Communication & Information Security Lab, Institute of Big Data Technologies Shenzhen Graduate School, Peking University)
Jiang, Yifeng (Shenzhen Key Lab of Information Theory & Future Network Arch, Communication & Information Security Lab, Institute of Big Data Technologies Shenzhen Graduate School, Peking University)
Huang, Zhuandi (Shenzhen Key Lab of Information Theory & Future Network Arch, Communication & Information Security Lab, Institute of Big Data Technologies Shenzhen Graduate School, Peking University)
Luo, Guibo (Shenzhen Key Lab of Information Theory & Future Network Arch, Communication & Information Security Lab, Institute of Big Data Technologies Shenzhen Graduate School, Peking University)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.11, no.10, 2017, pp. 4968-4986

Abstract

In this paper, we address the problem of automatically generating a plausible depth map from a single image of an unstructured environment. The aim is to estimate a depth map with a more correct, rich, and distinct depth order that is both quantitatively accurate and visually pleasing. Our technique builds on the existing DepthTransfer algorithm but transfers depth information at the level of superpixels rather than pixels, within an instance-based learning framework. To improve the precision of superpixel matching, predicted semantic labels are subsequently incorporated into the depth extraction procedure. Finally, a modified Cross Bilateral Filter is applied to refine the final depth field. Experiments on the Make3D Range Image Dataset, used for both training and evaluation, show that this depth estimation method outperforms state-of-the-art methods on the correlation coefficient, mean log10 error, and root mean squared error metrics, and achieves comparable performance on the average relative error metric, in terms of both efficacy and computational efficiency. The approach can be used to automatically convert 2D images into stereo for 3D visualization, producing anaglyph images that are more realistic and more immersive.
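The abstract names four evaluation measures without giving their formulas. The following is a minimal sketch of how they are commonly computed in the single-image depth estimation literature, assuming NumPy, strictly positive depth values, and a Pearson correlation coefficient; the function name `depth_metrics` and the dictionary keys are illustrative and do not come from the paper, whose exact definitions may differ.

```python
import numpy as np


def depth_metrics(pred, gt):
    """Compare a predicted depth map with ground truth (same shape, depths > 0).

    Assumed definitions (not taken from the paper):
      rel   : mean(|pred - gt| / gt)               average relative error
      log10 : mean(|log10(pred) - log10(gt)|)      mean log10 error
      rmse  : sqrt(mean((pred - gt)^2))            root mean squared error
      corr  : Pearson correlation of pred and gt   correlation coefficient
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    gt = np.asarray(gt, dtype=np.float64).ravel()
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must contain the same number of pixels")
    if np.any(pred <= 0) or np.any(gt <= 0):
        raise ValueError("depth values must be strictly positive")

    rel = np.mean(np.abs(pred - gt) / gt)
    log10 = np.mean(np.abs(np.log10(pred) - np.log10(gt)))
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    corr = np.corrcoef(pred, gt)[0, 1]
    return {"rel": rel, "log10": log10, "rmse": rmse, "corr": corr}
```

Lower values are better for the three error measures, while a correlation coefficient closer to 1 indicates that the predicted depth order follows the ground truth more closely.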

Keywords

Depth estimation; instance-based learning; superpixels; semantic label; 2D-to-3D conversion

Citations & Related Records

Times Cited By KSCI: 1

Reference

1	Felzenszwalb, P.F., Huttenlocher, D.P., "Efficient graph-based image segmentation," International Journal of Computer Vision, vol. 59, pp. 167-181, 2004.
2	Malisiewicz, T., Efros, A.A., "Recognition by association via learning per-exemplar distances," in Proc. of Computer Vision and Pattern Recognition, IEEE Conference on, pp. 1-8, 2008.
3	Tighe, J., Lazebnik, S., "Superparsing: scalable nonparametric image parsing with superpixels," in Proc. of Computer Vision-ECCV, Springer, pp. 352-365, 2010.
4	Durand, F., Dorsey, J., "Fast bilateral filtering for the display of high-dynamic-range images," in Proc. of ACM transactions on graphics, vol. 21, pp. 257-266, 2002.
5	Angot, L.J., Huang, W.J., Liu, K.C., "A 2d to 3d video and image conversion technique based on a bilateral filter," in Proc. of IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, pp. 75260D-75260, 2010.
6	Holliman, N.S., Dodgson, N.A., Favalora, G.E., Pockett, L., "Three-dimensional displays: a review and applications analysis," Broadcasting, IEEE Transactions on, vol. 57, pp. 362-371, 2011.
7	Karsch, K., Liu, C., Kang, S.B., "Depth extraction from video using non-parametric sampling," in Proc. of Computer Vision-ECCV 2012, pp. 775-788, 2012.
8	Karsch, K., Liu, C., Kang, S.B., "Depth transfer: Depth extraction from video using non-parametric sampling," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 36, pp. 2144-2158, 2014.
9	Saxena, A., Sun, M., Ng, A.Y., "Learning 3-d scene structure from a single still image," in Proc. of Computer Vision, IEEE 11th International Conference on, pp. 1-8, 2007.
10	Zhang, L., Tam, W.J., "Stereoscopic image generation based on depth images for 3d tv," Broadcasting, IEEE Transactions on, vol. 51, pp. 191-199, 2005.
11	Chen, J.C., Huang, M., "2D-to-3D conversion system using depth map enhancement," KSII Transactions on Internet & Information Systems, vol. 10, 2016.
12	Yang, H., Zhang, H., "Efficient 3d room shape recovery from a single panorama," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 5422-5430, 2016.
13	Song, Y., Tang, J., Liu, F., Yan, S., "Body surface context: A new robust feature for action recognition from depth videos," IEEE Transactions on Circuits & Systems for Video Technology, vol. 24, 2014.
14	Huang, C., Liu, Q., Yu, S., "Regions of interest extraction from color image based on visual saliency," The Journal of Supercomputing, vol. 58, pp. 20-33, 2011.
15	Zhang, R., Tsai, P.S., Cryer, J.E., Shah, M., "Shape-from-shading: a survey," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, pp. 690-706, 1999.
16	Forsyth, David A., and J. Ponce, "Computer Vision: A Modern Approach, 2/E," Prentice Hall Professional Technical Reference, 2002.
17	Subbarao, M., Surya, G., "Depth from defocus: a spatial domain approach," International Journal of Computer Vision, vol. 13, pp. 271-294, 1994.
18	Huang, X., Wang, L., Huang, J., Li, D., Zhang, M., "A depth extraction method based on motion and geometry for 2d to 3d conversion," in Proc. of 2009 Third International Symposium on Intelligent Information Technology Application, pp. 294-298, 2009.
19	Hoiem, D., Efros, A.A., Hebert, M., "Geometric context from a single image," in Proc. of Computer Vision, Tenth IEEE International Conference on, vol. 1, pp. 654-661, 2005.
20	Delage, E., Lee, H., Ng, A.Y., "A dynamic bayesian network model for autonomous 3d reconstruction from a single indoor image," in Proc. of Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 2, pp. 2418-2428, 2006.
21	Saxena, A., Sun, M., Ng, A.Y., "Make3d: Learning 3d scene structure from a single still image," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, pp. 824-840, 2009.
22	Saxena, A., Chung, S.H., Ng, A.Y., "Learning depth from single monocular images," in Proc. of Advances in Neural Information Processing Systems, pp. 1161-1168, 2005.
23	Liao, M., Gao, J., Yang, R., Gong, M., "Video stereolization: Combining motion analysis with user interaction," Visualization and Computer Graphics, IEEE Transactions on, vol. 18, pp. 1079-1088, 2012.
24	Xiang, Y., Kim, W., Chen, W., Ji, J., Choy, C., Su, H., Mottaghi, R., Guibas, L., Savarese, S., "ObjectNet3D: A Large Scale Database for 3D Object Recognition," Springer International Publishing, 2016.
25	Li, Z., Liu, J., Tang, J., Lu, H., "Robust structured subspace learning for data representation," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 37, 2015.
26	Urtasun, R., Lenz, P., Geiger, A., "Are we ready for autonomous driving? the kitti vision benchmark suite," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354-3361, 2012.
27	Guttmann, M., Wolf, L., Cohen-Or, D., "Semi-automatic stereo extraction from video footage," in Proc. of Computer Vision, 2009 IEEE 12th International Conference on, pp. 136-142, 2009.
28	Herrera, J.L., Konrad, J., del Bianco, C.R., Garcia, N., "Learning-based depth estimation from 2d images using gist and saliency," in Proc. of Image Processing, IEEE International Conference on, pp. 4753-4757, 2015.
29	Liu, B., Gould, S., Koller, D., "Single image depth estimation from predicted semantic labels," in Proc. of Computer Vision and Pattern Recognition, IEEE Conference on, pp. 1253-1260, 2010.
30	Konrad, J., Wang, M., Ishwar, P., Wu, C., Mukherjee, D., "Learning-based, automatic 2d-to-3d image and video conversion," Image Processing, IEEE Transactions on, vol. 22, pp. 3485-3496, 2013.
31	Konrad, J., Brown, G., Wang, M., Ishwar, P., Wu, C., Mukherjee, D., "Automatic 2d-to-3d image conversion using 3d examples from the internet," in Proc. of IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, pp. 82880F-82880F, 2012.
32	Konrad, J., Wang, M., Ishwar, P., "2d-to-3d image conversion by learning depth from examples," in Proc. of Computer Vision and Pattern Recognition Workshops, IEEE Computer Society Conference on, pp. 16-22, 2012.
33	Liu, M., Salzmann, M., He, X., "Discrete-continuous depth estimation from a single image," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 716-723, 2014.
34	Eigen, D., Puhrsch, C., Fergus, R., "Depth map prediction from a single image using a multi-scale deep network," in Proc. of Advances in neural information processing systems, pp. 2366-2374, 2014.
35	Wang, M., Konrad, J., Ishwar, P., Jing, K., Rowley, H., "Image saliency: From intrinsic to extrinsic context," in Proc. of Computer Vision and Pattern Recognition, IEEE Conference on, pp. 417-424, 2011.
36	Liu, F., Shen, C., Lin, G., "Deep convolutional neural fields for depth estimation from a single image," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5162-5170, 2015.
37	Baig, Mohammad Haris, and L. Torresani, "Coupled depth learning," Applications of Computer Vision, pp. 1-10, 2016.
38	Su, C.C., Cormack, L.K., Bovik, A.C., "Depth estimation from monocular color images using natural scene statistics models," in Proc. of IVMSP Workshop, pp. 1-4, 2013.
39	Wang, X., Hou, C., Pu, L., Hou, Y., "A depth estimating method from a single image using foe crf," Multimedia Tools and Applications, vol. 74, pp. 9491-9506, 2015.
40	Liu, C., Yuen, J., Torralba, A., "Nonparametric scene parsing via label transfer," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33, pp. 2368-2382, 2011.
41	Oliva, A., Torralba, A., "Modeling the shape of the scene: A holistic representation of the spatial envelope," International journal of computer vision, vol. 42, pp. 145-175, 2001.
42	Ren, X., Malik, J., "Learning a classification model for segmentation," in Proc. of Computer Vision, Proceedings, Ninth IEEE International Conference on, pp. 10-17, 2003.