http://dx.doi.org/10.11003/JPNT.2021.10.4.297

Deep Learning Based Monocular Depth Estimation: Survey

Lee, Chungkeun (Department of Aerospace Engineering, Seoul National University)
Shim, Dongseok (Department of Aerospace Engineering, Seoul National University)
Kim, H. Jin (Department of Aerospace Engineering, Seoul National University)

Publication Information

Journal of Positioning, Navigation, and Timing / v.10, no.4, 2021, pp. 297-305

Abstract

Monocular depth estimation helps robots understand their surrounding environment in 3D. In particular, deep-learning-based monocular depth estimation has been widely researched because it may overcome the scale ambiguity problem, a major issue in classical methods. These learning-based methods can be divided into three main categories: supervised, unsupervised, and semi-supervised learning. Supervised learning trains the network on dense ground-truth depth; unsupervised learning trains it on image sequences; and semi-supervised learning trains it on stereo images together with sparse ground-truth depth. We describe the basics of each approach and then review recent research efforts to improve depth estimation performance.
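The three supervision signals described in the abstract can be contrasted in a toy sketch. This is not the authors' code or any specific surveyed method. It assumes a single rectified 1-D scanline, hypothetical camera constants `FOCAL` and `BASELINE`, a hypothetical loss `weight`, and plain L1 penalties. Real methods operate on 2-D images and typically add further terms (for example SSIM and smoothness). For the image-sequence (unsupervised) case, the known stereo baseline would be replaced by a camera motion that the network must also estimate; the sketch omits that part.

```python
from typing import List, Optional, Tuple

# Hypothetical camera parameters for a toy rectified stereo pair (not from the survey).
FOCAL = 10.0     # focal length in pixels
BASELINE = 0.5   # distance between the two cameras


def supervised_loss(pred: List[float], gt: List[float]) -> float:
    """Dense supervision: every pixel has a ground-truth depth (mean absolute error)."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def sparse_depth_loss(pred: List[float], sparse_gt: List[Optional[float]]) -> float:
    """Sparse supervision: only pixels with a depth measurement (e.g. LiDAR) contribute."""
    pairs = [(p, g) for p, g in zip(pred, sparse_gt) if g is not None]
    if not pairs:
        return 0.0
    return sum(abs(p - g) for p, g in pairs) / len(pairs)


def sample(scanline: List[float], x: float) -> float:
    """Linearly interpolate a scanline at sub-pixel position x (x assumed in range)."""
    i0 = int(x)
    i1 = min(i0 + 1, len(scanline) - 1)
    w = x - i0
    return (1.0 - w) * scanline[i0] + w * scanline[i1]


def photometric_loss(left: List[float], right: List[float], pred_depth: List[float]) -> float:
    """Self-supervision: reconstruct the left view from the right view using the
    predicted depth (disparity = FOCAL * BASELINE / depth) and compare intensities."""
    errors = []
    for x, d in enumerate(pred_depth):
        src = x - FOCAL * BASELINE / d  # where left pixel x should appear in the right view
        if 0.0 <= src <= len(right) - 1:  # skip pixels that warp outside the image
            errors.append(abs(left[x] - sample(right, src)))
    return sum(errors) / len(errors) if errors else 0.0


def semi_supervised_loss(left: List[float], right: List[float], pred_depth: List[float],
                         sparse_gt: List[Optional[float]], weight: float = 1.0) -> float:
    """Stereo photometric term plus a weighted sparse ground-truth term."""
    return (photometric_loss(left, right, pred_depth)
            + weight * sparse_depth_loss(pred_depth, sparse_gt))


def toy_scene(n: int = 20) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical scene: a textured plane at depth 1.0, giving a disparity of 5 pixels."""
    left = [float((x * x) % 7) for x in range(n)]
    right = [float(((x + 5) * (x + 5)) % 7) for x in range(n)]  # view shifted by the disparity
    return left, right, [1.0] * n


left, right, true_depth = toy_scene()
wrong_depth = [2.0] * len(true_depth)
sparse_gt = [1.0 if x % 5 == 0 else None for x in range(len(true_depth))]
print(photometric_loss(left, right, true_depth))       # 0.0
print(photometric_loss(left, right, wrong_depth) > 0)  # True
print(supervised_loss(wrong_depth, true_depth))        # 1.0
print(sparse_depth_loss(wrong_depth, sparse_gt))       # 1.0
```

Because the baseline is known in this toy setup, only the correct depth makes the photometric error vanish. That is one way stereo pairs or sparse depth can anchor the metric scale that a single image alone leaves ambiguous.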

Keywords

deep learning; depth estimation
