http://dx.doi.org/10.15701/kcgs.2017.23.3.31

Estimation of Manhattan Coordinate System using Convolutional Neural Network  

Lee, Jinwoo (Visual Computing Lab., Kookmin University)
Lee, Hyunjoon (Intel Korea)
Kim, Junho (Visual Computing Lab., Kookmin University)
Abstract
In this paper, we propose a system that estimates the Manhattan coordinate system of an urban scene image using a convolutional neural network (CNN). Estimating the Manhattan coordinate system of an image under the Manhattan world assumption underlies many computer graphics and vision problems, such as image adjustment and 3D scene reconstruction. We construct a CNN based on GoogLeNet [1] that estimates Manhattan coordinate systems. To train the CNN, we collect about 155,000 images that satisfy the Manhattan world assumption using the Google Street View APIs and compute their Manhattan coordinate systems with existing calibration methods to generate the dataset. In contrast to PoseNet [2], which trains a separate CNN per scene, our method learns from images under the Manhattan world assumption in general and can therefore estimate Manhattan coordinate systems for images of scenes not seen during training. Experimental results show that our method estimates Manhattan coordinate systems with a median error of $3.157^{\circ}$ on a test set of Google Street View images from scenes not used for training. In addition, the proposed method shows a lower median error on the test set than an existing calibration method [3].
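The abstract reports accuracy as a median angular error in degrees but does not define the metric. As a minimal sketch, assuming a Manhattan coordinate system is represented as a 3x3 rotation matrix and the error is the geodesic angle between the estimated and reference rotations, the median error could be computed as follows. The function names and toy data are hypothetical, and the sketch ignores the axis permutation and sign ambiguity of Manhattan frames, which a real evaluation would likely need to handle.

```python
import math
import statistics

def rotation_angle_deg(r_est, r_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices.

    Uses trace(r_est^T r_gt) = sum of elementwise products, and
    angle = arccos((trace - 1) / 2). This metric is an assumption,
    not necessarily the one used in the paper.
    """
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard rounding
    return math.degrees(math.acos(cos_theta))

def rot_z(deg):
    """Rotation about the z axis, used here only to build toy data."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def median_error_deg(estimates, ground_truths):
    """Median of the per-image angular errors over a test set."""
    errors = [rotation_angle_deg(a, b) for a, b in zip(estimates, ground_truths)]
    return statistics.median(errors)

# Toy test set: reference frames are the identity, estimates are
# rotated by 1, 3, and 10 degrees about the z axis.
ground_truths = [rot_z(0.0)] * 3
estimates = [rot_z(1.0), rot_z(3.0), rot_z(10.0)]
median = median_error_deg(estimates, ground_truths)
```

The median is less sensitive than the mean to a few badly estimated images, which may be one reason it is the statistic reported here.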
Keywords
camera calibration; Manhattan coordinate; deep learning; convolutional neural network
References
1 C. Wu, "Towards linear-time incremental structure from motion," in Proceedings of the International Conference on 3D Vision, 2013, pp. 127-134.
2 "Google Street View Image API." [Online]. Available: https://developers.google.com/maps/documentation/streetview/
3 M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: Large-scale machine learning on heterogeneous systems." [Online]. Available: http://tensorflow.org/
4 D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," in Proceedings of the International Conference on Learning Representations, 2015, pp. 2938-2946.
5 E. Tretyak, O. Barinova, P. Kohli, and V. Lempitsky, "Geometric image parsing in man-made environments," International Journal of Computer Vision, vol. 97, no. 3, pp. 305-321, 2011.
6 C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9.
7 A. Kendall, M. Grimes, and R. Cipolla, "PoseNet: A convolutional network for real-time 6-DOF camera relocalization," in Proceedings of the International Conference on Computer Vision, 2015, pp. 2938-2946.
8 H. Lee, E. Shechtman, J. Wang, and S. Lee, "Automatic upright adjustment of photographs with robust camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 5, pp. 833-844, 2014.
9 J. M. Coughlan and A. L. Yuille, "Manhattan World: Compass direction from a single image by Bayesian inference," in Proceedings of the International Conference on Computer Vision, 1999, pp. 941-947.
10 K. H. Jang and S. K. Jung, "Practical modeling technique for large-scale 3D building models from ground images," Pattern Recognition Letters, vol. 30, no. 10, pp. 861-869, 2009.
11 P. Denis, J. H. Elder, and F. J. Estrada, "Efficient edge-based methods for estimating Manhattan frames in urban imagery," in Proceedings of European Conference on Computer Vision, 2008, pp. 197-210.
12 B. Li, K. Peng, X. Ying, and H. Zha, "Simultaneous vanishing point detection and camera calibration from single images," in Proceedings of the International Symposium on Visual Computing, 2010, pp. 151-160.
13 J. R. Movellan, "Tutorial on Gabor filters," Univ. of California, San Diego, Tech. Rep., 2005.
14 M. Zhai, S. Workman, and N. Jacobs, "Detecting vanishing points using global image context in a non-Manhattan world," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5657-5665.