http://dx.doi.org/10.4218/etrij.2021-0303

Camera pose estimation framework for array-structured images  

Shin, Min-Jung (Department of Electronic Engineering, Sogang University)
Park, Woojune (Department of Electronic Engineering, Sogang University)
Kim, Jung Hee (Department of Electronic Engineering, Sogang University)
Kim, Joonsoo (Immersive Media Research Section, Electronics and Telecommunications Research Institute)
Yun, Kuk-Jin (Immersive Media Research Section, Electronics and Telecommunications Research Institute)
Kang, Suk-Ju (Department of Electronic Engineering, Sogang University)
Publication Information
ETRI Journal, vol. 44, no. 1, 2022, pp. 10-23
Abstract
Despite significant progress in camera pose estimation and structure-from-motion reconstruction from unstructured images, methods that exploit a priori information on camera arrangements have been overlooked. Conventional state-of-the-art methods do not exploit this geometric structure to recover accurate camera poses from a set of patch images arranged in an array for mosaic-based imaging, which creates a wide field-of-view image by stitching together a collection of regular images. We propose a camera pose estimation framework that exploits the array-structured image setting in each incremental reconstruction step. It consists of two-way registration, 3D point outlier elimination, and bundle adjustment with a constraint term that enforces consistent rotation vectors to reduce reprojection errors during optimization. We demonstrate that, by using the connected structure of the individual images at the different camera pose estimation steps, camera poses can be estimated more accurately for all structured mosaic-based image sets, including omnidirectional scenes.
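As a rough illustration of the optimization in the final step (the notation here is ours and the exact cost used in the paper may differ), the constrained bundle adjustment can be read as the usual robust reprojection error augmented with a rotation-consistency penalty over cameras that are neighbors in the array:

\[
\min_{\{R_i,\,t_i\},\,\{X_j\}} \;\; \sum_{(i,j)\in\mathcal{O}} \rho\!\left( \left\| \pi\!\left( K_i \left( R_i X_j + t_i \right) \right) - x_{ij} \right\|^2 \right) \;+\; \lambda \sum_{(i,k)\in\mathcal{N}} \left\| \operatorname{Log}\!\left( R_i^{\top} R_k \right) - \omega_{ik} \right\|^2
\]

Here \(\pi(\cdot)\) is the perspective projection, \(K_i\), \(R_i\), and \(t_i\) are the intrinsics, rotation, and translation of camera \(i\), \(X_j\) is a 3D point observed at pixel \(x_{ij}\), \(\mathcal{O}\) is the set of observations, \(\rho\) is a robust loss, \(\mathcal{N}\) is the set of camera pairs adjacent in the array, \(\operatorname{Log}\) maps a relative rotation to its rotation vector, \(\omega_{ik}\) is the relative rotation vector expected from the array layout, and \(\lambda\) weights the consistency term. The second sum is one way to interpret the abstract's "constraint term for consistent rotation vectors."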
Keywords
camera pose estimation; mosaic-based image; omnidirectional image; structure from motion;
References
1 A. Kar, C. Hane, and J. Malik, Learning a multi-view stereo machine, in Proc. Conf. Neural Inform. Process. Syst. (Long Beach, CA, USA), 2017.
2 M. Lhuillier and L. Quan, A quasi-dense approach to surface reconstruction from uncalibrated images, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005), 418-433.   DOI
3 X. Gu et al., Cascade cost volume for high-resolution multi-view stereo and stereo matching, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Seattle, WA, USA), June 2020. https://doi.org/10.1109/CVPR42600.2020.00257   DOI
4 Y. Yao et al., Recurrent MVSNet for high-resolution multi-view stereo depth inference, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Long Beach, CA, USA), June 2019. https://doi.org/10.1109/CVPR.2019.00567   DOI
5 H. Aanaes, R. R. Jensen, G. Vogiatzis, E. Tola, and A. B. Dahl, Large-scale data for multiple-view stereopsis, Int. J. Comput. Vis. 120 (2016), 153-168.   DOI
6 A. Knapitsch, J. Park, Q. Y. Zhou, and V. Koltun, Tanks and temples: benchmarking large-scale scene reconstruction, ACM Trans. Graph. 36 (2017), no. 4, 1-3.
7 F. Toyama, K. Shoji, and J. Miyamichi, Image mosaicing from a set of images without configuration information, in Proc. Int. Conf. Pattern Recogn. (Cambridge, UK), Aug. 2004, pp. 899-902. https://doi.org/10.1109/ICPR.2004.1334404   DOI
8 S. Agarwal et al., Bundle adjustment in the large, in Proc. Eur. Conf. Comput. Vision (Crete, Greece), 2010, pp. 29-42.
9 I. Eichhardt and D. Barath, Relative pose from deep learned depth and a single affine correspondence, 2020. ArXiv abs/2007.10082.
10 D. Nister, An efficient solution to the five-point relative pose problem, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004), 756-770.   DOI
11 B. Guan et al., Minimal solutions for relative pose with a single affine correspondence, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Seattle, WA, USA), June 2020, pp. 1926-1935.
12 C. Raposo and J. P. Barreto, Theory and practice of structure-from-motion using affine correspondences, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (Las Vegas, NV, USA), June 2016, pp. 5470-5478.
13 Y. Furukawa and J. Ponce, Accurate, dense, and robust multi-view stereopsis, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010), 1362-1376.   DOI
14 O. Ozyesil et al., A survey of structure from motion, Acta Numerica 26 (2017), 305-364.   DOI
15 I. Eichhardt and D. Chetverikov, Affine correspondences between central cameras for rapid relative pose estimation, in Computer Vision - ECCV 2018 (V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, eds.), Lecture Notes in Computer Science, Springer International Publishing, Cham, 2018, pp. 488-503.
16 D. Simons, Current approaches to change blindness, Visual Cognition 7 (2000), 1-15.   DOI
17 A. Chatterjee and V. Govindu, Robust relative rotation averaging, IEEE Trans. Pattern Anal. Mach. Intell. 40 (2018), 958-972.   DOI
18 B. Rogers and G. Maureen, Motion parallax as an independent cue for depth perception, Perception 8 (1979), 125-134.   DOI
19 J. Triesch et al., What you see is what you need, J. Vision. 3 (2003), no. 1, 86-94.
20 M. Ji et al., Surfacenet: an end-to-end 3D neural network for multiview stereopsis, in Proc. IEEE Int. Conf. Comput. Vision (Venice, Italy), Oct. 2017, pp. 2326-2334.
21 S. Gao, Fast-MVSNet: sparse-to-dense multi-view stereo with learned propagation and Gauss-Newton refinement, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Seattle, WA, USA), June 2020, pp. 1946-1955. https://doi.org/10.1109/CVPR42600.2020.00202   DOI
22 I. Zoghlami, O. Faugeras, and R. Deriche, Using geometric corners to build a 2D mosaic from a set of images, in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn. (San Juan, PR, USA), June 1997, pp. 420-425. https://doi.org/10.1109/CVPR.1997.609359   DOI
23 L. Heng et al., Infrastructure-based calibration of a multi-camera rig, in Proc. IEEE Int. Conf. Robotics Autom. (Hong Kong, China), 2014, pp. 4912-4919.
24 R. Usamentiaga and D. Garcia, Multi-camera calibration for accurate geometric measurements in industrial environments, Measurement 134 (2019), 345-358.   DOI
25 M. Fischler and R. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM. 24 (1981), 381-395.   DOI
26 S. Zhu et al., Very large-scale global SfM by distributed motion averaging, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Salt Lake City, UT, USA), June 2018, pp. 4568-4577.
27 Z. Cui and P. Tan, Global structure-from-motion by similarity averaging, in Proc. IEEE Int. Conf. Comput. Vision (Santiago, Chile), Dec. 2015, pp. 864-872.
28 N. Snavely, S. M. Seitz, and R. Szeliski, Photo tourism: exploring photo collections in 3D, in Proc. ACM SIGGRAPH (Boston, MA, USA), 2006, pp. 835-846. https://doi.org/10.1145/1179352.1141964   DOI
29 M. Klopschitz et al., Robust incremental structure from motion, in Proc. Int. Symp. 3D Data Process. Visualization Transmission (3DPVT), 2010.
30 R. Shah, A. Deshpande, and P. J. Narayanan, Multistage SfM: a coarse-to-fine approach for 3D reconstruction, 2015. ArXiv abs/1512.06235.
31 K. N. Kutulakos and S. M. Seitz, A theory of shape by space carving, Int. J. Comput. Vis. 38 (2000), 199-218.   DOI
32 S. M. Seitz and C. R. Dyer, Photorealistic scene reconstruction by voxel coloring, Int. J. Comput. Vis. 35 (1999), 151-173.   DOI
33 H. Zhan et al., Visual odometry revisited: what should be learnt?, in Proc. IEEE Int. Conf. Robotics Autom. (Paris, France), 2020, pp. 4203-4210.
34 S. Aghayari et al., Geometric calibration of full spherical panoramic Ricoh Theta camera, in Proc. ISPRS Ann. Photogramm. Remote Sens. Spatial Inform. Sci. (Hannover, Germany), June 2017, pp. 237-245.
35 D. Gledhill, 3D panoramic imaging for virtual environment construction, 2009. http://eprints.hud.ac.uk/id/eprint/6981/
36 L. McMillan and G. Bishop, Plenoptic modeling: an image-based rendering system, in Proc. Annu. Conf. Comput. Graphics Interactive Techniques, Sept. 1995, pp. 39-46. https://doi.org/10.1145/218380.218398   DOI
37 G. Burdea and P. Coiffet, Virtual reality technology, Presence: Teleoperators Virtual Environ. 12 (2003), 663-664.
38 S. Negahdaripour and X. Xu, Mosaic-based positioning and improved motion-estimation methods for automatic navigation of submersible vehicles, IEEE J. Ocean. Eng. 27 (2002), 79-99.   DOI
39 G. Y. Tian, D. Gledhill, and D. Taylor, Comprehensive interest points based imaging mosaic, Pattern Recogn. Lett. 24 (2003), 1171-1179.   DOI
40 Y. Lin et al., Infrastructure-based multi-camera calibration using radial projections, 2020. ArXiv abs/2007.15330.
41 F. Liu, C. Shen, G. Lin, and I. Reid, Learning depth from single monocular images using deep convolutional neural fields, IEEE Trans. Pattern Anal. Mach. Intell. 38 (2016), 2024-2039.   DOI
42 A. M. K. Siu and R. W. H. Lau, Image registration for image-based rendering, IEEE Trans. Image Process. 14 (2005), no. 2, 241-252.   DOI
43 J. L. Schonberger and J. Frahm, Structure-from-motion revisited, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (Las Vegas, NV, USA), June 2016, pp. 4104-4113.
44 S. H. Lee and J. Civera, Rotation-only bundle adjustment, 2020. ArXiv abs/2011.11724.
45 D. Smith, Numerical optimization, J. Oper. Res. Soc. 52 (2001), 245.   DOI
46 C. Wu et al., Multicore bundle adjustment, in Proc. CVPR (Colorado Springs, CO, USA), June 2011. https://doi.org/10.1109/CVPR.2011.5995552   DOI
47 B. Wrobel, Multiple view geometry in computer vision, Kunstliche Intell. 15 (2001), 41.
48 Y. Yao, Z. Luo, S. Li, T. Fang, and L. Quan, MVSNet: Depth Inference for Unstructured Multi-view Stereo, ArXiv, 2018, abs/1804.02505.
49 G. L. David, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision. 60 (2004), 91-110.   DOI
50 D. Scaramuzza, Omnidirectional camera, in Computer Vision: A Reference Guide (K. Ikeuchi, ed.), Springer US, Boston, MA, 2014, pp. 552-560.
51 B. Triggs et al., Bundle adjustment-a modern synthesis, in Proc. Int. Workshop Vision Algorithms, (Corfu, Greece), Sept. 1999, pp 298-372.
52 The MathWorks, Inc., Computer Vision Toolbox, Natick, MA, USA, 2020.
53 S. Yan et al., Image retrieval for structure-from-motion via graph convolutional network, 2020. ArXiv abs/2009.08049.
54 R. A. Newcombe, S. Lovegrove, and A. Davison, DTAM: dense tracking and mapping in real-time, in Proc. Int. Conf. Comput. Vision (Barcelona, Spain), Nov. 2011, pp. 2320-2327.
55 W. Medendorp, D. Tweed, and J. Crawford, Motion parallax is computed in the updating of human spatial memory, J. Neurosci. 23 (2003), 8135-8142.   DOI
56 P. Hedman et al., Deep blending for free-viewpoint image-based rendering, ACM Trans. Graph. 37 (2018), 1-15.
57 M. Broxton et al., Immersive light field video with a layered mesh representation, ACM Trans. Graph. 39 (2020), 86:1-15.
58 A. L. Rodriguez, P. E. López-de-Teruel, and A. Ruiz, Reduced epipolar cost for accelerated incremental SfM, in Proc. CVPR (Colorado Springs, CO, USA), June 2011. https://doi.org/10.1109/CVPR.2011.5995569   DOI
59 K. Wilson and N. Snavely, Network principles for sfm: Disambiguating repeated structures with local context, in Proc. IEEE Int. Conf. Comput. Vision (Sydney, Australia), Dec. 2013, pp. 513-520.
60 W. Park et al., Structured camera pose estimation for mosaic-based omnidirectional imaging, in Proc. IEEE Int. Symp. Circuits Syst. (Daegu, Rep. of Korea), May 2021, pp. 1-5.
61 R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, ORB-SLAM: a versatile and accurate monocular SLAM system, IEEE Trans. Robot. 31 (2015), 1147-1163.   DOI
62 T. Pire, G. I. Fischer, P. C. Castro, J. Civera, and J. Jacobo-Berlles, S-PTAM: stereo parallel tracking and mapping, Robotics Autonomous Syst. 93 (2017), 27-42.   DOI
63 V. Govindu, Combining two-view constraints for motion estimation, in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn. (Kauai, HI, USA), Dec. 2001. https://doi.org/10.1109/CVPR.2001.990963   DOI
64 G. Bradski, The OpenCV library, Dr. Dobb's Journal of Software Tools, 2000.
65 P. Fuchs, Virtual reality headsets-a theoretical and pragmatic approach, London, UK, CRC Press, 2017. https://doi.org/10.1201/9781315208244   DOI
66 S. Seitz et al., A comparison and evaluation of multi-view stereo reconstruction algorithms, in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn. (New York, NY, USA), June 2006, pp. 519-528. https://doi.org/10.1109/CVPR.2006.19   DOI
67 P. C. Merrell et al., Real-time visibility-based fusion of depth maps, in Proc. IEEE Int. Conf. Comput. Vision (Rio de Janeiro, Brazil), Oct. 2007, pp. 1-8.
68 Y. Furukawa and J. Ponce, Accurate, dense, and robust multi-view stereopsis, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (Minneapolis, MN, USA), June 2007, pp. 1-8. https://doi.org/10.1109/CVPR.2007.383246   DOI
69 F. Arrigoni et al., Robust synchronization in SO(3) and SE(3) via low-rank and sparse matrix decomposition, Comput. Vision Image Understanding 174 (2018), 95-113.   DOI
70 P. Purkait, T.-J. Chin, and I. Reid, Neurora: neural robust rotation averaging, in Proc. Eur. Conf. Comput. Vision, 2020.
71 K. Wilson and N. Snavely, Robust global translations with 1DSfM, in Proc. Eur. Conf. Comput. Vision (Zurich, Switzerland), Sept. 2014, pp. 61-75.
72 D. Eigen, C. Puhrsch, and R. Fergus, Depth map prediction from a single image using a multi-scale deep network, 2014. ArXiv abs/1406.2283.
73 Z. Li and N. Snavely, MegaDepth: learning single-view depth prediction from internet photos, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Salt Lake City, UT, USA), June 2018, pp. 2041-2050.
74 A. Tonioni et al., Learning to adapt for stereo, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Long Beach, CA, USA), June 2019, pp. 9653-9662.
75 C. Forster, M. Pizzoli, and D. Scaramuzza, SVO: Fast semi-direct monocular visual odometry, in Proc. IEEE Int. Conf. Robotics Autom. (Hong Kong, China), 2014, pp. 15-22.
76 S. Wang et al., Deepvo: Towards end-to-end visual odometry with deep recurrent convolutional neural networks, in Proc. IEEE Int. Conf. Robotics Autom. (Singapore), 2017, pp. 2043-2050.
77 S. Rogge et al., MPEG-I depth estimation reference software, in Proc. Int. Conf. 3D Immersion (Brussels, Belgium), Dec. 2019. https://doi.org/10.1109/IC3D48390.2019.8975995   DOI
78 C. Wu, Towards linear-time incremental structure from motion, in Proc. Int. Conf. 3D Vision-3DV (Seattle, WA, USA), 2013, pp. 127-134.
79 A. Knapitsch et al., Building large image mosaics with invisible seam lines, in Proc. Aerospace/Defense Sensing Contr. (Orlando, FL, USA), 1998. https://doi.org/10.1117/12.316427   DOI
80 C. S. Kurashima, R. Yang, and A. Lastra, Combining approximate geometry with view-dependent texture mapping - a hybrid approach to 3D video teleconferencing, in Proc. XV Brazilian Symp. Comput. Graphics Image Process. (Fortaleza, Brazil), Oct. 2002, pp. 112-119. https://doi.org/10.1109/SIBGRA.2002.1167133   DOI