Camera pose estimation framework for array-structured images

Shin, Min-Jung;Park, Woojune;Kim, Jung Hee;Kim, Joonsoo;Yun, Kuk-Jin;Kang, Suk-Ju;

doi:10.4218/etrij.2021-0303

ETRI Journal

Volume 44 Issue 1
/
Pages.10-23
/
2022
/
1225-6463(pISSN)
/
2233-7326(eISSN)

Electronics and Telecommunications Research Institute (한국전자통신연구원)

DOI QR Code

Camera pose estimation framework for array-structured images

Shin, Min-Jung (Department of Electronic Engineering, Sogang University) ;
Park, Woojune (Department of Electronic Engineering, Sogang University) ;
Kim, Jung Hee (Department of Electronic Engineering, Sogang University) ;
Kim, Joonsoo (Immersive Media Research Section, Electronics and Telecommunications Research Institute) ;
Yun, Kuk-Jin (Immersive Media Research Section, Electronics and Telecommunications Research Institute) ;
Kang, Suk-Ju (Department of Electronic Engineering, Sogang University)

Received : 2021.08.31
Accepted : 2021.12.17
Published : 2022.02.01

https://doi.org/10.4218/etrij.2021-0303 Citation PDF KSCI

Download PDF

⟨ Previous Next ⟩

Abstract

Despite the significant progress in camera pose estimation and structure-from-motion reconstruction from unstructured images, methods that exploit a priori information on camera arrangements have been overlooked. Conventional state-of-the-art methods do not exploit the geometric structure to recover accurate camera poses from a set of patch images in an array for mosaic-based imaging that creates a wide field-of-view image by sewing together a collection of regular images. We propose a camera pose estimation framework that exploits the array-structured image settings in each incremental reconstruction step. It consists of the two-way registration, the 3D point outlier elimination and the bundle adjustment with a constraint term for consistent rotation vectors to reduce reprojection errors during optimization. We demonstrate that by using individual images' connected structures at different camera pose estimation steps, we can estimate camera poses more accurately from all structured mosaic-based image sets, including omnidirectional scenes.

Keywords

Acknowledgement

This work was supported by the Institute of Information and Communication Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00207, Immersive Media Research Laboratory), the Ministry of Science and ICT (MSIT), Korea, under the Information Technology Research Center (ITRC) support program (IITP-2021-2018-0-01421) supervised by the Institute of Information and communications Technology Planning and Evaluation (IITP) and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A2C1004208).

References

D. Simons, Current approaches to change blindness, Visual Cognition. 7 (2000), 1-15. https://doi.org/10.1080/135062800394658
J. Triesch et al., What you see is what you need, J. Vision. 3 (2003), no. 1, 86-94.
D. Gledhill, 3D panoramic imaging for virtual environment construction, 2009. http://eprints.hud.ac.uk/id/eprint/6981/
B. Rogers and G. Maureen, Motion parallax as an independent cue for depth perception, Perception 8 (1979), 125-134. https://doi.org/10.1068/p080125
W. Medendorp, D. Tweed, and J. Crawford, Motion parallax is computed in the updating of human spatial memory, J. Neurosci. 23 (2003), 8135-8142. https://doi.org/10.1523/jneurosci.23-22-08135.2003
L. McMillan and G. Bishop, Plenoptic modeling: an imagebased rendering system, in Proc. Annu. Conf. Comput. Graphics Interactive Techniques, Sept. 1995. pp. 39-46. https://doi.org/10.1145/218380.218398
P. Hedman et al., Deep blending for free-viewpoint image-based rendering, ACM Trans. Graph. 37 (2018), 1-15.
M. Broxton et al., Immersive light field video with a layered mesh representation, ACM Trans. Graph. 39 (2020), 86:1-15.
G. Burdea and P. Coiffet, Virtual reality technology, Presence: Teleoperators Virtual Environ. 12 (2003), 663-664.
P. Fuchs, Virtual reality headsets-a theoretical and pragmatic approach, London, UK, CRC Press, 2017. https://doi.org/10.1201/9781315208244
S. Seitz et al., A comparison and evaluation of multi-view stereo reconstruction algorithms, in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn. (New York, NY, USA), June 2006, pp. 519-528. https://doi.org/10.1109/CVPR.2006.19
P. C. Merrell et al., Real-time visibility-based fusion of depth maps, in Proc. IEEE Int. Conf. Comput. Vision (Rio de Janeiro, Brazil), Oct. 2007, pp. 1-8.
Y. Furukawa and J. Ponce, Accurate, dense, and robust multi-view stereopsis, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (Minneapolis, MN, USA), June 2007, pp. 1-8. https://doi.org/10.1109/CVPR.2007.383246
C. S. Kurashima, Ruigang Yang, and A. Lastra, Combining approximate geometry with view-dependent texture mapping-a hybrid approach to 3D video teleconferencing, in Proc. XV Brazilian Symp. Comput. Graphics Image Process. (Fortaleza, Brazil), Oct. 2002, pp. 112-119. https://doi.org/10.1109/SIBGRA.2002.1167133
A. M. K. Siu and R. W. H. Lau, Image registration for image-based rendering, IEEE Trans. Image Process. 14 (2005), no. 2, 241-252. https://doi.org/10.1109/TIP.2004.840690
F. Arrigoni et al., Robust synchronization in SO (3) and SE (3) via low-rank and sparse matrix decomposition, Comput. VIsion Image Understanding 174 (2018), 95-113. https://doi.org/10.1016/j.cviu.2018.08.001
A. Chatterjee and V. Govindu, Robust relative rotation averaging, IEEE Trans. Pattern Anal. Mach. Intell. 40 (2018), 958-972. https://doi.org/10.1109/TPAMI.2017.2693984
P. Purkait, T.-J. Chin, and I. Reid, Neurora: neural robust rotation averaging, in Proc. Eur. Conf. Comput. Vision, 2020.
V. Govindu, Combining two-view constraints for motion estimation, in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn. (Kauai, HI, USA), Dec. 2001. https://doi.org/10.1109/CVPR.2001.990963
K. Wilson and N. Snavely, Robust global translations with 1DSfM, in Proc. Eur. Conf. Comput. VIsion (Zurich, Switzerland), Sept. 2014, pp. 61-75.
J. L. Schonberger and J. Frahm, Structure-from-motion revisited, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (Las vegas, NV, USA), June 2016, pp. 4104-4113.
O. Ozyesil et al., A survey of structure from motion*, Acta Numerica 26 (2017), 305-364. https://doi.org/10.1017/S096249291700006X
K. Wilson and N. Snavely, Network principles for sfm: Disambiguating repeated structures with local context, in Proc. IEEE Int. Conf. Comput. Vision (Sydney, Australia), Dec. 2013, pp. 513-520.
S. H. Lee and J. Civera, Rotation-only bundle adjustment, 2020. ArXiv abs/2011.11724.
W. Park et al., Structured camera pose estimation for mosaic-based omnidirectional imaging, in Proc. IEEE Int. Symp. Circuits Syst. (Daegu, Rep. of Korea), May 2021, pp. 1-5.
R. A. Newcombe, S. Lovegrove, and A. Davison, DTAM: dense tracking and mapping in real-time, in Proc. Int. Conf. Comput. Vision (Barcelona, Spain), Nov. 2011, pp. 2320-2327.
R. Mur-Artal, J. M. M. Montiel, and J. D. Tard os, ORB-SLAM: a versatile and accurate monocular SLAM system, IEEE Trans. Robot. 31 (2015), 1147-1163. https://doi.org/10.1109/TRO.2015.2463671
T. Taihu Pire, G. I. Fischer, P. C. Castro, J. Civera, and J. Jacobo-Berlles, S-ptam: Stereo parallel tracking and mapping, Robotics Autonomous Syst. 93 (2017), 27-42. https://doi.org/10.1016/j.robot.2017.03.019
D. Eigen, C. R. Puhrsch, and J. Fergus, Depth map prediction from a single image using a multi-scale deep network, 2014. ArXiv abs/1406.2283.
F. Liu, C. Shen, G. Lin, and I. Reid, Learning depth from single monocular images using deep convolutional neural fields, IEEE Trans. Pattern Anal. Mach. Intell. 38 (2016), 2024-2039. https://doi.org/10.1109/TPAMI.2015.2505283
Z. Li and N. Snavely, MegaDepth: learning single-view depth prediction from internet photos, IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Salt Lake City, UT, USA), June 2018, pp. 2041-2050.
A. Tonioni et al., Learning to adapt for stereo, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Long Beach, CA, USA), June 2019, pp. 9653-9662.
C. Forster, M. Pizzoli, and D. Scaramuzza, SVO: Fast semi-direct monocular visual odometry, in Proc. IEEE Int. Conf. Robotics Autom. (Hong Kong, China), 2014, pp. 15-22.
S. Wang et al., Deepvo: Towards end-to-end visual odometry with deep recurrent convolutional neural networks, in Proc. IEEE Int. Conf. Robotics Autom. (Singapore), 2017, pp. 2043-2050.
H. Zhan et al., Visual odometry revisited: what should be learnt?, in Proc. IEEE Int. Conf. Robotics Autom. (Paris, France), 2020, pp. 4203-4210.
D. Nister, An efficient solution to the five-point relative pose problem, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004), 756-770. https://doi.org/10.1109/TPAMI.2004.17
I. Eichhardt and D. Barath, Relative pose from deep learned depth and a single affine correspondence, 2020. ArXiv abs/2007.10082.
B. Guan et al., Minimal solutions for relative pose with a single affine correspondence, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Seattle, WA, USA), June 2020, pp. 1926-1935.
C. Raposo and J. P. Barreto, Theory and practice of structure-from-motion using affine correspondences, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (Las Vegas, NV, USA), June 2016, pp. 5470-5478.
I. Eichhardt and D. Chetverikov, Affine correspondences between central cameras for rapid relative pose estimation, Computer Vision - ECCV 2018 (Cham) (Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss, (eds.), Lecture Notes in Computer Science, Springer International Publishing, 2018, pp. 488-503.
N. Snavely, S. M. Seitz, and R. Szeliski, Photo tourism: exploring photo collections in 3D, in Proc. ACM SIGGPAPH (Boston, MA, USA), 2006, pp. 835-846. https://doi.org/10.1145/1179352.1141964
M. Klopschitz et al., Robust incremental structure from motion, in Proc. Int. Symp. 3D Process. Visualization Trans, 2010.
W. U. Changchang, Towards linear-time incremental structure from motion, in Proc. Int. Conf. 3D Vision-3DV (Seattle, WA, USA), 2013, pp. 127-134.
A. L. Rodriguez, P. E. L opez-de-Teruel, and A. Ruiz, Reduced epipolar cost for accelerated incremental SfM, in Proc. CVPR (Colorado Sprrings, CO. USA), June 2011. https://doi.org/10.1109/CVPR.2011.5995569
R. Shah, A. Deshpande, and P. J. Narayanan, Multistage SfM: a coarse-to-fine approach for 3D reconstruction, 2015. ArXiv abs/1512.06235.
Z. Cui and P. Tanm, Global structure-from-motion by similarity averaging, in Proc. IEEE Int. Conf. Comput. Vision (Santiago, Chile), Dec. 2015, pp. 864-872.
S. Zhu et al., Very large-scale global SfM by distributed motion averaging, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Salt Lake City, UT, USA), June 2018, pp. 4568-4577.
K. N. Kutulakos and S. M. Seitz, A theory of shape by space carving, Int. J. Comput. Vis. 38 (2004), 199-218. https://doi.org/10.1023/A:1008191222954
S. M. Seitz and C. R. Dyer, Photorealistic scene reconstruction by voxel coloring, Int. J. Comput. Vis. 35 (2004), 151-173. https://doi.org/10.1023/A:1008176507526
M. Ji et al., Surfacenet: an end-to-end 3D neural network for multiview stereopsis, in Proc. IEEE Int. Conf. Comput. Vision (Venice, Italy), Oct. 2017, pp. 2326-2334.
A. Kar, C. Hane, and J. Malik, Learning a multi-view stereo machine, in Proc. Conf. Neural Inform. Process. Syst. (Long Beach CA, USA), 2017.
M. Lhuillier and L. Quan, A quasi-dense approach to surface reconstruction from uncalibrated images, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005), 418-433. https://doi.org/10.1109/TPAMI.2005.44
Y. Furukawa and J. Ponce, Accurate, dense, and robust multi-view stereopsis, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010), 1362-1376. https://doi.org/10.1109/TPAMI.2009.161
X. Gu et al., Cascade cost volume for high-resolution multi-view stereo and stereo matching, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Seattle, WA, USA), June 2019. https://doi.org/10.1109/CVPR42600.2020.00257
Y. Yao et al., Recurrent mvsnet for high-resolution multi-view stereo depth inference, in Proc. IEEE/ CVF Conf. Comput. Vision Pattern Recogn. (Long Beach, CA, USA), June 2019. https//doi.org/10.1109/CVPR.2019.00567
S. Gao, Fast-mvsnet: Sparse-to-dense multi-view stereo with learned propagation and Gauss-Newton refinement, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Seattle, WA, USA), June 2020, pp. 1946-1955. https://doi.org/10.1109/CVPR42600.2020.00202
H. Aanaes, R. R. Jensen, G. Vogiatzis, E. Tola, and AB Dahl, Large-scale data for multiple-view stereopsis, Int. J. Comput. Vis. 120 (2016), 153-168. https://doi.org/10.1007/s11263-016-0902-9
A. Knapitsch, J. Park, Q. Y. Zhou, and V. Koltun, Tanks and temples: benchmarking large-scale scene reconstruction, ACM Trans. Graph, 36 (2017), no. 4, 1-3.
A. Knapitsch et al., Building large image mosaics with invisible seam lines, in Proc. Aerospace/Defense Sensing Contr. (Orlando, FL, USA), 1988. https://doi.org/10.1117/12.316427
F. Toyama, K. Shoji, and J. Miyamichi, Image mosaicing from a set of images without configuration information, in Proc, Int. Conf. Pattern Recogn. (Cambridge, UK), Aug. 2004, pp. 899-902. https://doi.org/10.1109/ICPR.2004.1334404
S. Negahdaripour and X. Xun, Mosaic-based positioning and improved motion-estimation methods for automatic navigation of submersible vehicles, IEEE J. Ocean. Eng. 27 (2002), 79-99. https://doi.org/10.1109/48.989892
I. Zoghlami, O. Faugeras, and R. Deriche, Using geometric corners to build a 2D mosaic from a set of images, in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn. (San Juan, PR, USA), June 1997, pp. 420-425. https://doi.org/10.1109/CVPR.1997.609359
G. Y. Tian, D. Gledhill, and D. Taylor, Comprehensive interest points based imaging mosaic, Pattern Recogn. Lett. 24 (2003), 1171-1179. https://doi.org/10.1016/S0167-8655(02)00287-8
S. Aghayari et al., Geometric calibration of full spherical panoramic ricoh-theta camera, In Proc. ISPRS Ann. Photogramm. Remote. Sens. Spatial Inform. Sci. (Hannover, Germany), June 2017, pp. 237-245.
L. Heng et al., Infrastructure-based calibration of a multi-camera rig, in Proc. IEEE Int. Conf. Robotics Autom. (Hong Kong, China), 2014, pp. 4912-4919.
Y. Lin et al., Infrastructure-based multi-camera calibration using radial projections, 2020. ArXiv abs/2007.15330.
D. DScaramuzza, Omnidirectional Camera, Computer vision: a reference guide, Katsushi Ikeuchi, (ed.), Springer US, Boston, MA, 2014, pp. 552-560.
R. Usamentiaga and D. Garcia, Multi-camera calibration for accurate geometric measurements in industrial environments, Measurement 134 (2019), 345-358. https://doi.org/10.1016/j.measurement.2018.10.087
G. L. David, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision. 60 (2004), 91-110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
G. Bradski, The OpenCV library, 2000. Dr. Dobb's Journal of Software Tools.
M. Fischler and R. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM. 24 (1981), 381-395. https://doi.org/10.1145/358669.358692
B. Triggs et al., Bundle adjustment-a modern synthesis, in Proc. Int. Workshop Vision Algorithms, (Corfu, Greece), Sept. 1999, pp 298-372.
S. Agarwal et al., Bundle adjustment in the large, in Proc. Eur. Conf. Comput. Vision (Crete, Greece), 2010, pp. 29-42.
D. Smith, Numerical optimization, J. Oper. Res. Soc. 52 (2001), 245. https://doi.org/10.1057/palgrave.jors.2601183
C. Wu et al., Multicore bundle adjustment, in Proc. CVPR (Colorado Springs, CO, USA), June 2011. https://doi.org/10.1109/CVPR.2011.5995552
Inc. The MathWorks, Computer vision toolbox, Natick, Massachusetts, United State, 2020.
B. Wrobel, Multiple view geometry in computer vision, Kunstliche Intell. 15 (2001), 41.
S. Yan et al., Image retrieval for structure-from-motion via graph convolutional network, 2020. ArXiv abs/2009.08049.
S. Rogge et al., MPEG-I depth estimation reference software, in Proc. Int. Conf. 3D Immersion (Brussels, Belgium), Dec. 2019. https://doi.org/10.1109/IC3D48390.2019.8975995
Y. Yao, Z. Luo, S. Li, T. Fang, and L. Quan, MVSNet: Depth Inference for Unstructured Multi-view Stereo, ArXiv, 2018, abs/1804.02505.