Camera pose estimation framework for array-structured images

Shin, Min-Jung (Department of Electronic Engineering, Sogang University)
Park, Woojune (Department of Electronic Engineering, Sogang University)
Kim, Jung Hee (Department of Electronic Engineering, Sogang University)
Kim, Joonsoo (Immersive Media Research Section, Electronics and Telecommunications Research Institute)
Yun, Kuk-Jin (Immersive Media Research Section, Electronics and Telecommunications Research Institute)
Kang, Suk-Ju (Department of Electronic Engineering, Sogang University)
1 | A. Kar, C. Häne, and J. Malik, Learning a multi-view stereo machine, in Proc. Conf. Neural Inform. Process. Syst. (Long Beach, CA, USA), 2017.
2 | M. Lhuillier and L. Quan, A quasi-dense approach to surface reconstruction from uncalibrated images, IEEE Trans. Pattern Anal. Mach. Intell. 27 (2005), 418-433.
3 | X. Gu et al., Cascade cost volume for high-resolution multi-view stereo and stereo matching, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Seattle, WA, USA), June 2020. https://doi.org/10.1109/CVPR42600.2020.00257
4 | Y. Yao et al., Recurrent MVSNet for high-resolution multi-view stereo depth inference, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Long Beach, CA, USA), June 2019. https://doi.org/10.1109/CVPR.2019.00567
5 | H. Aanaes, R. R. Jensen, G. Vogiatzis, E. Tola, and A. B. Dahl, Large-scale data for multiple-view stereopsis, Int. J. Comput. Vis. 120 (2016), 153-168.
6 | A. Knapitsch, J. Park, Q. Y. Zhou, and V. Koltun, Tanks and temples: benchmarking large-scale scene reconstruction, ACM Trans. Graph. 36 (2017), no. 4, 1-13.
7 | F. Toyama, K. Shoji, and J. Miyamichi, Image mosaicing from a set of images without configuration information, in Proc. Int. Conf. Pattern Recogn. (Cambridge, UK), Aug. 2004, pp. 899-902. https://doi.org/10.1109/ICPR.2004.1334404
8 | S. Agarwal et al., Bundle adjustment in the large, in Proc. Eur. Conf. Comput. Vision (Crete, Greece), 2010, pp. 29-42. |
9 | I. Eichhardt and D. Barath, Relative pose from deep learned depth and a single affine correspondence, 2020. ArXiv abs/2007.10082. |
10 | D. Nister, An efficient solution to the five-point relative pose problem, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004), 756-770.
11 | B. Guan et al., Minimal solutions for relative pose with a single affine correspondence, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Seattle, WA, USA), June 2020, pp. 1926-1935. |
12 | C. Raposo and J. P. Barreto, Theory and practice of structure-from-motion using affine correspondences, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (Las Vegas, NV, USA), June 2016, pp. 5470-5478. |
13 | Y. Furukawa and J. Ponce, Accurate, dense, and robust multi-view stereopsis, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010), 1362-1376.
14 | O. Ozyesil et al., A survey of structure from motion, Acta Numerica 26 (2017), 305-364.
15 | I. Eichhardt and D. Chetverikov, Affine correspondences between central cameras for rapid relative pose estimation, in Computer Vision - ECCV 2018, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss (eds.), Lecture Notes in Computer Science, Springer International Publishing, Cham, 2018, pp. 488-503.
16 | D. Simons, Current approaches to change blindness, Visual Cognition 7 (2000), 1-15.
17 | A. Chatterjee and V. Govindu, Robust relative rotation averaging, IEEE Trans. Pattern Anal. Mach. Intell. 40 (2018), 958-972.
18 | B. Rogers and M. Graham, Motion parallax as an independent cue for depth perception, Perception 8 (1979), 125-134.
19 | J. Triesch et al., What you see is what you need, J. Vision 3 (2003), no. 1, 86-94.
20 | M. Ji et al., SurfaceNet: an end-to-end 3D neural network for multiview stereopsis, in Proc. IEEE Int. Conf. Comput. Vision (Venice, Italy), Oct. 2017, pp. 2326-2334.
21 | Z. Yu and S. Gao, Fast-MVSNet: sparse-to-dense multi-view stereo with learned propagation and Gauss-Newton refinement, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Seattle, WA, USA), June 2020, pp. 1946-1955. https://doi.org/10.1109/CVPR42600.2020.00202
22 | I. Zoghlami, O. Faugeras, and R. Deriche, Using geometric corners to build a 2D mosaic from a set of images, in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn. (San Juan, PR, USA), June 1997, pp. 420-425. https://doi.org/10.1109/CVPR.1997.609359
23 | L. Heng et al., Infrastructure-based calibration of a multi-camera rig, in Proc. IEEE Int. Conf. Robotics Autom. (Hong Kong, China), 2014, pp. 4912-4919.
24 | R. Usamentiaga and D. Garcia, Multi-camera calibration for accurate geometric measurements in industrial environments, Measurement 134 (2019), 345-358.
25 | M. Fischler and R. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM 24 (1981), 381-395.
26 | S. Zhu et al., Very large-scale global SfM by distributed motion averaging, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Salt Lake City, UT, USA), June 2018, pp. 4568-4577. |
27 | Z. Cui and P. Tan, Global structure-from-motion by similarity averaging, in Proc. IEEE Int. Conf. Comput. Vision (Santiago, Chile), Dec. 2015, pp. 864-872.
28 | N. Snavely, S. M. Seitz, and R. Szeliski, Photo tourism: exploring photo collections in 3D, in Proc. ACM SIGGRAPH (Boston, MA, USA), 2006, pp. 835-846. https://doi.org/10.1145/1179352.1141964
29 | M. Klopschitz et al., Robust incremental structure from motion, in Proc. Int. Symp. 3D Data Process. Visualization Transm., 2010.
30 | R. Shah, A. Deshpande, and P. J. Narayanan, Multistage SfM: a coarse-to-fine approach for 3D reconstruction, 2015. ArXiv abs/1512.06235.
31 | K. N. Kutulakos and S. M. Seitz, A theory of shape by space carving, Int. J. Comput. Vis. 38 (2000), 199-218.
32 | S. M. Seitz and C. R. Dyer, Photorealistic scene reconstruction by voxel coloring, Int. J. Comput. Vis. 35 (1999), 151-173.
33 | H. Zhan et al., Visual odometry revisited: what should be learnt?, in Proc. IEEE Int. Conf. Robotics Autom. (Paris, France), 2020, pp. 4203-4210. |
34 | S. Aghayari et al., Geometric calibration of full spherical panoramic Ricoh Theta camera, in Proc. ISPRS Ann. Photogramm. Remote. Sens. Spatial Inform. Sci. (Hannover, Germany), June 2017, pp. 237-245.
35 | D. Gledhill, 3D panoramic imaging for virtual environment construction, 2009. http://eprints.hud.ac.uk/id/eprint/6981/
36 | L. McMillan and G. Bishop, Plenoptic modeling: an image-based rendering system, in Proc. Annu. Conf. Comput. Graphics Interactive Techniques, Sept. 1995, pp. 39-46. https://doi.org/10.1145/218380.218398
37 | G. Burdea and P. Coiffet, Virtual reality technology, Presence: Teleoperators Virtual Environ. 12 (2003), 663-664.
38 | S. Negahdaripour and X. Xu, Mosaic-based positioning and improved motion-estimation methods for automatic navigation of submersible vehicles, IEEE J. Ocean. Eng. 27 (2002), 79-99.
39 | G. Y. Tian, D. Gledhill, and D. Taylor, Comprehensive interest points based imaging mosaic, Pattern Recogn. Lett. 24 (2003), 1171-1179.
40 | Y. Lin et al., Infrastructure-based multi-camera calibration using radial projections, 2020. ArXiv abs/2007.15330. |
41 | F. Liu, C. Shen, G. Lin, and I. Reid, Learning depth from single monocular images using deep convolutional neural fields, IEEE Trans. Pattern Anal. Mach. Intell. 38 (2016), 2024-2039.
42 | A. M. K. Siu and R. W. H. Lau, Image registration for image-based rendering, IEEE Trans. Image Process. 14 (2005), no. 2, 241-252.
43 | J. L. Schonberger and J. Frahm, Structure-from-motion revisited, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (Las Vegas, NV, USA), June 2016, pp. 4104-4113.
44 | S. H. Lee and J. Civera, Rotation-only bundle adjustment, 2020. ArXiv abs/2011.11724. |
45 | D. Smith, Numerical optimization, J. Oper. Res. Soc. 52 (2001), 245.
46 | C. Wu et al., Multicore bundle adjustment, in Proc. CVPR (Colorado Springs, CO, USA), June 2011. https://doi.org/10.1109/CVPR.2011.5995552
47 | B. Wrobel, Multiple view geometry in computer vision, Künstliche Intell. 15 (2001), 41.
48 | Y. Yao, Z. Luo, S. Li, T. Fang, and L. Quan, MVSNet: depth inference for unstructured multi-view stereo, 2018. ArXiv abs/1804.02505.
49 | D. G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2004), 91-110.
50 | D. Scaramuzza, Omnidirectional camera, in Computer Vision: A Reference Guide, K. Ikeuchi (ed.), Springer US, Boston, MA, 2014, pp. 552-560.
51 | B. Triggs et al., Bundle adjustment - a modern synthesis, in Proc. Int. Workshop Vision Algorithms (Corfu, Greece), Sept. 1999, pp. 298-372.
52 | The MathWorks, Inc., Computer Vision Toolbox, Natick, MA, USA, 2020.
53 | S. Yan et al., Image retrieval for structure-from-motion via graph convolutional network, 2020. ArXiv abs/2009.08049. |
54 | R. A. Newcombe, S. Lovegrove, and A. Davison, DTAM: dense tracking and mapping in real-time, in Proc. Int. Conf. Comput. Vision (Barcelona, Spain), Nov. 2011, pp. 2320-2327. |
55 | W. Medendorp, D. Tweed, and J. Crawford, Motion parallax is computed in the updating of human spatial memory, J. Neurosci. 23 (2003), 8135-8142.
56 | P. Hedman et al., Deep blending for free-viewpoint image-based rendering, ACM Trans. Graph. 37 (2018), 1-15. |
57 | M. Broxton et al., Immersive light field video with a layered mesh representation, ACM Trans. Graph. 39 (2020), 86:1-15. |
58 | A. L. Rodriguez, P. E. López-de-Teruel, and A. Ruiz, Reduced epipolar cost for accelerated incremental SfM, in Proc. CVPR (Colorado Springs, CO, USA), June 2011. https://doi.org/10.1109/CVPR.2011.5995569
59 | K. Wilson and N. Snavely, Network principles for SfM: disambiguating repeated structures with local context, in Proc. IEEE Int. Conf. Comput. Vision (Sydney, Australia), Dec. 2013, pp. 513-520.
60 | W. Park et al., Structured camera pose estimation for mosaic-based omnidirectional imaging, in Proc. IEEE Int. Symp. Circuits Syst. (Daegu, Rep. of Korea), May 2021, pp. 1-5. |
61 | R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, ORB-SLAM: a versatile and accurate monocular SLAM system, IEEE Trans. Robot. 31 (2015), 1147-1163.
62 | T. Pire, G. I. Fischer, P. C. Castro, J. Civera, and J. Jacobo-Berlles, S-PTAM: stereo parallel tracking and mapping, Robotics Autonomous Syst. 93 (2017), 27-42.
63 | V. Govindu, Combining two-view constraints for motion estimation, in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn. (Kauai, HI, USA), Dec. 2001. https://doi.org/10.1109/CVPR.2001.990963
64 | G. Bradski, The OpenCV library, 2000. Dr. Dobb's Journal of Software Tools. |
65 | P. Fuchs, Virtual reality headsets - a theoretical and pragmatic approach, CRC Press, London, UK, 2017. https://doi.org/10.1201/9781315208244
66 | S. Seitz et al., A comparison and evaluation of multi-view stereo reconstruction algorithms, in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn. (New York, NY, USA), June 2006, pp. 519-528. https://doi.org/10.1109/CVPR.2006.19
67 | P. C. Merrell et al., Real-time visibility-based fusion of depth maps, in Proc. IEEE Int. Conf. Comput. Vision (Rio de Janeiro, Brazil), Oct. 2007, pp. 1-8. |
68 | Y. Furukawa and J. Ponce, Accurate, dense, and robust multi-view stereopsis, in Proc. IEEE Conf. Comput. Vision Pattern Recogn. (Minneapolis, MN, USA), June 2007, pp. 1-8. https://doi.org/10.1109/CVPR.2007.383246
69 | F. Arrigoni et al., Robust synchronization in SO(3) and SE(3) via low-rank and sparse matrix decomposition, Comput. Vision Image Understanding 174 (2018), 95-113.
70 | P. Purkait, T.-J. Chin, and I. Reid, NeuRoRA: neural robust rotation averaging, in Proc. Eur. Conf. Comput. Vision, 2020.
71 | K. Wilson and N. Snavely, Robust global translations with 1DSfM, in Proc. Eur. Conf. Comput. Vision (Zurich, Switzerland), Sept. 2014, pp. 61-75.
72 | D. Eigen, C. Puhrsch, and J. Fergus, Depth map prediction from a single image using a multi-scale deep network, 2014. ArXiv abs/1406.2283.
73 | Z. Li and N. Snavely, MegaDepth: learning single-view depth prediction from internet photos, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Salt Lake City, UT, USA), June 2018, pp. 2041-2050.
74 | A. Tonioni et al., Learning to adapt for stereo, in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recogn. (Long Beach, CA, USA), June 2019, pp. 9653-9662. |
75 | C. Forster, M. Pizzoli, and D. Scaramuzza, SVO: Fast semi-direct monocular visual odometry, in Proc. IEEE Int. Conf. Robotics Autom. (Hong Kong, China), 2014, pp. 15-22. |
76 | S. Wang et al., DeepVO: towards end-to-end visual odometry with deep recurrent convolutional neural networks, in Proc. IEEE Int. Conf. Robotics Autom. (Singapore), 2017, pp. 2043-2050.
77 | S. Rogge et al., MPEG-I depth estimation reference software, in Proc. Int. Conf. 3D Immersion (Brussels, Belgium), Dec. 2019. https://doi.org/10.1109/IC3D48390.2019.8975995
78 | C. Wu, Towards linear-time incremental structure from motion, in Proc. Int. Conf. 3D Vision-3DV (Seattle, WA, USA), 2013, pp. 127-134.
79 | A. Knapitsch et al., Building large image mosaics with invisible seam lines, in Proc. Aerospace/Defense Sensing Contr. (Orlando, FL, USA), 1998. https://doi.org/10.1117/12.316427
80 | C. S. Kurashima, R. Yang, and A. Lastra, Combining approximate geometry with view-dependent texture mapping - a hybrid approach to 3D video teleconferencing, in Proc. XV Brazilian Symp. Comput. Graphics Image Process. (Fortaleza, Brazil), Oct. 2002, pp. 112-119. https://doi.org/10.1109/SIBGRA.2002.1167133