A Review of 3D Object Tracking Methods Using Deep Learning  

Park, Hanhoon (Department of Electronic Engineering, Pukyong National University)
Publication Information
Journal of the Institute of Convergence Signal Processing / v.22, no.1, 2021, pp. 30-37
Abstract
Accurate 3D object tracking with camera images is a key enabling technology for augmented reality applications. Motivated by the impressive success of convolutional neural networks (CNNs) in computer vision tasks such as image classification, object detection, and image segmentation, recent studies on 3D object tracking have focused on leveraging deep learning. In this paper, we review deep learning approaches to 3D object tracking. We describe key methods in this field and discuss potential future research directions.
Keywords
3D object tracking; Camera pose estimation; Vision-based; Deep learning; Augmented reality