An Analysis of Recent Research Trends in Image Learning-Based Hand Pose Estimation

  • Published: 2021.03.31

Abstract

Keywords

Acknowledgement

This work was supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIP) (2018-0-00999, Medical Digital Twin Generation and 3D Simulation Technology for Prediction and Computer Aided Diagnosis of Musculoskeletal Disease).
