Acknowledgement
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIP) (2018-0-00999, Medical Digital Twin Generation and 3D Simulation Technology for Prediction and Computer Aided Diagnosis of Musculoskeletal Disease).
References
- J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, A. Kipman, and A. Blake, "Efficient human pose estimation from single depth images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 12, pp. 2821-2840, 2012. https://doi.org/10.1109/TPAMI.2012.241
- https://www.intelrealsense.com
- T. Chatzis, A. Stergioulas, D. Konstantinidis, K. Dimitropoulos, and P. Daras, "A comprehensive study on deep learning-based 3D hand pose estimation methods," Applied Sciences, vol. 10, no. 19, 6850, 2020. https://doi.org/10.3390/app10196850
- R. Wang and J. Popovic, "Real-time hand-tracking with a color glove," ACM Transactions on Graphics, vol. 28, no. 3, pp. 1-8, 2009.
- D. Tang, T. Yu, and T. Kim, "Real-time articulated hand pose estimation using semi-supervised transductive regression forests," In Proceedings of the IEEE International Conference on Computer Vision, pp. 3224-3231, 2013.
- L. Ge, H. Liang, J. Yuan, and D. Thalmann, "Robust 3d hand pose estimation in single depth images: from single-view CNN to multi-view CNNs," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3593-3601, 2016.
- Y. LeCun and Y. Bengio, "Convolutional networks for images, speech, and time series," In The Handbook of Brain Theory and Neural Networks, MIT Press: Cambridge, MA, USA, vol. 3361, no. 10, 1995.
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
- P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, no. 10, pp. 3371-3408, 2010.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," In Proceedings of the Neural Information Processing Systems (NIPS), pp. 2672-2680, 2014.
- M. Oberweger, G. Riegler, P. Wohlhart, and V. Lepetit, "Efficiently creating 3D training data for fine hand pose estimation," In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4957-4965, 2016.
- G. Garcia-Hernando, S. Yuan, S. Baek, and T. Kim, "First-person hand action benchmark with RGB-D videos and 3D hand pose annotations," In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 409-419, 2018.
- S. Hampali, M. Rad, M. Oberweger, and V. Lepetit, "HOnnotate: A method for 3D annotation of hand and object poses," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3193-3203, 2020.
- Y. Zhou, J. Lu, K. Du, X. Lin, Y. Sun, and X. Ma, "Hbe: Hand branch ensemble network for real-time 3d hand pose estimation," In Proceedings of the European Conference on Computer Vision (ECCV), pp. 501-516, 2018.
- K. Du, X. Lin, Y. Sun, and X. Ma, "Crossinfonet: Multi-task information sharing based hand pose estimation," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9896-9905, 2019.
- P. Ren, H. Sun, Q. Qi, J. Wang, and W. Huang, "SRN: Stacked regression network for real-time 3D hand pose estimation," In Proceedings of the British Machine Vision Conference (BMVC), p. 112, 2019.
- C. Wan, T. Probst, L. Gool, and A. Yao, "Self-supervised 3d hand pose estimation through training by fitting," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10853-10862, 2019.
- F. Xiong, B. Zhang, Y. Xiao, Z. Cao, Y. Yu, J. Zhou, and J. Yuan, "A2j: Anchor-to-joint regression network for 3d articulated pose estimation from a single depth image," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 793-802, 2019.
- L. Fang, X. Liu, L. Liu, H. Xu, and W. Kang, "JGR-P2O: joint graph reasoning based pixel-to-offset prediction network for 3D hand pose estimation from a single depth image," In Proceedings of the European Conference on Computer Vision (ECCV), pp. 120-137, 2020.
- S. Li and D. Lee, "Point-to-pose voting based hand pose estimation using residual permutation equivariant layer," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11927-11936, 2019.
- Y. Chen, Z. Tu, L. Ge, D. Zhang, R. Chen, and J. Yuan, "So-handnet: Self-organizing network for 3d hand pose estimation with semi-supervised learning," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 6961-6970, 2019.
- L. Ge, H. Liang, J. Yuan, and D. Thalmann, "3D convolutional neural networks for efficient and robust hand pose estimation from single depth images," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1991-2000, 2017.
- J. Malik, I. Abdelaziz, A. Elhayek, S. Shimada, S. Ali, V. Golyanik, C. Theobalt, and D. Stricker, "HandVoxNet: deep voxel-based network for 3D hand shape and pose estimation from a single depth map," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7113-7122, 2020.
- C. Zimmermann and T. Brox, "Learning to estimate 3d hand pose from single rgb images," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4903-4911, 2017.
- A. Boukhayma, R. Bem, and P. Torr, "3d hand shape and pose from images in the wild," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10843-10852, 2019.
- U. Iqbal, P. Molchanov, T. Breuel, J. Gall, and J. Kautz, "Hand pose estimation via latent 2.5d heatmap regression," In Proceedings of the European Conference on Computer Vision (ECCV), pp. 118-134, 2018.
- F. Mueller, F. Bernard, O. Sotnychenko, D. Mehta, S. Sridhar, D. Casas, and C. Theobalt, "Ganerated hands for real-time 3d hand tracking from monocular rgb," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 49-59, 2018.
- Z. Cao, G. Hidalgo, T. Simon, S. Wei, and Y. Sheikh, "OpenPose: realtime multi-person 2D pose estimation using part affinity fields," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 43, no. 1, pp. 172-186, 2019.
- X. Zhang, Q. Li, H. Mo, W. Zhang, and W. Zheng, "End-to-end hand mesh recovery from a monocular rgb image," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2354-2364, 2019.
- S. Baek, K. Kim, and T. Kim, "Pushing the envelope for rgb-based dense 3d hand pose estimation via neural rendering," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1067-1076, 2019.
- A. Spurr, U. Iqbal, P. Molchanov, O. Hilliges, and J. Kautz, "Weakly supervised 3D hand pose estimation via bio-mechanical constraints," In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
- T. Theodoridis, T. Chatzis, V. Solachidis, and K. Dimitropoulos, "Cross-modal variational alignment of latent spaces," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), pp. 960-969, 2020.
- A. Spurr, J. Song, S. Park, and O. Hilliges, "Cross-modal deep variational hand pose estimation," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 89-98, 2018.
- L. Yang, S. Li, D. Lee, and A. Yao, "Aligning latent spaces for 3d hand pose estimation," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2335-2343, 2019.
- C. Wan, T. Probst, L. Gool, and A. Yao, "Crossing nets: Combining gans and vaes with a shared latent space for hand pose estimation," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 680-689, 2017.
- B. Zhu, C. Ngo, J. Chen, and Y. Hao, "R2gan: Cross-modal recipe retrieval with generative adversarial network," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11477-11486, 2019.
- L. Yang and A. Yao, "Disentangling latent hands for image synthesis and pose estimation," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9877-9886, 2019.
- J. Gu, Z. Wang, W. Ouyang, J. Li, and L. Zhuo, "3d hand pose estimation with disentangled cross-modal latent space," In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 391-400, 2020.
- H. Zhang, Z. Bo, J. Yong, and F. Xu, "Interaction fusion: Real-time reconstruction of hand poses and deformable objects in hand-object interactions," ACM Transactions on Graphics, vol. 38, no. 4, 2019.
- Y. Hasson, G. Varol, D. Tzionas, I. Kalevatykh, M. Black, I. Laptev, and C. Schmid, "Learning joint reconstruction of hands and manipulated objects," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11807-11816, 2019.
- B. Doosti, S. Naha, M. Mirbagheri, and D. Crandall, "HOPE-Net: A graph-based model for hand-object pose estimation," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6608-6617, 2020.
- B. Tekin, F. Bogo, and M. Pollefeys, "H+O: Unified egocentric recognition of 3D hand-object poses and interactions," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4511-4520, 2019.
- S. Baek, K. Kim, and T. Kim, "Weakly-supervised domain adaptation via gan and mesh model for estimating 3D hand poses interacting objects," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6120-6131, 2020.
- J. Wang, F. Mueller, F. Bernard, S. Sorli, O. Sotnychenko, N. Qian, M. Otaduy, D. Casas, and C. Theobalt, "RGB2Hands: real-time tracking of 3D hand interactions from monocular RGB video," ACM Transactions on Graphics (TOG), vol. 39, no. 6, 2020.
- J. Romero, D. Tzionas, and M. Black, "Embodied hands: modeling and capturing hands and bodies together," ACM Transactions on Graphics (TOG), vol. 36, no. 6, 2017.
- B. Smith, C. Wu, P. Peluse, Y. Sheikh, J. Hodgins, and T. Shiratori, "Constraining dense hand surface tracking with elasticity," ACM Transactions on Graphics (TOG), vol. 39, no. 6, 2020.
- X. Zhang, Q. Li, H. Mo, W. Zhang, and W. Zheng, "End-to-end hand mesh recovery from a monocular RGB image," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2354-2364, 2019.
- L. Yang, J. Li, W. Xu, Y. Diao, and C. Lu, "BiHand: recovering hand mesh with multi-stage bisected hourglass networks," In Proceedings of the British Machine Vision Conference (BMVC), 2020.
- D. Kulon, R. Guler, I. Kokkinos, M. Bronstein, and S. Zafeiriou, "Weakly-supervised mesh-convolutional hand reconstruction in the wild," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4990-5000, 2020.
- C. Wan, T. Probst, L. Gool, and A. Yao, "Dual grid net: hand mesh vertex regression from single depth maps," In Proceedings of the European Conference on Computer Vision (ECCV), pp. 442-459, 2020.
- G. Moon, T. Shiratori, and K. Lee, "DeepHandMesh: a weakly-supervised deep encoder-decoder framework for high-fidelity hand mesh modeling," In Proceedings of the European Conference on Computer Vision (ECCV), pp. 440-455, 2020.
- D. Tang, H. Jin, A. Tejani, and T. Kim, "Latent regression forest: Structured estimation of 3d articulated hand posture," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3786-3793, 2014.
- J. Tompson, M. Stein, Y. LeCun, and K. Perlin, "Real-time continuous pose recovery of human hands using convolutional networks," ACM Transactions on Graphics (TOG), vol. 33, pp. 1-10, 2014.
- X. Sun, Y. Wei, S. Liang, X. Tang, and J. Sun, "Cascaded hand pose regression," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 824-832, 2015.
- A. Wetzler, R. Slossberg, and R. Kimmel, "Rule of thumb: Deep derotation for improved fingertip detection," arXiv:1507.05726, 2015.
- S. Yuan, Q. Ye, B. Stenger, S. Jain, and T. Kim, "Bighand2.2m benchmark: Hand pose dataset and state of the art analysis," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4866-4874, 2017.
- J. Malik, A. Elhayek, F. Nunnari, K. Varanasi, K. Tamaddon, A. Heloir, and D. Stricker, "Deephps: End-to-end estimation of 3d hand pose and shape by learning from synthetic depth," In Proceedings of the International Conference on 3D Vision (3DV), pp. 110-119, 2018.
- C. Zimmermann, D. Ceylan, J. Yang, B. Russell, M. Argus, and T. Brox, "FreiHAND: A dataset for markerless capture of hand pose and shape from single rgb images," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 813-822, 2019.
- G. Moon, S. Yu, H. Wen, T. Shiratori, and K. Lee, "InterHand2.6M: A dataset and baseline for 3D interacting hand pose estimation from a single RGB image," In Proceedings of the European Conference on Computer Vision (ECCV), pp. 548-564, 2020.
- S. Sridhar, A. Oulasvirta, and C. Theobalt, "Interactive markerless articulated hand motion tracking using RGB and depth data," In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2456-2463, 2013.
- S. Sridhar, F. Mueller, M. Zollhofer, D. Casas, A. Oulasvirta, and C. Theobalt, "Real-time joint tracking of a hand manipulating an object from rgb-d input," In Proceedings of the European Conference on Computer Vision (ECCV), pp. 294-310, 2016.
- J. Zhang, J. Jiao, M. Chen, L. Qu, X. Xu, and Q. Yang, "A hand pose tracking benchmark from stereo matching," In Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 982-986, 2017.
- F. Mueller, D. Mehta, O. Sotnychenko, S. Sridhar, D. Casas, and C. Theobalt, "Real-time hand tracking under occlusion from an egocentric rgb-d sensor," In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 1284-1293, 2017.
- S. Brahmbhatt, C. Tang, C. Twigg, C. Kemp, and J. Hays, "ContactPose: a dataset of grasps with object contact and hand pose," In Proceedings of the European Conference on Computer Vision (ECCV), pp. 361-378, 2020.