
http://dx.doi.org/10.3837/tiis.2020.09.020

HSFE Network and Fusion Model based Dynamic Hand Gesture Recognition

Tai, Do Nhu (Department of Computer Science, Chonnam National University)
Na, In Seop (SW Convergence Education Institute, Chosun University)
Kim, Soo Hyung (Department of Computer Science, Chonnam National University)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS), vol. 14, no. 9, pp. 3924-3940, 2020.

Abstract

Dynamic hand gesture recognition (d-HGR) plays an important role in human-computer interaction (HCI) systems. With advances in hand-pose estimation and 3D depth sensors, depth and hand-skeleton datasets have been proposed, spurring much research on depth-based and 3D hand-skeleton approaches. However, d-HGR remains challenging due to low resolution, high complexity, and self-occlusion. In this paper, we propose a hand-shape feature extraction (HSFE) network to extract robust hand-shape features. We build LSTM-based hand-shape and hand-skeleton models to exploit the temporal information in hand-shape and motion changes. Fusing the two models yields the best accuracy on the dynamic hand gesture (DHG) dataset.
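The abstract does not say how the hand-shape and hand-skeleton models are fused. As an illustration only, the Python sketch below assumes score-level (late) fusion: each model emits per-class logits for a gesture sequence, the logits are turned into probabilities with a softmax, and the two probability vectors are combined by a weighted average. The function names `softmax`, `late_fusion`, and `predict`, the fusion weight `w_shape`, and the example logits are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(shape_logits: Sequence[float],
                skeleton_logits: Sequence[float],
                w_shape: float = 0.5) -> List[float]:
    """Weighted average of the two models' class probabilities.

    w_shape is a hypothetical fusion weight (an assumption, not from the
    paper); the hand-skeleton model receives weight 1 - w_shape.
    """
    if len(shape_logits) != len(skeleton_logits):
        raise ValueError("both models must score the same gesture classes")
    if not 0.0 <= w_shape <= 1.0:
        raise ValueError("w_shape must lie in [0, 1]")
    p_shape = softmax(shape_logits)
    p_skel = softmax(skeleton_logits)
    return [w_shape * a + (1.0 - w_shape) * b for a, b in zip(p_shape, p_skel)]


def predict(probs: Sequence[float]) -> int:
    """Index of the most probable gesture class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical logits for one gesture sequence over 3 classes.
shape_logits = [2.0, 1.0, 0.1]      # hand-shape model favours class 0
skeleton_logits = [0.5, 2.5, 0.2]   # hand-skeleton model favours class 1

fused = late_fusion(shape_logits, skeleton_logits)
print(predict(fused))  # → 1
```

With equal weights, the more confident hand-skeleton model outweighs the hand-shape model in this example; setting `w_shape=1.0` reduces the sketch to the hand-shape model alone. Averaging probabilities rather than raw logits keeps the two models' outputs on a common scale before they are combined.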

Keywords

HSFE network; dynamic hand gesture; hand detection; hand gesture recognition; LSTM
