Effective Hand Gesture Recognition by Key Frame Selection and 3D Neural Network

  • Received: 2019.07.25
  • Reviewed: 2019.10.10
  • Published: 2020.03.31

Abstract

This paper presents an approach to dynamic hand gesture recognition that combines neural-network-based key frame selection with a 3D Convolutional Neural Network (3D_CNN) classifier, later extended to 3D Residual Networks (3D_ResNet). Typically, a 3D deep neural network classifies gestures from image frames randomly sampled from the video data. In this work, to improve classification performance, we instead feed the classification network key frames that represent the overall video. The key frames are extracted by SegNet rather than by conventional clustering-based video summarization algorithms such as VSUMM, which require heavy computation. Because key frame selection is performed by a deep neural network, it can run in a real-time system. Experiments are conducted for gesture classification with 3D convolutional networks, namely 3D_CNN, Inflated 3D_CNN (I3D), and 3D_ResNet. Our algorithm achieved up to 97.8% classification accuracy on the Cambridge gesture dataset. The experimental results show that the proposed approach is efficient and outperforms existing methods.
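As a rough illustration of how key-frame input differs from the usual random sampling, the sketch below scores every frame, keeps the T highest-scoring frames, and restores their temporal order before they would be stacked into a clip for a 3D network. The scoring function `motion_scores`, which counts pixels that change between consecutive binary hand masks, is a hypothetical stand-in: the abstract states only that SegNet performs the selection, not how frames are scored. All function names here are illustrative assumptions, not the authors' code.

```python
import random
from typing import List, Sequence

Mask = List[List[int]]  # binary hand mask for one frame (1 = hand pixel)


def random_sample_indices(num_frames: int, clip_len: int, seed: int = 0) -> List[int]:
    """Baseline: draw clip_len distinct frame indices at random, in temporal order."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_frames), clip_len))


def motion_scores(masks: Sequence[Mask]) -> List[float]:
    """HYPOTHETICAL key-frame score: number of pixels whose mask value changed
    since the previous frame. Stands in for the paper's SegNet-based selector."""
    scores = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(masks, masks[1:]):
        changed = sum(p != c for row_p, row_c in zip(prev, cur)
                      for p, c in zip(row_p, row_c))
        scores.append(float(changed))
    return scores


def key_frame_indices(scores: Sequence[float], clip_len: int) -> List[int]:
    """Keep the clip_len highest-scoring frames, then restore temporal order
    so the 3D network still sees the gesture unfold forwards in time."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:clip_len])


if __name__ == "__main__":
    # Six toy 2x2 hand masks standing in for segmented video frames.
    masks = [
        [[0, 0], [0, 0]],
        [[0, 0], [0, 0]],
        [[1, 0], [0, 0]],
        [[1, 1], [1, 0]],
        [[1, 1], [1, 0]],
        [[0, 0], [0, 1]],
    ]
    scores = motion_scores(masks)
    print(scores)                                # [0.0, 0.0, 1.0, 2.0, 0.0, 4.0]
    print(key_frame_indices(scores, 3))          # [2, 3, 5]
    print(random_sample_indices(len(masks), 3))  # depends on the seed
    clip = [masks[i] for i in key_frame_indices(scores, 3)]  # T x H x W clip
```

With the toy masks, the largest changes occur at frames 5, 3 and 2, so the key-frame selector returns `[2, 3, 5]`, whereas random sampling can miss those frames entirely. Sorting the selected indices matters because a 3D convolution slides over the time axis and assumes the frames are in order; for RGB input a channel axis would be added to give a C x T x H x W clip.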
