http://dx.doi.org/10.22937/IJCSNS.2022.22.1.19

Automatic Gesture Recognition for Human-Machine Interaction: An Overview  

Nataliia Konkina (Department of Automation of Power Processes and Systems Engineering (APEPS), Faculty of Heat Power Engineering, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute")
Publication Information
International Journal of Computer Science & Network Security / v.22, no.1, 2022, pp. 129-138
Abstract
With the increasing reliance on computing systems in everyday life, there is a constant need to make the ways users interact with these systems more natural, effective, and convenient. During the early computing revolution, interaction between humans and machines was limited, and machines were not designed to be intelligent. This created a need for systems that can automatically identify and interpret human actions. Automatic gesture recognition is one popular approach: it lets users control systems with their gestures, and it covers several kinds of tracking, including the whole body, hands, head, and face. We also touch upon related lines of work, such as Brain-Computer Interfaces (BCI) and Electromyography (EMG), as potential additions to gesture recognition. In this work, we present an overview of several applications of automatic gesture recognition systems and a brief look at the popular methods employed.
Keywords
Human-Machine Interaction; Gesture Recognition Systems; Hand Tracking; Activity Recognition; Brain-Computer Interface (BCI)
Citations & Related Records
Times Cited By KSCI: 5