http://dx.doi.org/10.14372/IEMEK.2021.16.3.89

Development of a Work Management System Based on Speech and Speaker Recognition  

Gaybulayev, Abdulaziz (Kumoh National Institute of Technology)
Yunusov, Jahongir (Kumoh National Institute of Technology)
Kim, Tae-Hyong (Kumoh National Institute of Technology)
Abstract
Voice interfaces can not only make daily life more convenient through artificial intelligence speakers but also improve the working environment in factories. This paper presents a voice-assisted work management system that supports both speech and speaker recognition. The system provides machine control and authentication of authorized workers by voice at the same time. We applied two speech recognition methods: Google's Speech application programming interface (API) service and the DeepSpeech speech-to-text engine. For worker identification, we adopted the SincNet architecture for speaker recognition. We implemented a prototype of the work management system that provides voice control with 26 commands and identifies 100 workers by voice. Worker identification using our model was almost perfect, and the command recognition accuracy was 97.0% with the Google API after post-processing and 92.0% with our DeepSpeech model.
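The abstract does not describe the post-processing applied to the Google API output. As a minimal sketch only, assuming the step maps a raw transcript to the closest entry in a fixed command list by Levenshtein (edit) distance, it might look like the following. The command strings, the `match_command` helper, and the `max_ratio` threshold are hypothetical and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical command list; the paper's 26 commands are not listed here.
COMMANDS = ["start machine", "stop machine", "open door", "close door"]


def match_command(transcript, commands=COMMANDS, max_ratio=0.4):
    """Return the closest command, or None if nothing is close enough.

    max_ratio is an assumed rejection threshold: a match is rejected when
    the edit distance exceeds max_ratio times the command length.
    """
    text = " ".join(transcript.lower().split())  # normalize case and spaces
    best = min(commands, key=lambda c: levenshtein(text, c))
    if levenshtein(text, best) > max_ratio * len(best):
        return None
    return best
```

With this sketch, a slightly misrecognized transcript such as "Stop  machin" resolves to "stop machine", while an unrelated utterance is rejected rather than forced onto a command, which matters when a false match would actuate a machine.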
Keywords
Work management system; Speech recognition; Speaker identification; Smart factory; Voice assistance;