Browse > Article
http://dx.doi.org/10.7776/ASK.2022.41.6.655

Effective speech recognition system for patients with Parkinson's disease  

Huiyong, Bak (Department of Electrical and Computer Engineering, Inha University)
Ryul, Kim (Department of Neurology, Inha University Hospital, Inha University College of Medicine)
Sangmin, Lee (Department of Electrical and Computer Engineering, Inha University)
Abstract
Since speech impairment is prevalent in patients with Parkinson's disease (PD), speech recognition systems suitable for these patients are needed. In this paper, we propose a speech recognition system that effectively recognizes the speech of patients with PD. The speech recognition system is firstly pre-trained with the Globalformer using the speech data from healthy people, and then fine-tuned using relatively small amount of speech data from the patient with PD. For this analysis, we used the speech dataset of healthy people built by AI hub and that of patients with PD collected at Inha University Hospital. As a result of the experiment, the proposed speech recognition system recognized the speech of patients with PD with Character Error Rate (CER) of 22.15 %, which was a better result compared to other methods.
Keywords
Speech recognition; Parkinson's disease; Globalformer; Signal processing;
Citations & Related Records
연도 인용수 순위
  • Reference
1 A. K. Ho, R. Iansek, C. Marigliani, J. L. Bradshaw, and S. Gates, "Speech impairment in a large sample of patients with Parkinson's disease," Behav Neurol. 11, 131-137 (1999).   DOI
2 A. Kain, X. Niu, J.-P. Hosom, Q. Miao, and J. van Santen, "Formant re-synthesis of dysarthric speech," Proc. ISCA Workshop on SSW5, 25-30 (2004).
3 L. Moro-Velazquez, J. Cho, S. Watanabe, M. A. Hasegawa-Johnson, O. Scharenborg, H. Kim, and N. Dehak, "Study of the performance of automatic speech recognition systems in speakers with Parkinson's disease," Proc. 20th Interspeech, 3875-3879 (2019).
4 Q. Yu, Y. Ma, and Y. Li, "Enhancing speech recognition for parkinson's disease patient using transfer learning technique," J. Shanghai Jiaotong Univ. (Science), 27, 90-98 (2022).   DOI
5 S. O. Caballero-Morales and F. Trujillo-Romero, "Evolutionary approach for integration of multiple pronunciation patterns for enhancement of dysarthric speech recognition," Expert Syst. Appl. 41, 841-852 (2014).   DOI
6 A. Vaswani, N. M. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. Advances in NIPS, 1-11 (2017).
7 L. Dong, S. Xu, and B. Xu, "Speech transformer: a norecurrence sequence-to sequence model for speech recognition," Proc. IEEE ICASSP, 5884-5888 (2018).
8 J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," Proc. the IEEE conf. CVPR, 7132-7141 (2018).
9 W. Han, Z. Zhang, Y. Zhang, J. Yu, C. C. Chiu, J. Qin, A. Gulati, R. Pang, and Y. Wu, "ContextNet: Improving convolutional neural networks for automatic speech recognition with global context," Proc. Interspeech, 3610-3614 (2020).
10 J. W. Ha, K. Nam, J. Kang, S. W. Lee, S. Yang, H. Jung, E. Kim, H. Kim, S. Kim, H. A. Kim, K. Doh, C. K. Lee, N. K. Sung, and S. Kim, "ClovaCall: Korean goal-oriented dialog speech corpus for automatic speech recognition of contact centers," Proc. Interspeech, 409-413 (2020).
11 J.-U. Bang, S. Yun, S. H. Kim, M. Y. Choi, M. K. Lee, Y. J. Kim, D. H. Kim, J. Park, Y. J. Lee, and S. H. Kim, "KsponSpeech: Korean spontaneous speech corpus for automatic speech recognition," Appl. Sci. 10, 6936 (2020).
12 A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang, "Conformer: Convolution-augmented transformer for speech recognition," Proc. Interspeech, 5036-5040 (2020).