Effective speech recognition system for patients with Parkinson's disease

  • Huiyong, Bak (Department of Electrical and Computer Engineering, Inha University) ;
  • Ryul, Kim (Department of Neurology, Inha University Hospital, Inha University College of Medicine) ;
  • Sangmin, Lee (Department of Electrical and Computer Engineering, Inha University)
  • Received : 2022.08.30
  • Accepted : 2022.11.09
  • Published : 2022.11.30

Abstract

Because speech impairment is prevalent in patients with Parkinson's disease (PD), speech recognition systems suited to these patients are needed. In this paper, we propose a speech recognition system that effectively recognizes the speech of patients with PD. The system uses a Globalformer model that is first pre-trained on speech data from healthy speakers and then fine-tuned on a relatively small amount of speech data from patients with PD. For the experiments, we used a speech dataset of healthy speakers built by AI Hub and a speech dataset of patients with PD collected at Inha University Hospital. In the experiments, the proposed system recognized the speech of patients with PD with a Character Error Rate (CER) of 22.15 %, outperforming the other methods it was compared against.
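The CER reported above is, by the standard definition, the character-level edit distance (substitutions + deletions + insertions) between the recognized text and the reference transcript, divided by the number of reference characters. The sketch below computes a corpus-level CER with a Levenshtein dynamic program. The function names and the `ignore_spaces` option (dropping spaces before scoring) are hypothetical choices for illustration; the abstract does not say how spacing was handled in the reported score.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute r with h (0 cost if equal)
            )
        prev = curr
    return prev[-1]


def cer(refs: list, hyps: list, ignore_spaces: bool = True) -> float:
    """Corpus-level CER in percent: total edits / total reference characters.

    ignore_spaces is a hypothetical option (assumption): spacing in Korean
    transcripts is often inconsistent, so spaces are removed before scoring.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_edits = 0
    total_chars = 0
    for ref, hyp in zip(refs, hyps):
        if ignore_spaces:
            ref = ref.replace(" ", "")
            hyp = hyp.replace(" ", "")
        total_edits += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return 100.0 * total_edits / total_chars
```

For example, `cer(["안녕하세요"], ["안녕하세오"])` returns `20.0`: one substituted syllable out of five reference characters. Summing edits and characters over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score.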

Acknowledgement

This research was supported by the Basic Research Program of the National Research Foundation of Korea (NRF) (grant numbers NRF-2020R1A2C2004624 and NRF-2021R1C1C1011822).