Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features

음질, 운율, 발음 특징을 이용한 마비말장애 중증도 자동 분류

  • Yeo, Eun Jung (Department of Linguistics, Seoul National University)
  • Kim, Sunhee (Department of French Language Education, Seoul National University)
  • Chung, Minhwa (Department of Linguistics, Seoul National University)
  • Received : 2021.06.01
  • Accepted : 2021.06.14
  • Published : 2021.06.30


This study addresses automatic severity classification of dysarthric speakers based on speech intelligibility. Speech intelligibility is a complex measure that is affected by features from multiple speech dimensions, yet most previous studies use features from a single dimension only. To capture the characteristics of the speech disorder more effectively, we extracted features from three speech dimensions: voice quality, prosody, and pronunciation. Voice quality features consist of jitter, shimmer, harmonics-to-noise ratio (HNR), number of voice breaks, and degree of voice breaks. Prosody features include speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean, standard deviation, minimum, maximum, median, 25th percentile, and 75th percentile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation features comprise the percentage of correct phonemes (percentage of correct consonants, vowels, and total phonemes) and the degree of vowel distortion (Vowel Space Area, Formant Centralized Ratio, Vowel Articulatory Index, F2-Ratio). Experiments were conducted using various feature combinations. Using features from all three speech dimensions gave the best result, an F1-score of 80.15%, compared with using features from only one or two dimensions. This result implies that voice quality, prosody, and pronunciation features should all be considered in automatic severity classification of dysarthria.
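The rhythm measures named above (%V, deltas, Varcos, rPVIs, nPVIs) can be illustrated with a minimal sketch. It assumes the commonly used definitions of these metrics and takes vocalic and consonantal interval durations (in seconds) as already segmented lists; the function names and the choice of population standard deviation are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev

def percent_v(vowel_durs, cons_durs):
    """%V: share of total interval duration that is vocalic, in percent."""
    total = sum(vowel_durs) + sum(cons_durs)
    return 100.0 * sum(vowel_durs) / total

def delta(durs):
    """Delta: standard deviation of interval durations.
    Assumption: population standard deviation is used here."""
    return pstdev(durs)

def varco(durs):
    """Varco: delta normalized by the mean duration, times 100."""
    return 100.0 * pstdev(durs) / mean(durs)

def rpvi(durs):
    """Raw PVI: mean absolute difference between successive intervals."""
    return mean(abs(a - b) for a, b in zip(durs, durs[1:]))

def npvi(durs):
    """Normalized PVI: each successive difference is divided by the
    mean of the two intervals, then averaged and multiplied by 100."""
    return 100.0 * mean(
        abs(a - b) / ((a + b) / 2.0) for a, b in zip(durs, durs[1:])
    )

if __name__ == "__main__":
    # Hypothetical interval durations in seconds, for illustration only.
    vowels = [0.1, 0.2, 0.1, 0.2]
    consonants = [0.1, 0.1, 0.1, 0.1]
    print(percent_v(vowels, consonants))
    print(delta(vowels), varco(vowels))
    print(rpvi(vowels), npvi(vowels))
```

The deltas and Varcos are typically computed separately for vocalic and consonantal intervals, which is why each function takes a single list of durations.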

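This paper focuses on the problem of automatic severity classification of dysarthria based on speech intelligibility. Speech intelligibility is affected by features of various speech functions, including respiration, phonation, resonance, articulation, and prosody. However, most previous studies used features of only one speech function for automatic severity classification. To effectively capture the characteristics of disordered speech, this paper aims to reflect features of multiple speech functions, namely voice quality, prosody, and pronunciation, in automatic severity classification of dysarthria. Voice quality consists of jitter, shimmer, HNR, number of voice breaks, and degree of voice breaks. Prosody includes speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean, standard deviation, minimum, maximum, median, 25th percentile, 75th percentile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation comprises phoneme accuracy (consonant accuracy, vowel accuracy, total phoneme accuracy) and vowel distortion [VSA (vowel space area), FCR (formant centralized ratio), VAI (vowel articulatory index), F2 ratio]. Automatic severity classification was performed using various feature combinations. In the experiments, the highest performance, an F1-score of 80.15%, was obtained when features from all three speech functions (voice quality, prosody, and pronunciation) were used for classification. This suggests that voice quality, prosody, and pronunciation features should all be considered together in automatic severity classification of dysarthria.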
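The vowel distortion measures (VSA, FCR, VAI, F2 ratio) can be sketched from the first two formants of the corner vowels. The example below assumes a triangular vowel space over /a/, /i/, /u/ and the commonly cited formulas for FCR and VAI; the paper may use a different vowel set, and the function names and formant values are hypothetical.

```python
def vowel_space_area(a, i, u):
    """Triangular VSA (Hz^2) from (F1, F2) pairs of /a/, /i/, /u/,
    computed with the shoelace formula."""
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = a, i, u
    return 0.5 * abs(
        f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a)
    )

def formant_centralization_ratio(a, i, u):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a).
    Values rise as vowels centralize."""
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = a, i, u
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)

def vowel_articulation_index(a, i, u):
    """VAI is the reciprocal of FCR."""
    return 1.0 / formant_centralization_ratio(a, i, u)

def f2_ratio(i, u):
    """F2 ratio = F2 of /i/ divided by F2 of /u/."""
    return i[1] / u[1]

if __name__ == "__main__":
    # Hypothetical mean formants (F1, F2) in Hz, for illustration only.
    a, i, u = (800, 1300), (300, 2300), (350, 800)
    print(vowel_space_area(a, i, u))
    print(formant_centralization_ratio(a, i, u))
    print(vowel_articulation_index(a, i, u))
    print(f2_ratio(i, u))
```

Under these definitions, a reduced vowel space shows up as a smaller VSA, VAI, and F2 ratio and a larger FCR.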



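This research was supported by the R&D support program of the Ministry of Culture, Sports and Tourism and the Korea Creative Content Agency (project number: R2019080018).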

