Automatic detection and severity prediction of chronic kidney disease using machine learning classifiers

  • Jihyun Mun (Department of Linguistics, Seoul National University)
  • Sunhee Kim (Department of French Language in Education, Seoul National University)
  • Myeong Ju Kim (Center of Artificial Intelligence in Healthcare, Seoul National University)
  • Jiwon Ryu (Department of Internal Medicine, Seoul National University Bundang Hospital)
  • Sejoong Kim (Center of Artificial Intelligence in Healthcare, Seoul National University)
  • Minhwa Chung (Department of Linguistics, Seoul National University)
  • Received : 2022.11.15
  • Accepted : 2022.12.06
  • Published : 2022.12.31

Abstract

This paper proposes an optimal methodology for automatically diagnosing and predicting the severity of chronic kidney disease (CKD) using patients' utterances. In patients with CKD, the voice changes due to weakening of the respiratory and laryngeal muscles and vocal fold edema. Previous studies have phonetically analyzed the voices of patients with CKD, but no study has attempted to classify them automatically. In this paper, the utterances of patients with CKD were classified using three utterance types (sustained vowel, voiced sentence, general sentence), three feature sets [handcrafted features, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), and CNN-extracted features], and two classifiers (SVM, XGBoost). A total of 1,523 utterances, amounting to 3 hours, 26 minutes, and 25 seconds of speech, were used. F1-scores of 0.93 for automatic disease detection, 0.89 for the 3-class severity problem, and 0.84 for the 5-class severity problem were achieved; in every task, the highest performance was obtained with the combination of general sentence utterances, the handcrafted feature set, and XGBoost. These results suggest that general sentence utterances, which can reflect all of a speaker's speech characteristics, together with an appropriate feature set extracted from them, are effective for the automatic classification of CKD patients' utterances.
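
The abstract compares eGeMAPS and handcrafted acoustic features with SVM and XGBoost classifiers. As a rough illustration of how such a pipeline can be assembled, the sketch below extracts eGeMAPS functionals with the opensmile Python package and trains an XGBoost classifier scored by macro F1. The metadata file name, its columns, and the hyperparameters are assumptions for illustration only, not the authors' setup (their best-performing system used handcrafted features from general sentence utterances).

```python
# Illustrative sketch only (not the authors' code): eGeMAPS functionals via the
# opensmile Python package, classified with XGBoost and scored by macro F1.
import opensmile
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# One row of 88 eGeMAPS v02 functionals per utterance.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Hypothetical metadata file with one row per utterance and columns "path" and
# "label" (e.g., 0 = healthy control, 1 = CKD; severity tasks would use 0-2 or 0-4).
meta = pd.read_csv("ckd_metadata.csv")
X = pd.concat([smile.process_file(p) for p in meta["path"]], ignore_index=True)
y = meta["label"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The same classifier call covers binary detection and the 3-/5-class severity tasks.
clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(X_train, y_train)
print("macro F1:", f1_score(y_test, clf.predict(X_test), average="macro"))
```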

Keywords

Acknowledgements

This research was carried out under the University ICT Research Center (ITRC) support program of the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP-2022-2018-0-01833). Part of this work was supported by a research grant from Seoul National University Bundang Hospital (grant no. 13-2022-0008).
