http://dx.doi.org/10.22469/jkslp.2022.33.3.142

Artificial Intelligence for Clinical Research in Voice Disease  

Jungirl, Seok (Department of Otorhinolaryngology-Head and Neck Surgery, National Cancer Center)
Tack-Kyun, Kwon (Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University College of Medicine)
Publication Information
Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.33, no.3, 2022, pp. 142-155
Abstract
Voice-based diagnosis is non-invasive and can be implemented with a wide range of voice recording devices; it can therefore serve as a screening or diagnostic support tool that helps clinicians identify laryngeal voice disease. Artificial intelligence algorithms for this task, from classical machine learning to recent deep learning, were first applied to binary classification that distinguishes normal from pathological voices, and have since improved the accuracy of multi-class classification across different types of pathological voice. However, no results that can be applied directly in clinical practice have yet been achieved. Most studies on pathological voice classification have used the sustained vowel /ah/, which is easier to analyze than continuous (running) speech. Continuous speech, however, has the potential to yield more accurate results because additional information can be obtained from the change in the voice signal over time. This review explains terms related to artificial intelligence research, surveys the latest trends in machine learning and deep learning algorithms, and introduces recent research results and their limitations to provide future directions for researchers.
Keywords
Voice; Artificial Intelligence; Machine learning; Deep learning; Supervised machine learning; Unsupervised machine learning