http://dx.doi.org/10.13064/KSSS.2022.14.4.035

Performance comparison on vocal cords disordered voice discrimination via machine learning methods  

Cheolwoo Jo (School of Electrical, Electronics and Control Engineering, Changwon National University)
Soo-Geun Wang (Department of Otolaryngology, Pusan National University Hospital)
Ickhwan Kwon (Department of Applied IT and Engineering, Pusan National University Hospital)
Publication Information
Phonetics and Speech Sciences / v.14, no.4, 2022, pp. 35-43
Abstract
This paper studies how to improve the identification rate for speech data from speakers with laryngeal disorders using convolutional neural network (CNN) and machine learning ensemble methods. Speech datasets for laryngeal disorders are generally small, so even when classifiers are built with statistical methods, overfitting that depends on the training method can lower the identification rate on external data. In this work, we combine the results of CNN models and machine learning models of varying accuracy through multi-voting, aiming for better classification performance than the individually trained models. The Pusan National University Hospital (PNUH) dataset was used to train and validate the algorithms. The dataset contains normal voices and voices of patients with benign and malignant tumors. In the experiment, an attempt was made to distinguish between normal and benign tumors and malignant tumors. The random forest method proved to be the best ensemble method in the experiment, with an identification rate of 85%.
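The multi-voting idea in the abstract can be illustrated with a minimal sketch of hard (majority) voting over the label predictions of several already-trained classifiers. This is not the authors' implementation: the function name `majority_vote`, the model names (`cnn`, `svm`, `knn`), the three-label setup, and the tie-breaking rule are all assumptions made for illustration, and the abstract reports random forest, not plain voting, as the best ensemble method.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-model label predictions by hard (majority) voting.

    predictions: list of label sequences, one per model, all the same length.
    Ties are broken in favour of the earliest-listed model among the tied
    labels (an illustrative choice, not taken from the paper).
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    fused = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # Pick the first vote (in model order) that reaches the top count.
        winner = next(v for v in sample_votes if counts[v] == top)
        fused.append(winner)
    return fused


# Hypothetical predictions from three models on four utterances.
cnn = ["normal", "benign", "malignant", "normal"]
svm = ["normal", "malignant", "malignant", "benign"]
knn = ["benign", "benign", "malignant", "malignant"]

fused = majority_vote([cnn, svm, knn])
print(fused)
```

In this sketch the models are listed in a fixed order, so placing the most accurate model first makes it the tie-breaker when all votes disagree.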
Keywords
diagnosis; glottic cancer; vocal cords disorder; machine learning; convolutional neural network (CNN)