
http://dx.doi.org/10.9708/jksci.2021.26.10.037

Performance Comparison of Korean Dialect Classification Models Based on Acoustic Features

Kim, Young Kook (Dept. of Software, Soongsil University)
Kim, Myung Ho (Dept. of Software, Soongsil University)

Publication Information

Journal of the Korea Society of Computer and Information, v.26, no.10, 2021, pp. 37-43

Abstract

The acoustic features of speech carry important social and linguistic information about the speaker, and dialect is one of the most salient of these. A speaker's dialect can be a major barrier to interacting with a computer. Dialects differ at many levels, including phonemes, syllables, words, phrases, and sentences, but identifying these differences one by one is difficult. In this paper, we therefore propose a lightweight Korean dialect classification model that uses only MFCCs from among the acoustic features of speech data. Using Korean conversational speech data, we study how best to utilize MFCC features, and we compare eight machine learning and deep learning classification models on classifying five Korean dialects: Gyeonggi/Seoul, Gangwon, Chungcheong, Jeolla, and Gyeongsang. Normalizing the MFCCs improved the performance of most classification models; compared with the best-performing model before normalization, accuracy improved by 1.07% and F1-score by 2.04%.
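The abstract does not say which normalization, classifiers, or F1 averaging the authors used, so the following is only a minimal sketch of the kind of pipeline it describes: utterance-level MFCC vectors are standardized per dimension (z-score, an assumption; min-max scaling is another common choice), classified, and scored with accuracy and macro-averaged F1 (also an assumption). The nearest-centroid classifier is a toy stand-in, not one of the paper's eight models, and every function name here is hypothetical.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical pipeline sketch: MFCC feature vectors -> z-score normalization
# -> toy classifier -> accuracy and macro F1. Not the paper's documented method.


def fit_zscore(rows):
    """Return per-dimension mean and population std computed on training rows only."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant dimension: avoid division by zero
    return mus, sds


def apply_zscore(rows, mus, sds):
    """Standardize each dimension using statistics fitted on the training set."""
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def nearest_centroid_predict(train_x, train_y, test_x):
    """Toy stand-in classifier: assign each vector to the closest class mean."""
    centroids = {}
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(c) for c in zip(*members)]
    return [min(centroids, key=lambda lab: dist(x, centroids[lab])) for x in test_x]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores over the given labels."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```

In use, `labels` would be the five dialect classes named above, and the MFCC frames themselves would come from a feature extractor such as `librosa.feature.mfcc`. The normalization statistics are fitted on the training split only and then reused for the test split, so no test-set information leaks into training.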

Keywords

Machine Learning; Deep Learning; MFCC; Dialect Classification; Speech Analysis

