http://dx.doi.org/10.13088/jiis.2021.27.2.017

Automatic Speech Style Recognition Through Sentence Sequencing for Speaker Recognition in Bilateral Dialogue Situations  

Kang, Garam (Department of BIG DATA Analytics, Kyung Hee University)
Kwon, Ohbyung (School of Management, Kyung Hee University)
Publication Information
Journal of Intelligence and Information Systems / v.27, no.2, 2021, pp. 17-32
Abstract
Speaker recognition is generally divided into speaker identification and speaker verification. Speaker recognition plays an important role in automatic voice systems, and its importance has become more prominent as portable devices, voice technology, and audio content fields continue to expand. Previous speaker recognition studies have aimed to automatically determine who the speaker is from voice files and to improve accuracy. Speech is an important sociolinguistic subject: it contains useful information that reveals the speaker's attitude, conversational intention, and personality, and this can be an important clue for speaker recognition. The sentence-final ending used in a speaker's utterance determines the sentence type and conveys information such as the speaker's intention, psychological attitude, or relationship to the listener. Because the choice of sentence-final ending varies with speaker characteristics, the type and distribution of endings used by an unidentified speaker can help identify that speaker. However, few existing text-based speaker recognition studies have considered speech style, and if speech-style information is added to signal-based speaker recognition techniques, the accuracy of speaker recognition can be further improved. Hence, the purpose of this paper is to propose a novel method that uses speech style, expressed as the sentence-final ending, to improve the accuracy of Korean speaker recognition. To this end, we propose a method called sentence sequencing, which generates vector values from the type and frequency of the sentence-final endings appearing in a specific person's utterances. To evaluate the performance of the proposed method, training and performance evaluation were conducted with an actual drama script.
The method proposed in this study can be used as a means to improve the performance of Korean speech recognition services.
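The "sentence sequencing" idea described in the abstract, turning the type and frequency of sentence-final endings in a speaker's utterances into a vector, can be illustrated with a minimal sketch. The ending inventory and simple suffix matching below are illustrative assumptions only, not the paper's actual feature set, which would rely on proper Korean morphological analysis:

```python
from collections import Counter

# Illustrative (not the paper's) inventory of Korean sentence-final endings.
# A real system would use a morphological analyzer to segment endings.
ENDINGS = ["습니다", "어요", "아요", "네요", "지요", "네", "지", "다", "야", "냐"]

def ending_of(utterance: str) -> str:
    """Return the longest listed ending the utterance ends with, else 'OTHER'."""
    s = utterance.strip().rstrip(".?!…")
    for e in sorted(ENDINGS, key=len, reverse=True):  # prefer longest match
        if s.endswith(e):
            return e
    return "OTHER"

def sentence_sequence_vector(utterances: list[str]) -> list[float]:
    """Relative frequency of each ending type over a speaker's utterances."""
    counts = Counter(ending_of(u) for u in utterances)
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[e] / total for e in ENDINGS + ["OTHER"]]
```

A per-speaker vector like this could then be combined with acoustic embeddings (e.g., the x-vectors appearing in several of the references below) before classification.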
Keywords
Speaker recognition; Sentence-final ending; Sentence sequencing; Speech recognition services; Text analysis;
Citations & Related Records
  • Reference
1 Han, G., Study on the Endings of Modern Hangeul, Yuk Rack, (2004).
2 Han, S., "A Study on the Use of Final Endings in Korean Language Conversation". The Journal of Humanities and Social science, Vol.11, No.4(2020), 2315-2327.
3 Ioffe, S., "Probabilistic linear discriminant analysis". Computer Vision-ECCV, No.5(2006), 531-542.
4 Chakroun, R., and M. Frikha., "A New Text Independent Speaker Recognition System with Short Utterances Using SVM". European, Mediterranean, and Middle Eastern Conference on Information Systems, No.11(2020), 566-574.
5 Ahn, J., "The use of new forms of honorific final ending in Modern Korean". The Linguistic Association of Korea Journal, Vol.25, No.3 (2017), 173-192.   DOI
6 Ai, H., W. Xia., and Q. Zhang., "Speaker Recognition Based on Lightweight Neural Network for Smart Home Solutions". International Symposium on Cyberspace Safety and Security, No.12(2019), 421-431.
7 Bhattacharya, G., M. Alam., and P. Kenny., "Deep speaker recognition: Modular or monolithic?". INTERSPEECH, No.9(2019), 1143-1147.
8 Bu, S., and S. B. Cho, "Speaker Identification Method based on Convolutional Neural Network with STFT Sound-Map". KIISE Transactions on Computing Practices, Vol.24, No.6(2018), 289-294.   DOI
9 Chae, S., "Theories and methods of sociolinguistic research". Saegugeosaenghwal, Vol.14, No.4 (2004), 83-103.
10 Chen, G., S. Chen., L. Fan., X. Du., Z. Zhao., F. Song., and Y. Liu., "Who is real Bob? adversarial attacks on speaker recognition systems". arXiv:1911.01840, No.4(2020).
11 Choi, J., "Classification of Continuous Speech Speakers by Multilayer Perceptron Network". Proceedings of the Korean Institute of Information and Communication Sciences Conference, No.5(2017), 682-683.
12 Choi, J., "Speech-dependent Speaker Identification Using Mel Frequency Cepstrum Coefficients for Continuous Speech Recognition". Journal of KIIT, Vol.14, No.10(2016), 67-72.   DOI
13 Dehak, N., P. Kenny., R. Dehak., P. Dumouchel., and P. Ouellet., "Front-end factor analysis for speaker verification". IEEE Transactions on Audio, Speech, and Language Processing, Vol.19, No.4(2011), 788-798.   DOI
14 Devi, K., N. Singh., and K. Thongam., "Automatic Speaker Recognition from Speech Signals Using Self Organizing Feature Map and Hybrid Neural Network". Microprocessors and Microsystems, Vol.79 (2020), 103264.   DOI
15 Kang, H., and M. H. Kim, "A Multivariate Analytical Study of Variation Patterns of Honorific Final Endings in KakaoTalk Dialogue". The Sociolinguistic Journal of Korea, Vol.26, No.1(2018), 1-30.   DOI
16 Garcia-Romero, D., D. Snyder., G. Sell., A. McCree., D. Povey., and S. Khudanpur., "X-vector DNN Refinement with Full-Length Recordings for Speaker Recognition". INTERSPEECH, No.9(2019), 1493-1496.
17 Alluri, K., V. Raju., S. Gangashetty., and A. K. Vuppala., "Analysis of Source and System features for Speaker Recognition in Emotional Conditions". IEEE Region 10 Conference (2016), 2847-2850.
18 Chae, S., "Noise Robust Text-Dependent Speaker Verification Using Teacher-Student Learning Framework". Department of Electrical Engineering and Computer Science, College of Engineering, Seoul National University, (2019).
19 Ha, B., and M. Huh., "The Effect of Pitch, Duration, and Intensity on a Perception of Speech". Journal of speech-language & hearing disorders, Vol.27, No.3(2018), 45-54.   DOI
20 Kang, B., "Lexical Differences between Utterances of Men and Women : A Corpus Based Classification Study". Korean Linguistics, Vol.58, No.2(2013), 1-30.
21 Jung, H., S. Yoon., and N. Park., "Speaker Recognition Using Convolutional Siamese Neural Networks". The transactions of The Korean Institute of Electrical Engineers, Vol.69, No.1(2020), 164-169.   DOI
22 Snyder, D., D. Garcia-Romero., G. Sell., A. McCree., D. Povey., and S. Khudanpur., "Speaker recognition for multi-speaker conversations using x-vectors". IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), No.5(2019), 5796-5800.
23 Kim, S., Kim, J. "Development of final ending of three to four-year-old children", Communication Sciences & Disorders, Vol. 9 (2004), 22-35.
24 Snyder, D., D. Garcia-Romero., D. Povey., and S. Khudanpur., "Deep Neural Network Embeddings for Text-Independent Speaker Verification". Interspeech, No.8(2017), 999-1003.
25 Kwon, O., Kim, J., Cho, H.Y., Hong, K.A. Han, J.M., Kim, Y.W., Choi, S., KHU-SentiwordNet: Developing A Korean SentiwordNet Combining Empty Morpheme, Proceedings of the 2019 Conference on Korea IT Service, 2019, pp.194-197.
26 Mohdiwale, S., and T. Sahu., "Nearest Neighbor Classification Approach for Bilingual Speaker and Gender Recognition". Advances in Biometrics (2019), 249-266.
27 Pack, J., "Study on the Recognition of Honorification among Korean Native Speakers -focused on Koreans in their 20s, 30s-". Hanminjok Emunhak, Vol.73, No.8(2016), 119-154.
28 Ramachandran, R., K. Farrell., R. Ramachandran., and R. Mammone., "Speaker recognition-general classifier approaches and data fusion methods". Pattern recognition, Vol.35, No.12 (2002), 2801-2821.   DOI
29 Seo, Y., and H. Kim., "Recent Speaker Recognition Technology Trend". The Magazine of the IEIE, Vol.41, No.3(2014), 40-49.
30 So, S., "Development of speaker classification model using text-independent utterance based on deep neural network", Hanyang University, (2019).
31 Song, J., "Semantic Functions of the Korean Sentence-Terminal Suffix -ney", Journal of Korean Linguistics, Vol.76, No.12(2015), 123-159.   DOI
32 Yun, H., and Z. Jin., "Exploring listeners' perception on evidential grammatical markers: Comparison between Seoul and Yanbian dialect users". Language and Information, Vol.24, No.1(2020), 29-45.   DOI
34 Povey, D., X. Zhang., and S. Khudanpur., "Parallel training of deep neural networks with natural gradient and parameter averaging". ICLR, No.11(2014).
35 Sing, P., M. Embi., and H. Hashim., "Ask the Assistant: Using Google Assistant in classroom reading comprehension activities". International Journal of New Technology and Research, Vol.5, No.7(2019), 39-43.
36 Kim, J., M. S. Yoon, S. J. Kim, M. S. Chang, and J. E. Cha, "Utterance Types in Typically Developing Preschoolers". Korean Journal of Communication Disorders, Vol.17, No.3(2012), 488-498.
37 Snyder, D., D. Garcia-Romero., G. Sell., D. Povey., and S. Khudanpur., "X-vectors: Robust dnn embeddings for speaker recognition". IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), No.4(2018), 5329-5333.
38 Wang, N., P. Ching., N. Zheng., and T. Lee., "Robust speaker recognition using denoised vocal source and vocal tract features". IEEE transactions on audio, speech, and language processing, Vol.19, No.1(2011), 196-205.   DOI
39 Kim, J., "A study of awareness and generation of Korean language learners on attitude of speaker in terms of boundary tone". Ewha Womans University, (2018).
40 Kim, S., "The function and meaning of the final ending -ni", Urimal Studies, Vol.15 (2004), 53-78.
41 Jang, K., "A study on stylistic features of ending-components in Korean". Youkrack, (2010).
42 Jo, M., "Pragmatic Strategy and Intonation of "-geodeun", the Final Endings: Focusing on the age variation of Those in 10s, 20s, 30s". Korean Linguistics, Vol.65, No.11(2014), 237-262.
43 Kang, J., B. R. Kim, K. Y. Kim, and S. H. Lee, "Performance Improvement of Speaker Recognition by MCE-based Score Combination of Multiple Feature Parameters". Journal of the Korea Academia-Industrial cooperation Society, Vol.21, No.6(2020), 679-686.   DOI
44 Huanjun, B., X. Mingxing., and F. Thomas., "Emotion Attribute Projection for Speaker Recognition on Emotional Speech". EUROSPEECH, No.8(2007), 758-761.