Maximum Likelihood-based Automatic Lexicon Generation for AI Assistant-based Interaction with Mobile Devices

  • Lee, Donghyun (Department of Computer Science and Engineering, Sogang University) ;
  • Park, Jae-Hyun (LG Electronics Institute of Technology) ;
  • Kim, Kwang-Ho (AIZEN Global Co., Inc.) ;
  • Park, Jeong-Sik (Department of Information and Communication Engineering, Yeungnam University) ;
  • Kim, Ji-Hwan (Department of Computer Science and Engineering, Sogang University) ;
  • Jang, Gil-Jin (School of Electronics Engineering, Kyungpook National University) ;
  • Park, Unsang (Department of Computer Science and Engineering, Sogang University)
  • Received : 2016.11.21
  • Accepted : 2017.03.29
  • Published : 2017.09.30

Abstract

In this paper, maximum likelihood-based automatic lexicon generation using mixed syllables is proposed for an unlimited-vocabulary voice interface for East Asian languages (e.g., Korean, Chinese, and Japanese) in AI assistant-based interaction with mobile devices. A conventional lexicon has two inherent problems: 1) the tedious, repeated addition of out-of-lexicon units to the lexicon, and 2) the propagation of errors from morpheme analysis and space segmentation. The proposed method provides an automatic framework that solves both problems. It achieves overall accuracy similar to that of previous methods when a sentence contains one out-of-lexicon word, but yields superior results, with absolute improvements of 1.62%, 5.58%, and 10.09% in word accuracy, when a sentence contains two, three, and four out-of-lexicon words, respectively.
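The abstract's core idea, growing a lexicon of recognition units from syllables by a likelihood criterion instead of manual out-of-lexicon additions, can be illustrated with a toy sketch. This is not the paper's actual algorithm, only an assumed greedy scheme: adjacent unit pairs are merged into a single lexicon unit whenever the merge raises the corpus log-likelihood under a unigram model, and anything left unmerged remains covered by syllable-level fallback units.

```python
import math
from collections import Counter

def build_lexicon(corpus, min_gain=0.0):
    """Greedily merge adjacent unit pairs into larger lexicon units while the
    merge increases corpus log-likelihood under a unigram model.
    corpus: list of syllable sequences (each a list of strings)."""
    corpus = [list(seq) for seq in corpus]
    while True:
        units = Counter(u for seq in corpus for u in seq)
        pairs = Counter(p for seq in corpus for p in zip(seq, seq[1:]))
        total = sum(units.values())
        best, best_gain = None, min_gain
        for (a, b), c_ab in pairs.items():
            # Approximate log-likelihood gain of merging every (a, b) into one
            # unit: pair count times pointwise mutual information.
            gain = c_ab * (math.log(c_ab * total) - math.log(units[a] * units[b]))
            if gain > best_gain:
                best, best_gain = (a, b), gain
        if best is None:                      # no merge improves likelihood
            return sorted(units)              # final mixed lexicon
        a, b = best
        for seq in corpus:                    # rewrite corpus with merged unit
            i = 0
            while i < len(seq) - 1:
                if seq[i] == a and seq[i + 1] == b:
                    seq[i:i + 2] = [a + b]
                i += 1
```

On a toy corpus of character sequences, frequently co-occurring adjacent units are promoted to multi-syllable lexicon entries, while pairs below the gain threshold stay split, so an out-of-lexicon word is still decodable as a string of smaller units.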

Cited by

  1. Exploring the opportunity of digital voice assistants in the logistics and transportation industry, vol. 32, no. 6, 2019, https://doi.org/10.1108/jeim-12-2018-0271
  2. Speaker Adaptation Using i-Vector Based Clustering, vol. 14, no. 7, 2020, https://doi.org/10.3837/tiis.2020.07.003