Maximum Likelihood-based Automatic Lexicon Generation for AI Assistant-based Interaction with Mobile Devices

  • Lee, Donghyun (Department of Computer Science and Engineering, Sogang University) ;
  • Park, Jae-Hyun (LG Electronics Institute of Technology) ;
  • Kim, Kwang-Ho (AIZEN Global Co., Inc.) ;
  • Park, Jeong-Sik (Department of Information and Communication Engineering, Yeungnam University) ;
  • Kim, Ji-Hwan (Department of Computer Science and Engineering, Sogang University) ;
  • Jang, Gil-Jin (School of Electronics Engineering, Kyungpook National University) ;
  • Park, Unsang (Department of Computer Science and Engineering, Sogang University)
  • Received : 2016.11.21
  • Accepted : 2017.03.29
  • Published : 2017.09.30

Abstract

In this paper, maximum likelihood-based automatic lexicon generation using mixed syllables is proposed for an unlimited-vocabulary voice interface for East Asian languages (e.g., Korean, Chinese, and Japanese) in AI assistant-based interaction with mobile devices. A conventional lexicon has two inherent problems: 1) the tedious, repeated addition of out-of-lexicon units to the lexicon, and 2) the propagation of errors from morpheme analysis and space segmentation. The proposed method provides an automatic framework that solves both problems. It achieves overall accuracy similar to that of previous methods when a sentence contains one out-of-lexicon word, but yields superior results, with absolute improvements of 1.62%, 5.58%, and 10.09% in word accuracy, when a sentence contains two, three, and four out-of-lexicon words, respectively.
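The abstract's core idea, growing a lexicon of recognition units from syllables by a likelihood criterion instead of manual out-of-lexicon additions, can be illustrated with a toy sketch. This is not the paper's actual algorithm, only an assumed greedy scheme: adjacent unit pairs are merged into a single lexicon unit whenever the merge raises the corpus log-likelihood under a unigram model, and anything left unmerged remains covered by syllable-level fallback units.

```python
import math
from collections import Counter

def build_lexicon(corpus, min_gain=0.0):
    """Greedily merge adjacent unit pairs into larger lexicon units while the
    merge increases corpus log-likelihood under a unigram model.
    corpus: list of syllable sequences (each a list of strings)."""
    corpus = [list(seq) for seq in corpus]
    while True:
        units = Counter(u for seq in corpus for u in seq)
        pairs = Counter(p for seq in corpus for p in zip(seq, seq[1:]))
        total = sum(units.values())
        best, best_gain = None, min_gain
        for (a, b), c_ab in pairs.items():
            # Approximate log-likelihood gain of merging every (a, b) into one
            # unit: pair count times pointwise mutual information.
            gain = c_ab * (math.log(c_ab * total) - math.log(units[a] * units[b]))
            if gain > best_gain:
                best, best_gain = (a, b), gain
        if best is None:                      # no merge improves likelihood
            return sorted(units)              # final mixed lexicon
        a, b = best
        for seq in corpus:                    # rewrite corpus with merged unit
            i = 0
            while i < len(seq) - 1:
                if seq[i] == a and seq[i + 1] == b:
                    seq[i:i + 2] = [a + b]
                i += 1
```

On a toy corpus of character sequences, frequently co-occurring adjacent units are promoted to multi-syllable lexicon entries, while pairs below the gain threshold stay split, so an out-of-lexicon word is still decodable as a string of smaller units.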

Cited by

  1. Exploring the opportunity of digital voice assistants in the logistics and transportation industry, vol. 32, no. 6, 2019, https://doi.org/10.1108/jeim-12-2018-0271
  2. Speaker Adaptation Using i-Vector Based Clustering, vol. 14, no. 7, 2020, https://doi.org/10.3837/tiis.2020.07.003