
Speech Recognition of the Korean Vowel 'ㅡ' based on Neural Network Learning of Bulk Indicators

  • Jae Won Lee (Dept. of Computer Engineering, Sungshin Women's University)
  • Received : 2017.07.31
  • Accepted : 2017.09.21
  • Published : 2017.11.15

Abstract

Speech recognition is now one of the most widely used technologies in HCI. Many applications in which speech recognition can be applied, such as home automation, automatic speech translation, and car navigation, are under active development, and demand for speech recognition systems that run in mobile environments is rapidly increasing. This paper presents a method for fast recognition of the Korean vowel 'ㅡ' as part of a Korean speech recognition system. The proposed method uses bulk indicators, which are computed in the time domain rather than the frequency domain, so the computational cost of recognition is reduced. Bulk indicators representing the predominant sequence patterns of the vowel 'ㅡ' are learned by neural networks, and the final recognition decisions are made by the trained networks. Experimental results show that the proposed method achieves a recognition accuracy of 88.7% and a recognition speed of 0.74 msec per syllable.
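
To make the pipeline described above concrete, the sketch below shows one way such a time-domain approach could be organized in Python. It is a minimal illustration only: the per-frame statistics used here (mean absolute amplitude and zero-crossing rate), the frame size, the number of frames, and the TinyMLP network are assumptions made for demonstration, not the bulk indicator definitions or network architecture specified in the paper itself.

    import numpy as np

    def bulk_indicators(frame):
        # Assumed per-frame time-domain statistics (illustrative only):
        # mean absolute amplitude and zero-crossing rate.
        mean_abs = np.mean(np.abs(frame))
        zero_cross = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
        return np.array([mean_abs, zero_cross])

    def frame_sequence_features(samples, frame_len=256, n_frames=8):
        # Stack the indicators of n_frames consecutive frames into one vector,
        # so the classifier sees a short sequence pattern rather than one frame.
        feats = [bulk_indicators(samples[i * frame_len:(i + 1) * frame_len])
                 for i in range(n_frames)]
        return np.concatenate(feats)

    class TinyMLP:
        # Minimal one-hidden-layer network; in a real system the weights
        # would be trained on labeled vowel recordings.
        def __init__(self, n_in, n_hidden=16, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
            self.b1 = np.zeros(n_hidden)
            self.W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
            self.b2 = np.zeros(1)

        def predict_proba(self, x):
            h = np.tanh(x @ self.W1 + self.b1)           # hidden layer
            z = h @ self.W2 + self.b2                     # output logit
            return 1.0 / (1.0 + np.exp(-z))               # probability of 'ㅡ'

    # Usage with synthetic audio standing in for a recorded syllable (16 kHz, 1 s).
    samples = np.random.default_rng(1).normal(0.0, 0.2, 16000)
    x = frame_sequence_features(samples)
    net = TinyMLP(n_in=x.size)
    is_eu = net.predict_proba(x).item() > 0.5             # final decision for 'ㅡ'

In practice the network weights would be learned from labeled recordings of the vowel 'ㅡ' against other phonemes; the random initialization above only stands in for a trained model, and all indicators remain computable with a handful of time-domain operations per frame.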

Keywords

Acknowledgement

Supported by: Sungshin Women's University
