Speech Recognition of the Korean Vowel 'ㅗ' Based on Time Domain Waveform Patterns

  • Received: 2016.07.25
  • Reviewed: 2016.09.19
  • Published: 2016.11.15

Abstract

Rapidly growing interest in the Internet of Things (IoT) across almost all areas of everyday life has made speech recognition an important means of human-computer interaction (HCI). Demand for speech recognition systems in mobile environments is growing just as quickly. Server-based speech recognition systems for mobile environments are typically fast and achieve high recognition rates. However, because they recognize speech in units of words stored in server databases, they require an internet connection and a large amount of server-side computation. In this paper, we propose a new method for recognizing the Korean vowel 'ㅗ' as part of a phoneme-based Korean speech recognition system. The proposed method analyzes waveform patterns in the time domain rather than the frequency domain, which significantly reduces computational cost. We present elementary algorithms for detecting the typical waveform patterns of 'ㅗ' and combine them to make the final decision. Experimental results show that the proposed method achieves a recognition accuracy of 89.9%.
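To make the idea of time-domain analysis concrete, the following is a minimal sketch of how elementary waveform detectors might be computed in a single pass over the samples, with no Fourier transform, and then combined into one decision. It is not the algorithm from the paper: the detector names (`low_crossing_rate`, `regular_peak_spacing`), the thresholds, the all-must-agree combination rule, and the synthetic sine frame are all assumptions chosen for illustration.

```python
import math

def zero_crossing_count(samples):
    """Count sign changes between consecutive samples (one O(n) pass)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))

def local_peak_indices(samples):
    """Return indices of strict local maxima that lie above zero."""
    return [i for i in range(1, len(samples) - 1)
            if samples[i] > 0 and samples[i - 1] < samples[i] > samples[i + 1]]

# Hypothetical elementary detectors; the thresholds are illustrative only.
def low_crossing_rate(samples, max_rate=0.1):
    """True when the waveform changes sign rarely, as voiced sounds tend to."""
    return zero_crossing_count(samples) / len(samples) <= max_rate

def regular_peak_spacing(samples, tolerance=2):
    """True when the gaps between positive peaks are nearly constant."""
    peaks = local_peak_indices(samples)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return len(gaps) >= 2 and max(gaps) - min(gaps) <= tolerance

def decide(samples, detectors):
    """Final decision: accept only if every elementary detector fires."""
    return all(detector(samples) for detector in detectors)

# Usage: a synthetic 100 Hz sine sampled at 8 kHz stands in for a voiced frame.
frame = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]
is_candidate = decide(frame, [low_crossing_rate, regular_peak_spacing])
```

Each detector needs only comparisons and additions per sample, which is the kind of saving over frequency-domain analysis that the abstract refers to. Real detectors for 'ㅗ' would need patterns and thresholds derived from recorded speech.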

Project Information

Host institution of the research project: Sungshin Women's University
