Phoneme Segmentation based on Volatility and Bulk Indicators in Korean Speech Recognition

  • Received : 2015.04.30
  • Accepted : 2015.08.07
  • Published : 2015.10.15

Abstract

Demand for speech recognition systems that run in mobile environments is increasing rapidly. This paper proposes a new method for Korean phoneme segmentation that can be applied to a phoneme-based Korean speech recognition system. First, the input signal is divided into blocks of equal size. The method uses two primitive indicators for phoneme segmentation: a volatility indicator calculated for each block of the input speech signal, and bulk indicators calculated for each bulk within a block, where a bulk is a set of adjacent samples that have the same sign. Three dedicated recognition algorithms that combine the two types of primitive indicators recognize vowels, voiced consonants, and voiceless consonants in that order, and thereby locate the boundaries between phonemes. Experimental results show that the proposed method markedly reduces the error rate compared with the existing phoneme segmentation method.
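
The abstract defines blocks and bulks but does not give the formulas for the indicators. The Python sketch below illustrates only the two structural steps it does state: cutting the signal into equal-size blocks and grouping adjacent same-sign samples into bulks. The function names, the treatment of zero as non-negative, and both indicator formulas (`volatility` and `bulk_features`) are assumptions made for illustration and are not taken from the paper.

```python
from itertools import groupby
from statistics import pstdev


def split_into_blocks(samples, block_size):
    """Cut the signal into consecutive equal-size blocks (a short tail is dropped)."""
    usable = len(samples) - len(samples) % block_size
    return [samples[i:i + block_size] for i in range(0, usable, block_size)]


def split_into_bulks(block):
    """Group adjacent samples that share a sign.

    Assumption: a zero sample is counted as non-negative.
    """
    return [list(run) for _, run in groupby(block, key=lambda s: s >= 0)]


def volatility(block):
    """Hypothetical volatility indicator: spread of sample-to-sample differences.

    This formula is a placeholder, not the one defined in the paper.
    """
    diffs = [b - a for a, b in zip(block, block[1:])]
    return pstdev(diffs) if diffs else 0.0


def bulk_features(bulk):
    """Hypothetical bulk indicators: run length and peak magnitude (placeholders)."""
    return {"length": len(bulk), "peak": max(abs(s) for s in bulk)}


# Toy signal, two blocks of six samples each.
signal = [3, 5, 2, -1, -4, -2, 1, 6, 0, -3, -3, 2]
blocks = split_into_blocks(signal, block_size=6)
bulks_per_block = [split_into_bulks(b) for b in blocks]

for block, bulks in zip(blocks, bulks_per_block):
    print(round(volatility(block), 3), [bulk_features(b) for b in bulks])
```

In this toy signal the second block splits into three bulks: `[1, 6, 0]`, `[-3, -3]`, and `[2]`. A real segmenter would replace the placeholder indicators with the paper's definitions and feed them to the vowel, voiced-consonant, and voiceless-consonant recognizers.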

Acknowledgement

Supported by: Sungshin Women's University
