A Noise Robust Speech Recognition Method Using Model Compensation Based on Speech Enhancement

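  • 신광호 (Dept. of Information and Communication Engineering, Yeungnam University)
  • 정호열 (Dept. of Information and Communication Engineering, Yeungnam University)
  • 정현열 (Dept. of Information and Communication Engineering, Yeungnam University)
  • Published: 2008.05.31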

Abstract

In this paper, we propose an MWF-PMC noise processing method for speech recognition in noisy environments: the input speech is enhanced by Mel-warped Wiener Filtering (MWF) at the pre-processing stage, and the recognition model is compensated by Parallel Model Combination (PMC) at the post-processing stage. PMC takes the residual noise from the silence regions of the speech enhanced in the pre-processing stage and uses it to compensate the model trained on clean speech, which can improve recognition performance in noisy environments. For the recognition experiments, we down-sampled the KLE PBW (Phoneme Balanced Words) 452-word speech data to 8 kHz and generated noisy speech at five SNR levels (0, 5, 10, 15, and 20 dB) by adding Subway, Car, and Exhibition noise to the clean speech. The recognition results confirm the effectiveness of the proposed MWF-PMC method, which achieved better overall recognition performance than the existing combined methods.
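The MWF pre-processing stage can be illustrated by a single-stage sketch: a Wiener gain H = SNR/(1 + SNR) is computed per FFT bin and then smoothed into mel bands with a triangular filterbank, which is what "mel-warped" refers to. Everything here is an illustrative assumption rather than the paper's configuration: the power-subtraction SNR estimate, the 23-band filterbank, the 0.01 gain floor, the function names, and applying the gain in the frequency domain. The ETSI advanced front end (ES 202 050), for example, uses a two-stage design and turns the mel-warped filter into a time-domain impulse response instead.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_bands: int, n_fft: int, fs: float):
    """Triangular mel filters over the n_fft//2+1 FFT bins; also returns bin and band-centre frequencies in Hz."""
    freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_bands + 2))
    fb = np.zeros((n_bands, freqs.size))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb, freqs, edges[1:-1]


def mel_warped_wiener_gain(noisy_psd, noise_psd, fb, floor=0.01):
    """Per-bin Wiener gain SNR / (1 + SNR), averaged inside each triangular mel band."""
    # Crude SNR estimate by power subtraction (assumption, not a decision-directed estimator).
    snr = np.maximum(noisy_psd / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    h_lin = snr / (1.0 + snr)
    # Mel warping: filterbank-weighted average of the linear-frequency gains.
    h_mel = fb @ h_lin / np.maximum(fb.sum(axis=1), 1e-12)
    return np.maximum(h_mel, floor)


def apply_mel_gain(noisy_spectrum, h_mel, freqs, centres):
    """Interpolate mel-band gains back onto the FFT bins and filter one frame's spectrum."""
    return np.interp(freqs, centres, h_mel) * noisy_spectrum


if __name__ == "__main__":
    n_fft, fs = 256, 8000.0
    fb, freqs, centres = mel_filterbank(23, n_fft, fs)
    rng = np.random.default_rng(0)
    spec = np.fft.rfft(rng.standard_normal(n_fft))  # one stand-in frame of noisy speech
    noise_psd = np.full(freqs.size, 0.5)            # hypothetical noise PSD estimate
    h = mel_warped_wiener_gain(np.abs(spec) ** 2, noise_psd, fb)
    enhanced_frame = np.fft.irfft(apply_mel_gain(spec, h, freqs, centres), n=n_fft)
```

Averaging the gain within mel bands smooths it along frequency, so its resolution follows the same mel scale as the recognizer's features.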
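The post-processing stage can be sketched with the log-add approximation of PMC, a simplified PMC variant that compensates only the static means. Each clean-model mean and the residual-noise mean are mapped from the cepstral domain to the log-mel domain, combined as log(g·exp(μ_s) + exp(μ_n)), and mapped back. Assumptions made for this sketch: full-dimension (untruncated) orthonormal DCT cepstra, a gain g of 1, residual noise taken as the mean of a fixed number of leading frames of the enhanced utterance, and hypothetical function names. Full PMC as described by Gales and Young also compensates the variances, which is not shown here.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix, so cepstra c = C @ log_mel and log_mel = C.T @ c."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C


def residual_noise_mean(enhanced_ceps: np.ndarray, n_silence: int = 10) -> np.ndarray:
    """Mean cepstrum of the leading frames, assumed (hypothetically) to be silence after MWF."""
    return enhanced_ceps[:n_silence].mean(axis=0)


def pmc_log_add(clean_means: np.ndarray, noise_mean: np.ndarray,
                C: np.ndarray, g: float = 1.0) -> np.ndarray:
    """Log-add PMC on cepstral means.

    clean_means: (n_states, n) cepstral means of the clean-speech HMM states.
    noise_mean:  (n,) cepstral mean of the residual noise.
    """
    log_s = clean_means @ C   # row-wise C.T @ c: cepstral -> log-mel
    log_n = noise_mean @ C
    log_y = np.logaddexp(np.log(g) + log_s, log_n)  # log(g*exp(s) + exp(n))
    return log_y @ C.T        # row-wise C @ log_y: log-mel -> cepstral


if __name__ == "__main__":
    C = dct_matrix(23)
    rng = np.random.default_rng(0)
    clean_means = rng.normal(size=(3, 23))   # stand-in state means of the clean model
    enhanced = rng.normal(size=(100, 23))    # stand-in cepstra of MWF-enhanced speech
    noisy_means = pmc_log_add(clean_means, residual_noise_mean(enhanced), C)
```

Because MWF has already removed most of the noise, the residual-noise mean fed to PMC is small, so the compensated model only has to absorb what the enhancement left behind rather than the full noise level.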
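The noisy test conditions described in the abstract can be reproduced in outline by scaling each noise recording so that the clean-to-noise power ratio equals the target SNR before adding it. A minimal sketch, assuming SNR is measured over whole-utterance average power (the abstract does not say whether silence is excluded) and tiling short noise clips as an illustrative choice; the 8 kHz down-sampling step is not shown, and the function name is hypothetical.

```python
import numpy as np


def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return clean + g * noise, with g chosen so that 10*log10(P_clean / P_scaled_noise) = snr_db."""
    clean = clean.astype(np.float64)
    # Repeat (or cut) the noise recording so it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise.astype(np.float64), reps)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scaled noise power must equal p_clean / 10^(snr_db / 10).
    g = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + g * noise


if __name__ == "__main__":
    fs = 8000
    rng = np.random.default_rng(0)
    clean = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1 s stand-in for a PBW word
    noise = rng.standard_normal(3000)                      # stand-in for a Subway/Car/Exhibition clip
    for snr in (0, 5, 10, 15, 20):
        noisy = add_noise_at_snr(clean, noise, snr)
        print(snr, noisy.shape)
```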

본 논문에서는 잡음 환경하의 음성 인식을 위해 전처리 단계에서 Mel-warped Wiener Filtering (MWF) 기법을 이용하여 입력 음성을 개선하고 후처리 단계에서 PMC (Parallel Model Combination) 기법을 이용하여 인식 모델을 보상하는 MWF-PMC잡음 처리 기법을 제안한다. PMC 기법은 전처리 단계에서 개선된 음성의 묵음 구간으로부터 잔류 잡음을 취하여 깨끗한 음성을 이용하여 작성한 인식 모델을 보상함으로써 잡음 환경하의 음성 인식 성능을 향상시킬 수 있다. 인식 실험을 위한 음성 데이터는 국어공학연구소 (KLE)에서 작성한 PBW (Phoneme Balanced Words) 452 단어 음성 데이터를 8 kHz로 다운 샘플링한 후 Subway, Car 및 Exhibition 잡음을 5단계의 신호 대 잡음비 (SNR)를 0, 5, 10, 15, 2003로 부가하여 구성하였다. 인식 실험 결과, 본 논문에서 제안한 MWF-PMC 기법이 기존의 결합된 기법보다 전반적으로 향상된 인식 성능을 얻어 그 유효성을 확인할 수 있었다.
