Speech enhancement method based on feature compensation gain for effective speech recognition in noisy environments

  • Bae, Ara (Department of Computer Science and Engineering, Incheon National University) ;
  • Kim, Wooil (Department of Computer Science and Engineering, Incheon National University)
  • Received : 2018.11.09
  • Accepted : 2019.01.25
  • Published : 2019.01.31

Abstract

This paper proposes a speech enhancement method that utilizes a feature compensation gain for robust speech recognition in noisy environments. The gain is obtained from the PCGMM (Parallel Combined Gaussian Mixture Model)-based feature compensation method employing variational model composition. The experimental results show that, under the mismatched ASR (Automatic Speech Recognition) system condition, the proposed method significantly outperforms conventional front-end algorithms and the feature-compensation-based speech enhancement method of our previous work over various background noise types and SNR (Signal to Noise Ratio) conditions. By employing a noise model selection technique, the computational complexity is significantly reduced while the speech recognition performance is maintained at a similar level.

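The abstract does not give implementation details for the gain computation or the model selection, so the following Python sketch is only an illustration under assumed simplifications: the feature compensation stage is assumed to provide a clean-speech magnitude estimate for each time-frequency bin, the gain is taken as the clipped ratio of that estimate to the noisy magnitude and applied with the noisy phase kept, and noise model selection is assumed to rank candidate diagonal-covariance noise GMMs by their average log-likelihood on the noisy features and keep only the top few. Function names and data layouts below are hypothetical and are not taken from the paper.

    import numpy as np
    from scipy.special import logsumexp

    def apply_compensation_gain(noisy_spec, clean_mag_est, floor=1e-8, max_gain=1.0):
        # noisy_spec:    complex STFT of the noisy utterance, shape (frames, bins)
        # clean_mag_est: clean-speech magnitude estimate reconstructed from the
        #                compensated features (assumed to be given), same shape
        # Returns the enhanced complex STFT; the noisy phase is kept.
        noisy_mag = np.abs(noisy_spec)
        gain = clean_mag_est / np.maximum(noisy_mag, floor)  # time-frequency gain
        gain = np.clip(gain, 0.0, max_gain)                  # attenuate only (hypothetical choice)
        return gain * noisy_spec

    def select_noise_models(noise_gmms, noisy_feats, top_m=4):
        # noise_gmms:  list of (weights, means, variances) diagonal-covariance GMMs,
        #              one tuple per candidate noise model (hypothetical layout)
        # noisy_feats: (frames, dim) features of the noisy utterance
        # Returns the indices of the top_m models ranked by average log-likelihood.
        scores = []
        for w, mu, var in noise_gmms:
            diff = noisy_feats[:, None, :] - mu[None, :, :]                       # (T, K, D)
            comp_ll = -0.5 * (np.sum(diff ** 2 / var[None], axis=-1)
                              + np.sum(np.log(2.0 * np.pi * var), axis=-1))       # (T, K)
            frame_ll = logsumexp(comp_ll + np.log(w)[None, :], axis=-1)           # (T,)
            scores.append(frame_ll.mean())
        return np.argsort(scores)[::-1][:top_m]

With such a selection step applied first, the compensation and gain estimation would only need to be run for the selected noise models, which is where the reduction in computation reported in the abstract would come from.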

Keywords

Fig. 1. Block diagram of the VMC-PCGMM-based feature compensation scheme [5].

Fig. 2. Block diagram of the proposed speech enhancement scheme employing the VMC-PCGMM-based feature compensation method.

Fig. 3. Recognition performance for music noise at 5 dB SNR as a function of the number of selected noise models (WER, %).

Table 1. Speech recognition performance with the matched ASR system condition (WER, %).

Table 2. Speech recognition performance with the mismatched ASR system condition (WER, %).

References

  1. S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. on Acoustics, Speech, and Signal Processing, 27, 113-120 (1979). https://doi.org/10.1109/TASSP.1979.1163209
  2. P. J. Moreno, B. Raj, and R. M. Stern, "Data-driven environmental compensation for speech recognition: a unified approach," Speech Communication, 24, 267-285 (1998). https://doi.org/10.1016/S0167-6393(98)00025-9
  3. W. Kim and J. H. L. Hansen, "Variational noise model composition through model perturbation for robust speech recognition with time-varying background noise," Speech Communication, 53, 451-464 (2011). https://doi.org/10.1016/j.specom.2010.12.001
  4. J. L. Gauvain and C. H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. on Speech and Audio Processing, 2, 291-298 (1994). https://doi.org/10.1109/89.279278
  5. C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density HMMs," Computer Speech and Language, 9, 171-185 (1995). https://doi.org/10.1006/csla.1995.0010
  6. M. J. F. Gales and S. J. Young, "Robust continuous speech recognition using parallel model combination," IEEE Trans. on Speech and Audio Processing, 4, 352-359 (1996). https://doi.org/10.1109/89.536929
  7. J. Du, L.-R. Dai, and Q. Huo, "Synthesized stereo mapping via deep neural networks for noisy speech recognition," ICASSP 2014, 1764-1768 (2014).
  8. K. Han, Y. He, D. Bagchi, E. Fosler-Lussier, and D. L. Wang, "Deep neural network based spectral feature mapping for robust speech recognition," Interspeech 2015, 2484-2488 (2015).
  9. H. G. Hirsch and D. Pearce, "The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions," ISCA ITRW ASR2000 (2000).
  10. W. Kim, "Speech enhancement based on feature compensation for independently applying to different types of speech recognition systems" (in Korean), J. Korea Institute of Information and Communication Engineering, 18, 2367-2374 (2014). https://doi.org/10.6109/jkiice.2014.18.10.2367
  11. ETSI ES 201 108, ETSI Standard Document, v1.1.2, 2000.
  12. http://htk.eng.cam.ac.uk