Noise Robust Speech Recognition Based on Noisy Speech Acoustic Model Adaptation

잡음음성 음향모델 적응에 기반한 잡음에 강인한 음성인식

  • Received : 2014.03.18
  • Accepted : 2014.06.16
  • Published : 2014.06.30

Abstract

In Vector Taylor Series (VTS)-based noisy speech recognition methods, Hidden Markov Models (HMMs) are usually trained with clean speech. However, better performance can be expected when the HMMs are trained with noisy speech. In a previous study, we found that Minimum Mean Square Error (MMSE) estimation of the training noisy speech in the log-spectrum domain improves recognition results, but because that algorithm operates in the log-spectrum domain, it could not be used for HMM adaptation. In this paper, we modify the previous algorithm to derive a novel mathematical relation between test and training noisy speech in the cepstrum domain, and use it to adapt the mean and covariance of the noisy speech HMM trained by Multi-condition TRaining (MTR). In noisy speech recognition experiments on the Aurora 2 database, the proposed method achieved a 10.6% relative improvement in Word Error Rate (WER) over the MTR method, whereas the previous MMSE estimation of the training noisy speech achieved a 4.3% relative improvement, which demonstrates the advantage of the proposed method.
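As background for the VTS-based adaptation mentioned in the abstract, the following is a minimal sketch of the conventional first-order VTS adaptation of a clean-speech Gaussian in the log-spectrum domain, assuming additive noise only and diagonal covariances. It is not the cepstrum-domain relation between test and training noisy speech proposed in the paper, which the abstract does not spell out. The function name `vts_adapt_log_spectrum` and its arguments are hypothetical and chosen for illustration.

```python
import math

def vts_adapt_log_spectrum(mu_x, var_x, mu_n, var_n):
    """First-order VTS adaptation, applied independently to each log-spectral bin.

    Assumption (not from the paper): clean speech x and additive noise n combine
    as y = log(exp(x) + exp(n)) = x + log(1 + exp(n - x)), with diagonal
    covariances and no channel distortion.
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn in zip(mu_x, var_x, mu_n, var_n):
        d = mn - mx
        # Numerically stable log(1 + exp(d)), evaluated at the Gaussian means.
        softplus = max(d, 0.0) + math.log1p(math.exp(-abs(d)))
        # Jacobian dy/dx = 1 / (1 + exp(d)) at the expansion point; dy/dn = 1 - g.
        g = math.exp(-softplus)
        # Zeroth-order term gives the adapted mean.
        mu_y.append(mx + softplus)
        # First-order term propagates the speech and noise variances.
        var_y.append(g * g * vx + (1.0 - g) * (1.0 - g) * vn)
    return mu_y, var_y
```

When speech and noise have equal mean energy in a bin, the adapted mean rises by log 2 and both variances are weighted by 0.25; when the noise lies far below the speech, the Gaussian is left essentially unchanged.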
