
Speech Recognition based on Environment Adaptation using SNR Mapping

  • Yongjoo Chung (Department of Electronic Engineering, Keimyung University)
  • Received: 2014.03.05
  • Reviewed: 2014.05.15
  • Published: 2014.05.31

Abstract

The multiple-model based speech recognition framework (MMSR) has been known to be very successful in noisy speech recognition. Since it uses multiple hidden Markov model (HMM) sets that correspond to various noise types and signal-to-noise ratio (SNR) values, the selected acoustic model can closely match the test noisy speech. However, since the number of HMM sets is limited in practical use, acoustic mismatch still remains a problem. In this study, we experimentally determined the optimal SNR mapping between the test noisy speech and the HMM sets to mitigate the mismatch between them. Improved performance was obtained by employing this SNR mapping instead of using the SNR estimated directly from the test noisy speech. When the proposed method was applied to the MMSR, experimental results on the Aurora 2 database showed relative word error rate reductions of 6.3% and 9.4% compared to the conventional MMSR and multi-condition training (MTR), respectively.
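The selection step the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the bucket values, the mapping table, and all function names (`SNR_BUCKETS`, `SNR_MAPPING`, `select_hmm_set`) are assumptions for illustration; the paper determines its mapping experimentally on Aurora 2.

```python
import math

# SNR values (dB) at which the per-noise-type HMM sets were trained
# (illustrative choice; the actual training SNRs are defined by the MMSR setup).
SNR_BUCKETS = [0, 5, 10, 15, 20]

# Experimentally determined mapping: quantized test SNR -> training SNR of the
# HMM set to actually use. The values here are placeholders, not the paper's table.
SNR_MAPPING = {0: 5, 5: 5, 10: 10, 15: 15, 20: 15}

def estimate_snr(speech_power, noise_power):
    """Estimate the SNR (dB) of the test utterance from power estimates."""
    return 10.0 * math.log10(speech_power / noise_power)

def nearest_bucket(snr):
    """Quantize an estimated SNR to the closest trained SNR value."""
    return min(SNR_BUCKETS, key=lambda b: abs(b - snr))

def select_hmm_set(speech_power, noise_power, use_mapping=True):
    """Pick the HMM set's training SNR: the nearest bucket directly
    (baseline MMSR), or the bucket remapped through the optimal SNR
    mapping (the proposed method)."""
    bucket = nearest_bucket(estimate_snr(speech_power, noise_power))
    return SNR_MAPPING[bucket] if use_mapping else bucket
```

The point of the mapping is that the HMM set trained at the estimated SNR is not always the best match for the test speech; remapping the estimate to a (possibly different) training SNR reduces the residual acoustic mismatch.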

References

  1. S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 2, 1979, pp. 113-120. https://doi.org/10.1109/TASSP.1979.1163209
  2. M. J. F. Gales, "Model based techniques for noise-robust speech recognition," Ph.D. Dissertation, University of Cambridge, 1996.
  3. P. J. Moreno, "Speech Recognition in noisy environments," Ph.D. Dissertation, Carnegie Mellon University, 1996.
  4. J. H. L. Hansen and M. A. Clements, "Constrained iterative speech enhancement with application to speech recognition," IEEE Trans. on Signal Processing, vol. 39, no. 4, 1991, pp. 795-805. https://doi.org/10.1109/78.80901
  5. J. Choi, "Speech and noise recognition system by neural network," J. of the Korea Institute of Electronic Communication Sciences, vol. 5, no. 4, 2010, pp. 357-362.
  6. C. Lee and D. Kim, "Adaptive noise reduction of speech using wavelet transform," J. of the Korea Institute of Electronic Communication Sciences, vol. 4, no. 3, 2009, pp. 190-196.
  7. J.-S. Choi, "Noise reduction algorithm in speech by Wiener filter," J. of the Korea Institute of Electronic Communication Sciences, vol. 8, no. 9, 2013, pp. 1293-1298. https://doi.org/10.13067/JKIECS.2013.8.9.1293
  8. H. G. Hirsch and D. Pearce, "The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions," In Proc. the Int. Conf. on Spoken Language Processing, Beijing, China, 2000, pp. 18-20.
  9. H. Xu, Z. H. Tan, P. Dalsgaard, and B. Lindberg, "Robust speech recognition on noise and SNR classification-a multiple-model framework," In Proc. INTERSPEECH, Lisboa, Portugal, 2005, pp. 977-980.
  10. H. Xu, Z. H. Tan, P. Dalsgaard, and B. Lindberg, "Noise condition dependent training based on noise classification and SNR estimation," IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 8, 2007, pp. 2431-2443. https://doi.org/10.1109/TASL.2007.906188
  11. L. Lamel, L. Rabiner, A. Rosenberg, and J. Wilpon, "An improved endpoint detector for isolated word recognition," IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 4, 1981, pp. 777-785. https://doi.org/10.1109/TASSP.1981.1163642