Applying feature normalization based on pole filtering to short-utterance speech recognition using deep neural network

  • Received : 2019.12.02
  • Accepted : 2019.12.26
  • Published : 2020.01.31

Abstract

In conventional speech recognition systems based on the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), cepstral feature normalization based on pole filtering was effective at improving the recognition of short utterances in noisy environments. This paper examines whether the method is also useful for state-of-the-art speech recognition systems based on Deep Neural Networks (DNNs). Experimental results on the AURORA 2 database show that cepstral mean and variance normalization based on pole filtering improves the recognition of very short utterances compared with normalization without pole filtering, especially when there is a large mismatch between the training and test conditions.
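As a rough illustration of the idea named in the abstract, the following minimal sketch applies mean and variance normalization whose statistics come from pole-filtered cepstra. It is not the authors' implementation. It assumes the features are LPC cepstra derived from per-frame all-pole model poles, that pole filtering means pulling poles that lie close to the unit circle back to a fixed radius, and that both the mean and the standard deviation are estimated from the pole-filtered cepstra and then applied to the unmodified cepstra. The function names, the `max_radius` value, and the choice of LPC cepstra rather than the paper's actual front end are all hypothetical.

```python
import numpy as np

def cepstrum_from_poles(poles, n_ceps):
    """LPC cepstrum of an all-pole model: c_n = (1/n) * sum_k z_k**n, n = 1..n_ceps.

    poles: array of shape (frames, order); returns shape (frames, n_ceps).
    """
    n = np.arange(1, n_ceps + 1)
    powers = poles[:, :, None] ** n[None, None, :]
    # Poles come in conjugate pairs, so the sum is real up to rounding.
    return (powers.sum(axis=1) / n).real

def pole_filter(poles, max_radius=0.9):
    """Pull poles whose magnitude exceeds max_radius back to that radius, keeping the angle.

    max_radius is an illustrative value, not one taken from the paper.
    """
    radius = np.abs(poles)
    scale = np.where(radius > max_radius, max_radius / np.maximum(radius, 1e-12), 1.0)
    return poles * scale

def pole_filtered_cmvn(poles, n_ceps=12, max_radius=0.9, eps=1e-8):
    """Normalize the unmodified cepstra with statistics of the pole-filtered cepstra.

    Assumption: both mean and standard deviation come from the pole-filtered cepstra.
    """
    cep = cepstrum_from_poles(poles, n_ceps)
    cep_pf = cepstrum_from_poles(pole_filter(poles, max_radius), n_ceps)
    mean = cep_pf.mean(axis=0)
    std = cep_pf.std(axis=0)
    return (cep - mean) / (std + eps)
```

When no pole exceeds `max_radius`, the pole-filtered cepstra equal the original ones and the function reduces to ordinary per-utterance mean and variance normalization. The two differ only for frames with sharp spectral peaks, which weigh heavily on the statistics when an utterance has few frames.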
