Applying feature normalization based on pole filtering to short-utterance speech recognition using deep neural network

Han, Jaemin; Kim, Min Sik; Kim, Hyung Soon

doi:10.7776/ASK.2020.39.1.064

The Journal of the Acoustical Society of Korea (한국음향학회지)

Volume 39, Issue 1, Pages 64-68, 2020; pISSN 1225-4428; eISSN 2287-3775

The Acoustical Society of Korea (한국음향학회)


Han, Jaemin; Kim, Min Sik; Kim, Hyung Soon (Department of Electronics Engineering, Pusan National University)

Received : 2019.12.02
Accepted : 2019.12.26
Published : 2020.01.31


Abstract

In conventional speech recognition systems based on the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), cepstral feature normalization based on pole filtering has been effective in improving the recognition of short utterances in noisy environments. This paper examines whether the method remains useful in state-of-the-art speech recognition systems based on Deep Neural Networks (DNNs). Experimental results on the AURORA 2 database show that cepstral mean and variance normalization with pole filtering improves recognition performance on very short utterances compared with the same normalization without pole filtering, especially when there is a large mismatch between the training and test conditions.
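As a rough illustration of the two normalization schemes the abstract compares, the following Python sketch implements plain per-utterance cepstral mean and variance normalization and a pole-filtered variant. It is a minimal sketch under stated assumptions, not the procedure used in the paper: it works on linear-prediction (LP) cepstra, the function names and the `max_radius=0.9` threshold are hypothetical, and taking the mean from the pole-filtered cepstra while taking the variance from the unfiltered cepstra is an arbitrary choice made for this example.

```python
import numpy as np


def cmvn(feats, eps=1e-8):
    """Plain per-utterance CMVN. feats has shape (num_frames, num_coeffs)."""
    mean = feats.mean(axis=0)
    std = feats.std(axis=0)
    return (feats - mean) / (std + eps)


def poles_to_cepstrum(poles, n_ceps):
    """Cepstrum c_1..c_n of the all-pole model 1/A(z): c_n = (1/n) * sum_k p_k**n."""
    n = np.arange(1, n_ceps + 1)
    return np.real((poles[None, :] ** n[:, None]).sum(axis=1)) / n


def pole_filter(poles, max_radius=0.9):
    """Pull poles lying outside max_radius back onto that radius, keeping their angles.

    Shrinking the radius broadens the bandwidth of sharp spectral peaks.
    The 0.9 default is an assumed value for illustration only.
    """
    radius = np.abs(poles)
    scale = np.where(radius > max_radius, max_radius / np.maximum(radius, 1e-12), 1.0)
    return poles * scale


def pole_filtered_cmvn(lpc_frames, n_ceps=12, max_radius=0.9, eps=1e-8):
    """Normalize LP cepstra with a mean estimated from pole-filtered cepstra.

    lpc_frames has shape (num_frames, order); each row holds a_1..a_p of
    A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    """
    raw, filtered = [], []
    for a in lpc_frames:
        poles = np.roots(np.concatenate(([1.0], a)))  # roots of z^p + a_1 z^(p-1) + ... + a_p
        raw.append(poles_to_cepstrum(poles, n_ceps))
        filtered.append(poles_to_cepstrum(pole_filter(poles, max_radius), n_ceps))
    raw = np.asarray(raw)
    filtered = np.asarray(filtered)
    mean = filtered.mean(axis=0)  # mean estimate from the smoothed (pole-filtered) cepstra
    std = raw.std(axis=0)         # assumption: variance taken from the unfiltered cepstra
    return (raw - mean) / (std + eps)


if __name__ == "__main__":
    # Three hypothetical second-order LP frames; only the third has poles beyond radius 0.9.
    frames = np.array([[-1.2, 0.5], [-0.9, 0.4], [-1.5, 0.9]])
    print(pole_filtered_cmvn(frames, n_ceps=4).shape)
```

With very few frames, the plain utterance mean also removes part of the speech content itself; estimating the mean from smoothed cepstra is one way to reduce that effect, which is the intuition this sketch is meant to convey.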



