Cepstrum PDF Normalization Method for Speech Recognition in Noise Environment

잡음환경에서의 음성인식을 위한 켑스트럼의 확률분포 정규화 기법

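  • 석용호 (MCubeWorks Co., Ltd.) ;
  • 이황수 (Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology) ;
  • 최승호 (Department of Electronic and Information Engineering, Seoul National University of Technology)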
  • Published : 2005.05.01

Abstract

In this paper, we propose a novel cepstrum normalization method that normalizes the probability density function (pdf) of the cepstrum for robust speech recognition in additive noise environments. Whereas conventional methods normalize only first- and/or second-order statistics, such as the mean and/or variance of the cepstrum, the proposed method fully normalizes the statistics of the cepstrum by making the pdfs of the clean and noisy cepstra identical to each other. For the target pdf, the generalized Gaussian distribution is selected so that various densities can be considered. In the recognition phase, we devise a table lookup method to reduce computational cost. In speaker-independent isolated-word recognition experiments, the proposed method outperforms the conventional methods, especially in heavy noise environments.
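The idea of mapping a cepstral coefficient onto a fixed target pdf through a precomputed table can be sketched as follows. This is a minimal illustration, not the authors' implementation: the rank-based mapping, the function names (`gg_pdf`, `build_quantile_table`, `normalize_pdf`), the shape and scale parameters `beta` and `alpha`, and the numerical integration grid are all assumptions made for the example. The sketch tabulates quantiles of a generalized Gaussian once, then replaces each value in a coefficient track by the table entry that matches its rank.

```python
import bisect
import math


def gg_pdf(x, beta=2.0, alpha=1.0):
    """Generalized Gaussian density; beta is the shape, alpha the scale (assumed names)."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x) / alpha) ** beta))


def build_quantile_table(n_levels, beta=2.0, alpha=1.0, span=8.0, steps=8001):
    """Tabulate target quantiles at probabilities (k + 0.5) / n_levels.

    The CDF is obtained by trapezoidal integration on [-span, span] and
    inverted by linear interpolation, so no special functions are needed.
    """
    xs = [-span + 2.0 * span * i / (steps - 1) for i in range(steps)]
    cdf = [0.0]
    for i in range(1, steps):
        area = 0.5 * (gg_pdf(xs[i - 1], beta, alpha) + gg_pdf(xs[i], beta, alpha))
        cdf.append(cdf[-1] + area * (xs[i] - xs[i - 1]))
    total = cdf[-1]
    cdf = [c / total for c in cdf]  # renormalize so the truncated CDF ends at 1

    table = []
    for k in range(n_levels):
        p = (k + 0.5) / n_levels
        j = bisect.bisect_left(cdf, p)  # first grid point with cdf >= p
        x0, x1 = xs[j - 1], xs[j]
        c0, c1 = cdf[j - 1], cdf[j]
        table.append(x0 + (p - c0) / (c1 - c0) * (x1 - x0))
    return table


def normalize_pdf(values, table):
    """Replace each value by the target quantile that matches its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = table[rank * len(table) // n]  # table lookup, no per-frame CDF inversion
    return out


if __name__ == "__main__":
    table = build_quantile_table(5)
    clean = [0.3, -1.2, 0.8, 2.0, -0.5]
    noisy = [2.0 * c + 5.0 for c in clean]  # toy monotone distortion, not real noise
    print(normalize_pdf(clean, table))
    print(normalize_pdf(noisy, table))
```

In this toy case the distortion preserves the ordering of the values, so both tracks map to the same normalized sequence. Real additive noise also changes the ordering, so the match between clean and noisy features would only be approximate.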
