Performance Improvements for Silence Feature Normalization Method by Using Filter Bank Energy Subtraction

Improving the Performance of the Silence Feature Normalization Method Using Filter Bank Energy Subtraction

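  • 신광호 (Department of Information and Communication Engineering, Yeungnam University)
  • 최숙남 (Department of Information and Communication Engineering, Yeungnam University)
  • 정현열 (Department of Information and Communication Engineering, Yeungnam University)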
  • Received : 2010.05.01
  • Accepted : 2010.06.25
  • Published : 2010.07.31

Abstract

In this paper, we propose FSFN (Filter bank sub-band energy subtraction based CLSFN), a method that improves the recognition performance of the existing CLSFN (Cepstral distance and Log-energy based Silence Feature Normalization) by combining it with noise subtraction in the filter bank sub-band domain. During feature extraction from speech data, FSFN reduces the energy of the noise components in the filter bank sub-band domain, which yields enhanced cepstral features. Because the cepstral distance is then computed from these enhanced features, speech/silence classification becomes more accurate, and recognition performance improves over the existing CLSFN. In experiments on the Aurora 2.0 DB, the proposed FSFN improved the average word accuracy by about 2% over the conventional CLSFN, and FSFN combined with CMVN (Cepstral Mean and Variance Normalization) achieved the best recognition performance among all the methods compared, which confirms the effectiveness of the proposed method.
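The abstract does not give the FSFN equations, so the following is only a minimal Python sketch of two generic building blocks it names: energy subtraction in the filter bank sub-band domain, and CMVN applied to the resulting cepstral features. The noise estimate (mean of the first `n_noise_frames` frames), the subtraction factor `alpha`, the floor factor `beta`, and all function names are assumptions made for illustration and are not the authors' implementation. The CLSFN speech/silence classification step is omitted because its details are not stated here.

```python
import numpy as np


def subtract_subband_noise(fbank_energy, n_noise_frames=10, alpha=1.0, beta=0.01):
    """Subtract an estimated noise energy from each filter bank sub-band.

    fbank_energy: array of shape (frames, bands) with linear (not log) energies.
    Assumption: the first n_noise_frames frames contain noise only.
    """
    fbank_energy = np.asarray(fbank_energy, dtype=float)
    # Per-band noise estimate taken from the leading frames (hypothetical choice).
    noise = fbank_energy[:n_noise_frames].mean(axis=0)
    cleaned = fbank_energy - alpha * noise
    # Floor the result so every energy stays positive before the logarithm.
    return np.maximum(cleaned, beta * fbank_energy)


def cepstra_from_fbank(fbank_energy, n_ceps=13):
    """Convert filter bank energies to cepstral coefficients (log + DCT-II)."""
    log_energy = np.log(fbank_energy)
    n_bands = log_energy.shape[1]
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_bands)[None, :]
    # Unnormalized DCT-II basis, shape (n_ceps, n_bands).
    basis = np.cos(np.pi * k * (n + 0.5) / n_bands)
    return log_energy @ basis.T


def cmvn(features, eps=1e-8):
    """Cepstral mean and variance normalization over one utterance."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)
```

Under these assumptions, an utterance would be processed as `cmvn(cepstra_from_fbank(subtract_subband_noise(energy)))`, where `energy` holds the filter bank energies of all frames. Flooring at a fraction of the original energy is one common way to avoid negative values after subtraction; the paper may use a different rule.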
