
A New Power Spectrum Warping Approach to Speaker Warping  

유일수 (Sungkyunkwan University)
김동주 (Sungkyunkwan University)
노용완 (Sungkyunkwan University)
홍광석 (Sungkyunkwan University)
Abstract
Speaker normalization is known to be an effective way to improve recognition accuracy in speaker-independent speech recognition systems. Frequency warping based on maximum likelihood is a widely used approach to speaker normalization. This paper proposes a new power spectrum warping approach that aims to improve speaker normalization beyond what frequency warping achieves. Power spectrum warping works within Mel-frequency cepstral coefficient (MFCC) analysis and is a simple mechanism that performs speaker normalization by modifying the power spectrum of the Mel filter bank in MFCC. This paper also proposes a hybrid VTN (vocal tract normalization) that combines power spectrum warping with frequency warping. In the experiments, each speaker normalization approach was applied to a baseline system and recognition performance was compared on the SKKU PBW DB. Relative to the word recognition performance of the baseline system, frequency warping reduced the word error rate by 2.06%, power spectrum warping by 3.06%, and the hybrid VTN by 4.07%.
Keywords
Speaker normalization; Power spectrum warping; Frequency warping; MFCC
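For orientation, the frequency warping baseline mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation and not the proposed power spectrum warping, whose details the abstract does not give. The piecewise-linear warp, the 0.8 breakpoint, the warping-factor grid, and the names `warp_frequency`, `mel_filterbank`, `select_warp_factor`, and `score_fn` are all hypothetical choices made for this example. `score_fn` stands in for the acoustic-model log-likelihood that a maximum-likelihood search would use.

```python
import numpy as np


def hz_to_mel(f):
    """Convert frequency in Hz to the Mel scale (common 2595/700 formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def warp_frequency(f, alpha, f_max, break_ratio=0.8):
    """Piecewise-linear warp (assumed form): scale by alpha below the
    breakpoint, then interpolate linearly so that f_max maps to f_max.
    Assumes alpha * break_ratio < 1."""
    f = np.asarray(f, dtype=float)
    f0 = break_ratio * f_max
    upper = alpha * f0 + (f_max - alpha * f0) / (f_max - f0) * (f - f0)
    return np.where(f <= f0, alpha * f, upper)


def mel_filterbank(n_filters, n_fft, sample_rate, alpha=1.0):
    """Triangular Mel filter bank whose edge frequencies are warped by alpha."""
    f_max = sample_rate / 2.0
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(f_max), n_filters + 2))
    edges = warp_frequency(edges, alpha, f_max)
    bins = np.linspace(0.0, f_max, n_fft // 2 + 1)
    bank = np.zeros((n_filters, bins.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bins - lo) / (mid - lo)
        falling = (hi - bins) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def select_warp_factor(power_spec, score_fn, n_filters=24, sample_rate=16000,
                       alphas=np.linspace(0.88, 1.12, 13)):
    """Grid search: return the alpha whose warped log filter-bank features
    score highest under score_fn (a hypothetical likelihood stand-in).
    power_spec has shape (frames, n_fft // 2 + 1)."""
    n_fft = 2 * (power_spec.shape[1] - 1)

    def features(alpha):
        bank = mel_filterbank(n_filters, n_fft, sample_rate, alpha)
        return np.log(power_spec @ bank.T + 1e-10)

    return float(max(alphas, key=lambda a: score_fn(features(a))))
```

In practice `score_fn` would evaluate the warped features against the recognizer's acoustic models given a transcription. Warping the filter edges rather than resampling the spectrum is one common way to apply the warp, chosen here only because it keeps the sketch short.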