
Vocal Tract Length Normalization for Speech Recognition  

지상문 (Department of Computer Science, Kyungsung University)
Abstract
Speech recognition performance is degraded by variation in vocal tract length among speakers. In this paper, we use a vocal tract length normalization method in which the frequency axis of the short-time spectrum of a speaker's speech is scaled to minimize the effect of that speaker's vocal tract length on recognition performance. To normalize vocal tract length, we tried several frequency warping functions, including linear and piecewise linear functions. We propose a variable-interval piecewise linear warping function to model effectively the variation in frequency axis scale caused by large differences in vocal tract length. Experimental results on TIDIGITS connected digits show that the proposed vocal tract length normalization reduces the word error rate from 2.15% to 0.53%, a relative reduction of about 75%.
Keywords
Vocal tract length normalization; Frequency warping function; Speech recognition;
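
The linear and piecewise linear frequency warping functions named in the abstract can be illustrated with a minimal sketch. This is not the paper's proposed variable-interval function, whose details are not given here; it is the common two-segment form, and the names `linear_warp`, `piecewise_linear_warp`, `warp_spectrum`, the warping factor `alpha`, the breakpoint `f_break`, and the 4000 Hz band edge are all assumptions made for illustration.

```python
import numpy as np


def linear_warp(freqs, alpha):
    """Linear warp: scale every frequency by the same factor alpha."""
    return alpha * np.asarray(freqs, dtype=float)


def piecewise_linear_warp(freqs, alpha, f_nyquist, f_break):
    """Two-segment warp (hypothetical parameterization).

    Below f_break the slope is alpha. Above it, a second straight line
    joins (f_break, alpha * f_break) to (f_nyquist, f_nyquist), so the
    warped axis still ends at the band edge.
    Assumes alpha * f_break < f_nyquist so the mapping stays increasing.
    """
    freqs = np.asarray(freqs, dtype=float)
    upper_slope = (f_nyquist - alpha * f_break) / (f_nyquist - f_break)
    return np.where(
        freqs <= f_break,
        alpha * freqs,
        alpha * f_break + upper_slope * (freqs - f_break),
    )


def warp_spectrum(spectrum, freqs, warped_freqs):
    """Sample a magnitude spectrum at the warped frequencies
    using linear interpolation between the original bins."""
    return np.interp(warped_freqs, freqs, spectrum)


if __name__ == "__main__":
    freqs = np.linspace(0.0, 4000.0, 5)  # illustrative bin centers in Hz
    warped = piecewise_linear_warp(freqs, alpha=1.1, f_nyquist=4000.0, f_break=3000.0)
    print(warped)
```

A purely linear warp with `alpha` greater than 1 pushes the highest frequencies past the band edge, which is one reason a piecewise form is often used: the second segment bends the mapping back so that the band edge maps onto itself.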