[KSCI] Korea Science Citation Index Service

Vocal Tract Length Normalization for Speech Recognition

지상문 (Department of Computer Science, Kyungsung University)

Publication Information

Journal of the Korea Institute of Information and Communication Engineering, v.7, no.7, 2003, pp. 1380-1386

Abstract

Speech recognition performance is degraded by the variation in vocal tract length among speakers. In this paper, we use a vocal tract length normalization method in which the frequency axis of the short-time spectrum of a speaker's speech is scaled to minimize the effect of that speaker's vocal tract length on recognition performance. To normalize vocal tract length, we tried several frequency warping functions, including linear and piecewise linear functions. A variable-interval piecewise linear warping function is proposed to model effectively the variation of the frequency-axis scale caused by large variations in vocal tract length. Experimental results on TIDIGITS connected digits show that the proposed vocal tract length normalization reduces the word error rate from 2.15% to 0.53%.
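
The two families of warping functions named in the abstract can be sketched as follows. This is a minimal illustration in Python with NumPy, not the paper's implementation: the function names, the warping factor alpha, the band limit f_max, and the fixed breakpoint break_ratio are all assumptions made for the example. In particular, the proposed variable-interval function is not specified in the abstract, so the sketch shows only a common two-segment piecewise linear form.

```python
import numpy as np


def linear_warp(freqs, alpha, f_max):
    """Linear warping: scale every frequency by alpha, clipped to f_max."""
    freqs = np.asarray(freqs, dtype=float)
    return np.minimum(alpha * freqs, f_max)


def piecewise_linear_warp(freqs, alpha, f_max, break_ratio=0.8):
    """Two-segment piecewise linear warping (illustrative form).

    Below the breakpoint f0 the axis is scaled by alpha. Above f0 a second
    line segment joins (f0, alpha * f0) to (f_max, f_max), so the band edge
    is preserved. break_ratio is an assumed value, not taken from the paper.
    """
    freqs = np.asarray(freqs, dtype=float)
    f0 = break_ratio * f_max
    if not alpha * f0 < f_max:
        raise ValueError("alpha is too large for the chosen breakpoint")
    upper_slope = (f_max - alpha * f0) / (f_max - f0)
    return np.where(
        freqs <= f0,
        alpha * freqs,
        alpha * f0 + upper_slope * (freqs - f0),
    )


def warp_spectrum(spectrum, freqs, warped_freqs):
    """Resample a short-time spectrum at the warped frequencies.

    freqs must be increasing; values are linearly interpolated.
    """
    return np.interp(warped_freqs, freqs, spectrum)


# Illustrative values only: a 4 kHz band and a 10% stretch of the axis.
f_max = 4000.0
freqs = np.linspace(0.0, f_max, 9)
warped = piecewise_linear_warp(freqs, alpha=1.1, f_max=f_max)
spectrum = np.exp(-freqs / 1000.0)  # stand-in for a magnitude spectrum
normalized = warp_spectrum(spectrum, freqs, warped)
```

The linear form loses the part of the spectrum that is mapped beyond f_max when alpha exceeds 1, whereas the piecewise form keeps both band edges fixed; this is one common motivation for piecewise linear warping.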

Keywords

Vocal tract length normalization; Frequency warping function; Speech recognition;

Citations & Related Records

Times Cited By KSCI: 1

References

1	신옥근, 'Quantization-based speaker normalization for DHMM speech recognition systems,' The Journal of the Acoustical Society of Korea, Vol. 22, No. 4, 299-307, 2003
2	E. B. Gouvea, 'Acoustic-feature-based frequency warping for speaker normalization,' Thesis, Carnegie Mellon University, 1998
3	R. G. Leonard, 'A database for speaker-independent digit recognition,' Proc. ICASSP, 3, 42.11/1-4, 1984
4	L. F. Uebel and P. C. Woodland, 'An investigation into vocal tract length normalization,' Proc. EuroSpeech, Vol. 6, 2527-2530, 1999
5	M. Pitz, S. Molau, R. Schluter, and H. Ney, 'Vocal tract normalization equals linear transformation in cepstral space,' Proc. EuroSpeech, E31, 2653-2656, 2001
6	Y. Ono, H. Wakita, and Y. Zhao, 'Speaker normalization using constrained spectral shifts in auditory filter domain,' Proc. EuroSpeech, 1, 355-358, 1993
7	C. H. Lee, C. H. Lin, and B. H. Juang, 'A study on speaker adaptation of continuous density HMM parameters,' Proc. ICASSP, 1, 145-148, 1991
8	L. Lee and R. C. Rose, 'A frequency warping approach to speaker normalization,' IEEE Trans. on Speech and Audio Processing, 6 (1), 49-60, 1998
9	J. McDonough, W. Byrne, and X. Luo, 'Speaker normalization with all-pass transforms,' Proc. ICSLP, paper no. 869, 1998
10	T. D. Rossing, The Science of Sound, Addison-Wesley Publishing Company, p. 320, 1989
11	C. Leggetter and P. Woodland, 'Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models,' Computer Speech and Language, 9, 171-185, 1995
