[KSCI] Korea Science Citation Index Service

Vector Quantizer Based Speaker Normalization for Continuous Speech Recognition

Shin Ok-keun (Division of IT Engineering, Korea Maritime University)

Publication Information

The Journal of the Acoustical Society of Korea / v.23, no.8, 2004, pp. 583-589

Abstract

A speaker normalization method based on a vector quantizer is proposed for continuous speech recognition (CSR) systems; the method makes use of no acoustic information. It improves on a previously reported speaker normalization scheme for a simple digit recognizer: a canonical codebook is built by iterative training, starting from a relatively small initial size and increasing the codebook size after each iteration. Once the codebook is established, the warp factor of each speaker is estimated by exhaustively comparing warped versions of the speaker's utterances with the codebook. Two sets of phones are used to estimate the warp factors: one consisting of vowels only, and the other consisting of all the phonemes. A piecewise linear warping function corresponding to the estimated warp factor is then used to warp the power spectrum of each utterance, and the warped feature vectors are extracted and used to train and test the speech recognizer. The effectiveness of the proposed method is investigated through a set of recognition experiments using the TIMIT corpus and the HTK speech recognition toolkit. The experimental results show a recognition rate improvement comparable to that of the formant-based warping method.
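
The warp factor search described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the breakpoint ratio of 0.875, the use of log power spectra as features, the squared Euclidean distortion, the toy Gaussian spectra, and all function names are assumptions made for illustration, and the codebook is taken as given rather than trained by the iterative procedure the abstract mentions.

```python
import math

def warp_frequency(f, alpha, f_max, break_ratio=0.875):
    """Piecewise linear warp: slope alpha below a breakpoint, then a
    second segment chosen so that f_max still maps to f_max.
    The breakpoint rule (break_ratio) is an assumed convention."""
    f0 = break_ratio * f_max * min(1.0, 1.0 / alpha)
    if f <= f0:
        return alpha * f
    upper_slope = (f_max - alpha * f0) / (f_max - f0)
    return alpha * f0 + upper_slope * (f - f0)

def interpolate(freqs, values, f):
    """Linear interpolation on a uniform frequency grid, clamped at the ends."""
    step = freqs[1] - freqs[0]
    pos = min(max((f - freqs[0]) / step, 0.0), len(freqs) - 1)
    i = min(int(pos), len(freqs) - 2)
    t = pos - i
    return (1 - t) * values[i] + t * values[i + 1]

def warp_spectrum(freqs, spectrum, alpha):
    """Resample a power spectrum at the warped frequencies."""
    f_max = freqs[-1]
    return [interpolate(freqs, spectrum, warp_frequency(f, alpha, f_max))
            for f in freqs]

def features(spectrum):
    """Stand-in feature extractor (assumption): log power spectrum."""
    return [math.log(p) for p in spectrum]

def distortion(frames, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    total = 0.0
    for x in frames:
        total += min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook)
    return total / len(frames)

def estimate_warp_factor(freqs, spectra, codebook, alphas):
    """Exhaustive search: try every candidate warp factor and keep the one
    whose warped frames match the canonical codebook best."""
    best_alpha, best_d = None, float("inf")
    for alpha in alphas:
        frames = [features(warp_spectrum(freqs, s, alpha)) for s in spectra]
        d = distortion(frames, codebook)
        if d < best_d:
            best_alpha, best_d = alpha, d
    return best_alpha

def toy_spectrum(freqs, peak_hz, scale=1.0):
    """Synthetic single-peak spectrum; scale stretches the frequency axis."""
    return [math.exp(-((f / scale - peak_hz) ** 2) / (2 * 150.0 ** 2)) + 0.01
            for f in freqs]

# Toy data: a 3-codeword "canonical" codebook and a speaker whose
# spectral peaks sit 10% higher in frequency.
freqs = [i * 31.25 for i in range(257)]            # 0 to 8000 Hz
peaks = [700.0, 1200.0, 2500.0]
codebook = [features(toy_spectrum(freqs, p)) for p in peaks]
speaker = [toy_spectrum(freqs, p, scale=1.1) for p in peaks]
alphas = [round(0.88 + 0.02 * k, 2) for k in range(13)]  # 0.88 to 1.12
best = estimate_warp_factor(freqs, speaker, codebook, alphas)
```

In this toy setup the search is expected to settle on the candidate that undoes the 10% frequency stretch. With real speech, the frames would come from the vowel-only or all-phoneme subsets mentioned in the abstract, and the feature extractor would be whatever front end the recognizer uses.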

Keywords

Speech recognition; Speaker normalization; Warping; Vector Quantizer; CSR

Citations & Related Records

Times Cited By KSCI: 2

Reference

1	L. Lee and R. C. Rose, 'A Frequency Warping Approach to Speaker Normalization', IEEE Trans. on Speech and Audio Processing, 6(1), 49-60, Jan. 1998
2	J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett and N. L. Dahlgren, DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus: CDROM, NIST, 1993
3	S. Molau, S. Kanthak and H. Ney, 'Efficient Vocal Tract Normalization in Automatic Speech Recognition', Proc. ESSV, 209-216, Sept. 2000
4	E. Eide and H. Gish, 'A Parametric Approach to Vocal Tract Length Normalization', Proc. ICASSP'96, 346-349, 1996
5	J. Högberg, 'Prediction of formant frequencies from linear combinations of filterbank and cepstral coefficient', Speech, Music and Hearing Quarterly Progress and Status Report, 33, 41-49, Institutionen för tal, musik och hörsel, 1997
6	Y. Linde, A. Buzo and R. M. Gray, 'An algorithm for vector quantizer design', IEEE Transactions on Communications, 28(1), 84-95, 1980
7	M. A. Bacchiani, Speech Recognition System Design Based On Automatically Derived Units, Ph.D. Thesis, Boston University, 1999
8	Shin Ok-keun, 'Quantization-Based Speaker Normalization for DHMM Speech Recognition Systems', The Journal of the Acoustical Society of Korea, 22(4), 299-307, 2003 (in Korean)
9	P. Zhan and A. Waibel, 'Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition', Language Technologies Institute Technical Report CMU-LTI-97-150, Carnegie Mellon University, May 1997
10	S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev and P. Woodland, The HTK Book, ver. 3, Microsoft Corp., 2000
11	S. Umesh, L. Cohen and D. Nelson, 'Frequency Warping and the Mel Scale', IEEE Signal Processing Letters, 9(3), 104-107, March 2001
