
Vector Quantizer Based Speaker Normalization for Continuous Speech Recognition

Shin Ok-keun (Division of IT Engineering, Korea Maritime University)
Abstract
This paper proposes a vector quantizer based speaker normalization method for continuous speech recognition (CSR) systems that makes no use of acoustic information. The method improves on a previously reported speaker normalization scheme for a simple digit recognizer: it builds a canonical codebook by iterative training, starting from a relatively small codebook and increasing its size after each iteration. Once the codebook is established, each speaker's warp factor is estimated by exhaustively comparing warped versions of that speaker's utterances with the codebook. Two sets of phones are used to estimate the warp factors: one consisting of vowels only, and the other of all the phonemes. A piecewise linear warping function corresponding to the estimated warp factor is then applied to the power spectrum of each utterance, and the warped feature vectors are extracted and used to train and test the speech recognizer. The effectiveness of the method is investigated in a set of recognition experiments using the TIMIT corpus and the HTK speech recognition toolkit. The experimental results show an improvement in recognition rate comparable to that of the formant based warping method.
Keywords
Speech recognition; Speaker normalization; Warping; Vector Quantizer; CSR;
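The warp factor estimation described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation. The breakpoint ratio, the candidate grid of warp factors, the squared-error distortion measure, and the toy "peak frequency" features are all assumptions made for illustration. A real system would warp the power spectrum and extract feature vectors from it, and the canonical codebook would come from the iterative training step, which is not shown here.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def piecewise_linear_warp(f: float, alpha: float, f_max: float,
                          break_ratio: float = 0.8) -> float:
    """Scale f by alpha below a breakpoint, then join linearly to (f_max, f_max)."""
    f0 = break_ratio * f_max  # hypothetical breakpoint; the abstract does not give one
    if f <= f0:
        return alpha * f
    slope = (f_max - alpha * f0) / (f_max - f0)
    return alpha * f0 + slope * (f - f0)


def distortion(frames: List[Vector], codebook: List[Vector]) -> float:
    """Mean squared distance from each frame to its nearest codeword."""
    total = 0.0
    for x in frames:
        total += min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook)
    return total / len(frames)


def estimate_warp_factor(features_for: Callable[[float], List[Vector]],
                         codebook: List[Vector],
                         alphas: Sequence[float]) -> float:
    """Exhaustive search: try every candidate and keep the lowest-distortion one."""
    return min(alphas, key=lambda a: distortion(features_for(a), codebook))


# Toy data (assumed): each frame is a list of spectral peak frequencies in Hz.
F_MAX = 8000.0
codebook = [[500.0, 1500.0, 2500.0], [300.0, 2200.0, 3000.0]]
# A speaker whose peaks sit lower than the canonical ones by a factor of 1.1.
speaker_frames = [[f / 1.1 for f in c] for c in codebook]


def features_for(alpha: float) -> List[Vector]:
    """Warp every peak of every frame with the candidate warp factor."""
    return [[piecewise_linear_warp(f, alpha, F_MAX) for f in frame]
            for frame in speaker_frames]


# Candidate grid (assumed): 0.88 to 1.12 in steps of 0.02.
alphas = [round(0.88 + 0.02 * k, 2) for k in range(13)]
best_alpha = estimate_warp_factor(features_for, codebook, alphas)
```

In this toy setting the search picks the candidate that maps the speaker's peaks back onto the codebook entries. The exhaustive grid keeps the procedure simple at the cost of one feature extraction pass per candidate warp factor.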