[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTB.2005.12B.4.437

Speaker Normalization using Gaussian Mixture Model for Speaker Independent Speech Recognition

Shin, Ok-Keun (Division of IT Engineering, Korea Maritime University)

Publication Information

The KIPS Transactions: Part B, v.12B, no.4, 2005, pp. 437-442

Abstract

For speaker normalization in speaker-independent speech recognition systems, experiments are conducted on a method based on the Gaussian mixture model (GMM). The method, which improves on a previous study based on a vector quantizer, models the probability distribution of canonical feature vectors with a GMM having an appropriate number of clusters, and estimates the warp factor of a test speaker using the resulting probabilistic model. The purpose of this study is twofold: to improve the existing maximum likelihood (ML) based methods, and to compare the performance of this so-called 'soft decision' method with that of the previous study based on a vector quantizer. The effectiveness of the proposed method is investigated through recognition experiments on the TIMIT corpus. The experimental results showed that a small improvement could be obtained by adjusting the number of clusters in the GMM appropriately.
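
The warp-factor estimation outlined in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. The GMM parameters are taken as already trained on canonical features, the grid of candidate warp factors (0.88 to 1.12) is an assumed range, and scaling the feature vectors by the warp factor is only a toy stand-in for warping the frequency axis before feature extraction. The names `gmm_log_likelihood`, `estimate_warp_factor` and `features_for` are hypothetical. The 'soft decision' shows up in the likelihood, which sums over all mixture components instead of picking the single nearest codeword as a vector quantizer would.

```python
import numpy as np


def gmm_log_likelihood(X, weights, means, variances):
    """Total log-likelihood of the rows of X under a diagonal-covariance GMM.

    X: (N, D) feature vectors; weights: (K,); means, variances: (K, D).
    """
    diff = X[:, None, :] - means[None, :, :]                           # (N, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (N, K)
    log_joint = np.log(weights) + log_comp                             # (N, K)
    # Soft decision: log-sum-exp over all components, computed stably.
    m = log_joint.max(axis=1, keepdims=True)
    log_px = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    return float(log_px.sum())


def estimate_warp_factor(features_for, alphas, weights, means, variances):
    """Grid search: return the warp factor whose features score highest.

    features_for(alpha) is a hypothetical callback that returns the test
    speaker's (N, D) features computed with warp factor alpha.
    """
    scores = [gmm_log_likelihood(features_for(a), weights, means, variances)
              for a in alphas]
    return float(alphas[int(np.argmax(scores))]), scores


# Toy data (assumption): a 2-component, 1-D "canonical" model.
rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
means = np.array([[1.0], [3.0]])
variances = np.array([[0.01], [0.01]])

# Simulated test speaker: canonical features shrunk by an unknown factor.
true_alpha = 1.10
canonical = np.concatenate([rng.normal(1.0, 0.1, 500),
                            rng.normal(3.0, 0.1, 500)])[:, None]
observed = canonical / true_alpha

alphas = np.linspace(0.88, 1.12, 13)  # assumed search grid, step 0.02
best_alpha, scores = estimate_warp_factor(
    lambda a: a * observed,           # toy warp: scale the features directly
    alphas, weights, means, variances)
print(best_alpha)
```

In a real recognizer the callback would recompute the features from the warped spectrum for each candidate, and the number of mixture components would be a tuning parameter, which is the quantity the abstract reports adjusting.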

Keywords

Speech Recognition; Speaker Normalization; VTLN; GMM

Citations & Related Records

Times Cited By KSCI: 1

Reference

1	G. J. McLachlan, T. Krishnan, 'The EM Algorithm and Extensions', New York, Wiley, 1997
2	L. Lee and R. C. Rose, 'A Frequency Warping Approach to Speaker Normalization', IEEE Trans. on Speech and Audio Processing, Vol.6, No.1, pp.49-60, Jan., 1998
3	L. Welling, S. Kanthak, H. Ney, 'Improved Methods for Vocal Tract Normalization', Proc. of ICASSP, pp.797-800, Mar., 1999
4	O.-K. Shin, 'Vector Quantizer Based Speaker Normalization for Continuous Speech Recognition', The Journal of the Acoustical Society of Korea, Vol.23, No.8, pp.583-589, 2004 (in Korean)
5	S. Umesh, L. Cohen and D. Nelson, 'Frequency Warping and the Mel Scale', IEEE Signal Processing Letters, Vol.9, No.3, pp.104-107, March, 2001
6	E. Redner & H. Walker, 'Mixture Densities, Maximum Likelihood and the EM Algorithms', SIAM Review, Vol.26, No.2, pp.195-239, Apr., 1984
7	J. Rissanen, 'A Universal Prior for Integers and Estimation by Minimum Description Length', Annals of Statistics, Vol.11, No.2, pp.417-431, 1983
8	S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev and P. Woodland, The HTK Book, ver.3, Microsoft Corp., 2000
9	J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallet and N. L. Dahlgren, DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus: CD-ROM, NIST, 1993
10	P. Zhan, M. Westphal, 'Speaker Normalization Based on Frequency Warping', Proc. ICASSP '97, pp.1039-1042, 1997
11	S. Molau, S. Kanthak, H. Ney, 'Efficient Vocal Tract Normalization in Automatic Speech Recognition', Proc. ESSV, pp.209-216, 2000
