http://dx.doi.org/10.3745/KIPSTB.2005.12B.4.437

Speaker Normalization using Gaussian Mixture Model for Speaker Independent Speech Recognition  

Shin, Ok-Keun (Division of IT Engineering, Korea Maritime University)
Abstract
This study reports experiments on a speaker normalization method based on the Gaussian mixture model (GMM) for speaker-independent speech recognition systems. The method improves on a previous study based on a vector quantizer: it models the probability distribution of canonical feature vectors with a GMM that has an appropriate number of clusters, and then uses the resulting probabilistic model to estimate the warp factor of a test speaker. The purpose of the study is twofold: to improve the existing maximum likelihood (ML) based methods, and to compare the performance of this so-called 'soft decision' method with that of the earlier vector quantizer method. The effectiveness of the proposed method is investigated through recognition experiments on the TIMIT corpus. The experimental results show that a small improvement can be obtained by adjusting the number of clusters in the GMM appropriately.
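The warp factor estimation described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the one-dimensional "formant-like" feature, the two-component canonical model, the multiplicative warp, and the grid of candidate factors are all assumptions made for illustration, and the function names (`gmm_loglik`, `estimate_warp`) are hypothetical. A real VTLN front end warps the frequency axis before the features are computed and would train the GMM on canonical data rather than fix its parameters by hand.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def gmm_loglik(xs, weights, means, variances):
    """Total log-likelihood of samples under a 1-D GMM.

    This is the 'soft decision': every component contributes to each
    sample's likelihood, instead of only the nearest codeword as in a
    vector quantizer.
    """
    total = 0.0
    for x in xs:
        terms = [math.log(w) + log_gauss(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

def estimate_warp(xs, weights, means, variances, alphas):
    """Grid search for the warp factor whose warped features are most
    likely under the canonical GMM (assumed ML criterion)."""
    return max(alphas,
               key=lambda a: gmm_loglik([a * x for x in xs],
                                        weights, means, variances))

# Hypothetical canonical model over a toy feature (values in Hz).
weights = [0.5, 0.5]
means = [500.0, 1500.0]
variances = [50.0 ** 2, 100.0 ** 2]

# Toy test speaker: canonical-looking values stretched by 1/0.9.
canonical_like = (480.0, 500.0, 520.0, 1450.0, 1500.0, 1550.0)
test_features = [c / 0.9 for c in canonical_like]

alphas = [0.88 + 0.02 * i for i in range(13)]  # candidates 0.88 .. 1.12
best = estimate_warp(test_features, weights, means, variances, alphas)
print(f"estimated warp factor: {best:.2f}")
```

On this toy data the search picks the candidate closest to 0.9, which undoes the stretch applied to the test speaker. The number of mixture components is the quantity the abstract says must be adjusted; here it is simply fixed at two.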
Keywords
Speech Recognition; Speaker Normalization; VTLN; GMM;
Citations & Related Records
Times Cited By KSCI: 1
1 G. J. McLachlan and T. Krishnan, 'The EM Algorithm and Extensions', New York, Wiley, 1997
2 L. Lee and R. C. Rose, 'A Frequency Warping Approach to Speaker Normalization', IEEE Trans. on Speech and Audio Processing, Vol.6, No.1, pp.49-60, Jan., 1998
3 L. Welling, S. Kanthak and H. Ney, 'Improved Methods for Vocal Tract Normalization', Proc. of ICASSP, pp.797-800, Mar., 1999
4 O.-K. Shin, 'Vector Quantizer Based Speaker Normalization for Continuous Speech Recognizers', The Journal of the Acoustical Society of Korea, Vol.23, No.8, pp.583-589, 2004 (in Korean)
5 S. Umesh, L. Cohen and D. Nelson, 'Frequency Warping and the Mel Scale', IEEE Signal Processing Letters, Vol.9, No.3, pp.104-107, March, 2001
6 E. Redner and H. Walker, 'Mixture Densities, Maximum Likelihood and the EM Algorithms', SIAM Review, Vol.26, No.2, pp.195-239, Apr., 1984
7 J. Rissanen, 'A Universal Prior for Integers and Estimation by Minimum Description Length', Annals of Statistics, Vol.11, No.2, pp.417-431, 1983
8 S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev and P. Woodland, The HTK Book, ver.3, Microsoft Corp., 2000
9 J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallet and N. L. Dahlgren, DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus: CDROM, NIST, 1993
10 P. Zhan and M. Westphal, 'Speaker Normalization Based on Frequency Warping', Proc. ICASSP '97, pp.1039-1042, 1997
11 S. Molau, S. Kanthak and H. Ney, 'Efficient Vocal Tract Normalization in Automatic Speech Recognition', Proc. ESSV, pp.209-216, 2000