http://dx.doi.org/10.7776/ASK.2010.29.1.056

A Study on Improved MDL Technique for Optimization of Acoustic Model  

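Cho, Hoon-Young (Speech/Language Information Research Department, Software Research Division, Electronics and Telecommunications Research Institute)
Kim, Sang-Hun (Speech/Language Information Research Department, Software Research Division, Electronics and Telecommunications Research Institute)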
Abstract
This paper describes methods for optimizing acoustic models in HMM-based continuous speech recognition. Most conventional speech recognition systems use the same number of Gaussian mixture components for every HMM state. However, because the amount of training data available differs from state to state, optimizing the number of Gaussian mixture components per state can reduce both the overall number of model parameters and the computational cost of decoding. In this study, we introduce a Gaussian mixture weight term into the component-merging stage of minimum description length (MDL) based acoustic model optimization. Experimental results show that the proposed method achieves better automatic speech recognition (ASR) accuracy than the previous optimization method, which does not consider the Gaussian mixture weight term.
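The abstract does not give the merging formula itself, so the following is only a minimal sketch of how mixture weights can enter an MDL-style merge decision, not the authors' method. It assumes diagonal-covariance Gaussians, a moment-matched merge, the standard Gaussian log-likelihood loss with per-component sample counts approximated as N times the mixture weight, and a two-part description length with a (k/2) log N model cost. The function names `merge_gaussians` and `delta_description_length` are hypothetical.

```python
import math


def merge_gaussians(w1, mu1, var1, w2, mu2, var2):
    """Moment-matched merge of two diagonal-covariance Gaussians.

    The mixture weights w1 and w2 decide how strongly each component
    pulls the merged mean and variance (assumed scheme, for illustration).
    """
    w = w1 + w2
    a, b = w1 / w, w2 / w
    mu = [a * m1 + b * m2 for m1, m2 in zip(mu1, mu2)]
    var = [
        a * (v1 + (m1 - m) ** 2) + b * (v2 + (m2 - m) ** 2)
        for m1, v1, m2, v2, m in zip(mu1, var1, mu2, var2, mu)
    ]
    return w, mu, var


def delta_description_length(n_total, w1, var1, w2, var2, merged_var):
    """Change in description length if the two components are merged.

    Likelihood loss: 0.5 * N * sum_d[(w1 + w2) log v_merged - w1 log v1 - w2 log v2].
    Model saving:    one component fewer, i.e. (2 * dim + 1) parameters
                     (means, variances, one weight), each costing 0.5 * log N.
    A negative result means merging shortens the description.
    """
    dim = len(merged_var)
    loss = 0.5 * n_total * sum(
        (w1 + w2) * math.log(vm) - w1 * math.log(v1) - w2 * math.log(v2)
        for vm, v1, v2 in zip(merged_var, var1, var2)
    )
    saving = 0.5 * (2 * dim + 1) * math.log(n_total)
    return loss - saving


# Two well-separated components with equal weight: merging is costly.
w, mu, var = merge_gaussians(0.5, [0.0], [1.0], 0.5, [2.0], [1.0])
far_apart = delta_description_length(1000, 0.5, [1.0], 0.5, [1.0], var)

# Two identical components: merging loses no likelihood and saves parameters.
_, _, same_var = merge_gaussians(0.5, [0.0], [1.0], 0.5, [0.0], [1.0])
identical = delta_description_length(1000, 0.5, [1.0], 0.5, [1.0], same_var)
```

In this sketch a pair is merged only when the returned value is negative, so a low-weight component close to a heavier neighbor tends to be absorbed first, while well-separated components with substantial weight are kept.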
Keywords
Continuous Speech Recognition; Acoustic Model; Optimization; Minimum Description Length (MDL)