http://dx.doi.org/10.4218/etrij.10.1510.0062

A New Distance Measure for a Variable-Sized Acoustic Model Based on MDL Technique

Cho, Hoon-Young (Software Research Laboratory, ETRI)
Kim, Sang-Hun (Software Research Laboratory, ETRI)

Publication Information

ETRI Journal / v.32, no.5, 2010, pp. 795-800

Abstract

Embedding a large vocabulary speech recognition system in mobile devices requires a reduced acoustic model obtained by eliminating redundant model parameters. In conventional optimization methods based on the minimum description length (MDL) criterion, a binary Gaussian tree is built at each state of a hidden Markov model by iteratively finding and merging similar mixture components. An optimal subset of the tree nodes is then selected to generate a downsized acoustic model. To obtain a better binary Gaussian tree by improving the process of finding the most similar Gaussian components, this paper proposes a new distance measure that exploits the difference in likelihood values for cases before and after two components are combined. The mixture weight of Gaussian components is also introduced in the component merging step. Experimental results show that the proposed method outperforms MDL-based optimization using either a Kullback-Leibler (KL) divergence or weighted KL divergence measure. The proposed method could also reduce the acoustic model size by 50% with less than a 1.5% increase in error rate compared to a baseline system.
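The abstract does not give the formula for the proposed distance, so the following is only a minimal sketch of the general idea it describes: scoring a pair of Gaussian components by the likelihood lost when they are merged, with mixture weights used in the merge. It assumes diagonal covariances, a moment-matched merge, and the expected log-likelihood under the original components as the likelihood value. The function names (merge_components, likelihood_loss, closest_pair) are hypothetical and are not taken from the paper.

```python
import numpy as np


def merge_components(w1, mu1, var1, w2, mu2, var2):
    """Moment-matched merge of two weighted diagonal Gaussians.

    Assumption: the merged component preserves the total weight, mean,
    and per-dimension variance of the two-component mixture.
    """
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    var = (w1 * (var1 + (mu1 - mu) ** 2) + w2 * (var2 + (mu2 - mu) ** 2)) / w
    return w, mu, var


def likelihood_loss(w1, mu1, var1, w2, mu2, var2):
    """Weighted drop in expected log-likelihood caused by merging.

    Assumption: this equals w1*KL(N1||Nm) + w2*KL(N2||Nm), which for a
    moment-matched merge reduces to a difference of log-variances.
    """
    w, _, var = merge_components(w1, mu1, var1, w2, mu2, var2)
    return 0.5 * np.sum(w * np.log(var) - w1 * np.log(var1) - w2 * np.log(var2))


def closest_pair(weights, means, variances):
    """Return (loss, i, j) for the pair whose merge loses the least likelihood."""
    best = None
    n = len(weights)
    for i in range(n):
        for j in range(i + 1, n):
            d = likelihood_loss(weights[i], means[i], variances[i],
                                weights[j], means[j], variances[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best
```

Repeatedly calling closest_pair and replacing the selected pair with its merged component would build one state's binary tree bottom-up. The MDL-based selection of tree nodes is not sketched here because the abstract does not specify it.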

Keywords

Acoustic modeling; optimization; minimum description length; parameter reduction
