Maximum Likelihood Training and Adaptation of Embedded Speech Recognizers for Mobile Environments

  • Cho, Young-Kyu (Speech Information Processing Laboratory, Department of Computer and Communication Engineering, Korea University)
  • Yook, Dong-Suk (Speech Information Processing Laboratory, Department of Computer and Communication Engineering, Korea University)
  • Received : 2009.06.08
  • Accepted : 2009.08.19
  • Published : 2010.02.28

Abstract

For acoustic modeling in embedded speech recognition systems, hidden Markov models (HMMs) are usually quantized, so that the original full-space distributions are represented by combinations of a small number of quantized distribution prototypes. We propose a maximum likelihood objective function for training these quantized distribution prototypes, together with a scheme for adapting the link structure of the quantized HMMs. Experimental results show that the new training algorithm and the link structure adaptation scheme reduce the word recognition error rate by 20.0%.
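The abstract does not spell out the layout of the quantized model. As a rough illustration only, the sketch below assumes a subspace-quantized layout in the style of Bocchieri and Mak's subspace distribution clustering HMM. Under that assumption, each diagonal Gaussian is split into feature subspaces, and each subspace is stored as an index (a "link") into a small codebook of shared prototypes. Training a prototype by maximum likelihood then pools the occupancy-weighted statistics of every Gaussian linked to it. The names `Prototype`, `quantized_log_likelihood`, and `ml_update_prototype`, as well as the variance floor, are hypothetical. This is not the authors' objective function, and it is not their link structure adaptation scheme.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Prototype:
    """A diagonal-covariance Gaussian shared by many HMM states (hypothetical layout)."""
    mean: List[float]
    var: List[float]


def log_gauss(x: Sequence[float], p: Prototype) -> float:
    """Log density of x under a diagonal Gaussian prototype."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, p.mean, p.var)
    )


def quantized_log_likelihood(
    x: Sequence[float],
    subspaces: Sequence[Tuple[int, int]],
    codebooks: Sequence[Sequence[Prototype]],
    links: Sequence[int],
) -> float:
    """Score one full-space Gaussian that is stored only as prototype indices.

    subspaces[s] is a (start, end) slice of the feature vector, codebooks[s]
    holds the prototypes for that subspace, and links[s] is the index of the
    prototype this Gaussian points to in subspace s.
    """
    return sum(
        log_gauss(x[a:b], codebooks[s][links[s]])
        for s, (a, b) in enumerate(subspaces)
    )


def ml_update_prototype(
    weighted_frames: Sequence[Tuple[float, Sequence[float]]],
    var_floor: float = 1e-3,
) -> Prototype:
    """Closed-form ML re-estimate of one shared prototype.

    weighted_frames pools (occupancy, subvector) pairs from every full-space
    Gaussian linked to this prototype. The shared mean and variance are
    therefore occupancy-weighted averages over all of those Gaussians.
    """
    occ = sum(g for g, _ in weighted_frames)
    dim = len(weighted_frames[0][1])
    mean = [sum(g * f[d] for g, f in weighted_frames) / occ for d in range(dim)]
    var = [
        max(sum(g * f[d] ** 2 for g, f in weighted_frames) / occ - mean[d] ** 2, var_floor)
        for d in range(dim)
    ]
    return Prototype(mean, var)


# Usage: a 4-dimensional feature split into two 2-dimensional subspaces.
subspaces = [(0, 2), (2, 4)]
codebooks = [
    [Prototype([0.0, 0.0], [1.0, 1.0]), Prototype([1.0, 1.0], [0.5, 0.5])],
    [Prototype([0.0, 0.0], [1.0, 1.0]), Prototype([-1.0, 2.0], [2.0, 2.0])],
]
x = [0.2, 0.1, -0.5, 1.5]
score = quantized_log_likelihood(x, subspaces, codebooks, links=[1, 0])
updated = ml_update_prototype([(3.0, [1.0]), (1.0, [5.0])])  # mean [2.0], var [3.0]
```

With this layout, only the integer links are stored per Gaussian, which is where the memory saving on an embedded device would come from. In such a scheme, adapting the link structure would amount to changing those indices rather than the prototypes themselves.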

Cited by

  1. Probabilistic Bilinear Transformation Space-Based Joint Maximum A Posteriori Adaptation, vol. 34, no. 5, 2012, https://doi.org/10.4218/etrij.12.0212.0054