http://dx.doi.org/10.6109/jkiice.2015.19.5.1047

Acoustic Model Transformation Method for Speech Recognition Employing Gaussian Mixture Model Adaptation Using Untranscribed Speech Database  

Kim, Wooil (Department of Computer Science & Engineering, Incheon National University)
Abstract
This paper presents an acoustic model transformation method that uses an untranscribed speech database to improve speech recognition. In the presented method, an adapted GMM is obtained with a conventional adaptation technique, and the most similar Gaussian component is selected from the adapted GMM. The bias vector between the mean vectors of the clean GMM and the adapted GMM is then used to update the mean vectors of the HMM. The presented GAMT, combined with MAP or MLLR, yields improved recognition performance under car-noise and speech-babble conditions compared with MAP or MLLR used alone. The experimental results show that the presented method effectively exploits an untranscribed speech database for acoustic model adaptation and thereby increases speech recognition accuracy.
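The transformation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the adapted GMM is produced from the clean GMM by MAP/MLLR so that component k of the two models corresponds, it uses nearest-mean Euclidean distance as the similarity criterion for selecting a component, and the function and variable names (transform_hmm_means, clean_gmm_means, adapted_gmm_means) are hypothetical.

import numpy as np

def transform_hmm_means(hmm_means, clean_gmm_means, adapted_gmm_means):
    # hmm_means:         (S, D) mean vectors of the clean-trained HMM Gaussians
    # clean_gmm_means:   (K, D) means of the GMM trained on clean speech
    # adapted_gmm_means: (K, D) means of the same GMM after MAP/MLLR adaptation
    #                    on untranscribed speech (component k is assumed to
    #                    correspond to component k of the clean GMM)
    updated = np.empty_like(hmm_means)
    for s, mu in enumerate(hmm_means):
        # select the most similar Gaussian component from the adapted GMM
        # (nearest mean in Euclidean distance -- an assumed similarity measure)
        k = np.argmin(np.linalg.norm(adapted_gmm_means - mu, axis=1))
        # bias between the clean and adapted mean vectors of that component
        bias = adapted_gmm_means[k] - clean_gmm_means[k]
        # shift the HMM mean by the bias toward the target environment
        updated[s] = mu + bias
    return updated

Under these assumptions, the updated means would replace the original HMM means before decoding; MAP or MLLR adaptation can then be applied in combination, as evaluated in the paper.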
Keywords
Speech recognition; Noisy environment; Model adaptation; Acoustic model; Gaussian mixture model
References
1. P. J. Moreno, B. Raj, and R. M. Stern, "Data-driven Environmental Compensation for Speech Recognition: A Unified Approach," Speech Communication, vol. 24, no. 4, pp. 267-285, 1998.
2. W. Kim and J. H. L. Hansen, "Feature Compensation in the Cepstral Domain Employing Model Combination," Speech Communication, vol. 51, no. 2, pp. 83-96, 2009.
3. J. L. Gauvain and C. H. Lee, "Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains," IEEE Trans. on Speech and Audio Processing, vol. 2, no. 2, pp. 291-298, 1994.
4. C. J. Leggetter and P. C. Woodland, "Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density HMMs," Computer Speech and Language, vol. 9, pp. 171-185, 1995.
5. https://catalog.ldc.upenn.edu/LDC93S1
6. M. J. F. Gales and S. J. Young, "Robust Continuous Speech Recognition Using Parallel Model Combination," IEEE Trans. on Speech and Audio Processing, vol. 4, no. 5, pp. 352-359, 1996.
7. W. Kim and J. H. L. Hansen, "Gaussian Map based Acoustic Model Adaptation Using Untranscribed Data for Speech Recognition in Severely Adverse Environments," in Proc. Interspeech 2012, pp. 1764-1767, Sept. 2012.
8. http://www.speech.cs.cmu.edu/comp.speech/Section1/Data/noisex.html
9. http://cmusphinx.sourceforge.net
10. ETSI standard document, ETSI ES 201 108 v1.1.2 (2000-04), Feb. 2000.
11. S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 27, pp. 113-120, 1979.
12. Y. Ephraim and D. Malah, "Speech Enhancement Using Minimum Mean Square Error Short Time Spectral Amplitude Estimator," IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 32, no. 6, pp. 1109-1121, 1984.
13. J. H. L. Hansen and M. Clements, "Constrained Iterative Speech Enhancement with Application to Speech Recognition," IEEE Trans. on Signal Processing, vol. 39, no. 4, pp. 795-805, 1991.