L1-norm Regularization for State Vector Adaptation of Subspace Gaussian Mixture Model

  • Received : 2015.08.04
  • Accepted : 2015.09.21
  • Published : 2015.09.30

Abstract

In this paper, we propose L1-norm regularization for state vector adaptation of the subspace Gaussian mixture model (SGMM). When designing a speaker adaptation system with a GMM-HMM acoustic model, MAP is the most common technique to consider. However, the MAP adaptation procedure requires a large number of parameters to be updated simultaneously. Sparse adaptation, such as L1-norm regularization or sparse MAP, can reduce the number of updated parameters, but its performance falls short of MAP adaptation. The SGMM, in contrast, does not suffer from sparse adaptation as much as the GMM-HMM does, because each Gaussian mean vector in the SGMM is defined as a weighted sum of basis vectors and is therefore much more robust to parameter fluctuation. Since only a few adaptation techniques are appropriate for the SGMM, the proposed method can be powerful especially when the amount of adaptation data is limited. Experimental results show that the proposed method achieves a better error reduction rate than MAP adaptation of the SGMM, even with a small amount of adaptation data.
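As a hedged sketch of the setup (our reconstruction from the abstract, not the paper's exact derivation): in the standard SGMM formulation, the mean of Gaussian $i$ in state $j$ is a projection of a low-dimensional state vector,

\[ \mu_{ji} = M_i v_j, \]

where $v_j \in \mathbb{R}^S$ is the state vector and $M_i$ is a globally shared projection matrix. The L1-regularized state vector adaptation described in the abstract can then be written as maximizing the EM auxiliary function for $v_j$, which is quadratic in $v_j$, minus an L1 penalty; penalizing the deviation from the speaker-independent vector $v_j^{\mathrm{SI}}$ is our assumption, chosen so that the per-speaker update itself is sparse:

\[ \hat{v}_j = \arg\max_{v_j} \left( y_j^{\top} v_j - \tfrac{1}{2}\, v_j^{\top} H_j v_j \right) - \lambda \left\lVert v_j - v_j^{\mathrm{SI}} \right\rVert_1, \]

where $y_j$ and $H_j$ are the usual SGMM linear and quadratic statistics accumulated from the adaptation data. Because the smooth part is quadratic, a proximal-gradient (ISTA) iteration with soft-thresholding solves this objective; the Python sketch below is illustrative only, and the function names, solver choice, and fixed step size are our assumptions rather than the paper's optimizer:

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def adapt_state_vector(H, y, v_si, lam, n_iter=200):
    """ISTA sketch for the L1-regularized quadratic objective
        minimize  0.5 * v^T H v - y^T v + lam * ||v - v_si||_1
    H    : (S, S) positive-definite quadratic statistics for state j
    y    : (S,)   linear statistics for state j
    v_si : (S,)   speaker-independent state vector; sparsity is imposed
                  on the update v - v_si, so most dimensions stay unchanged
    """
    step = 1.0 / np.linalg.norm(H, 2)        # 1/L, L = largest eigenvalue of H
    v = v_si.copy()
    for _ in range(n_iter):
        grad = H @ v - y                     # gradient of the smooth quadratic part
        z = v - step * grad                  # plain gradient step
        v = v_si + soft_threshold(z - v_si, step * lam)  # prox shrinks the delta
    return v
```

Increasing $\lambda$ zeroes more components of $v_j - v_j^{\mathrm{SI}}$, so only a few state-vector dimensions move away from the speaker-independent model, which is exactly the behavior that makes this style of adaptation attractive when adaptation data are scarce.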
