Rapid Speaker Adaptation Based on MAPLR with Adaptive Hybrid Priors Estimated from Reference Speakers

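  • 송영록 (Department of Electronics and Electrical Engineering, Pusan National University)
  • 김형순 (Department of Electronics and Electrical Engineering, Pusan National University)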
  • Received : 2010.12.30
  • Accepted : 2011.08.12
  • Published : 2011.08.31

Abstract

This paper proposes two methods for estimating the prior distribution in order to improve rapid speaker adaptation based on maximum a posteriori linear regression (MAPLR). Conventionally, the prior distribution of the transformation matrix in MAPLR adaptation is estimated from all of the training speakers used to build the speaker-independent model, and the same prior is applied to every new speaker. In the first proposed method, the adaptation data are used to select a group of reference speakers whose acoustic characteristics are close to those of the new speaker, and the prior distribution is estimated from this reference group, so that it better suits the new speaker. In the second, for MAPLR adaptation with a block-diagonal transformation matrix, the mean matrix and the covariance matrix of the prior distribution are estimated separately from two different types of transformation matrices obtained from the same training speakers: the mean from one type and the covariance from the other. The proposed methods are evaluated on an isolated-word recognition task by measuring word accuracy as a function of the number of adaptation words. Experimental results show that, when adaptation data are very limited, a statistically significant improvement is obtained over conventional MAPLR adaptation.
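
For reference, one common way to write the MAPLR update is given below. It is an assumed textbook form, not taken from the paper: diagonal-covariance Gaussians with variances sigma^2_{m,i}, extended mean vectors xi_m = [1, mu_m^T]^T, adapted means W xi_m, occupation probabilities gamma_m(t), and an independent Gaussian prior N(m_i, C_i) on each row w_i of W. Under these assumptions the MAP row estimate has a closed form, and the prior moments can be computed from the transforms of a selected reference set R. For the hybrid variant, m_i and C_i would simply come from two different sets of transforms. With a block-diagonal W, the same expressions apply to the nonzero entries of each row. The paper's exact prior (for example a matrix-variate form or additional scaling hyperparameters) may differ.

```latex
G_i = \sum_m \frac{1}{\sigma_{m,i}^2} \sum_t \gamma_m(t)\, \xi_m \xi_m^\top, \qquad
k_i = \sum_m \frac{1}{\sigma_{m,i}^2} \sum_t \gamma_m(t)\, o_i(t)\, \xi_m

\hat{w}_i^{\mathrm{MLLR}} = G_i^{-1} k_i, \qquad
\hat{w}_i^{\mathrm{MAP}} = \left(G_i + C_i^{-1}\right)^{-1} \left(k_i + C_i^{-1} m_i\right)

m_i = \frac{1}{|R|} \sum_{s \in R} w_i^{(s)}, \qquad
C_i = \frac{1}{|R|} \sum_{s \in R} \left(w_i^{(s)} - m_i\right) \left(w_i^{(s)} - m_i\right)^\top
```

When almost no adaptation data are available, G_i and k_i are close to zero and the MAP estimate falls back to the prior mean m_i. This is why the choice of prior matters most for rapid adaptation.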

본 논문은 maximum a posteriori linear regression (MAPLR) 기반의 고속 화자적응 성능을 개선하기 위하여 사전분포를 추정하는 두 가지 방식을 제안한다. 일반적으로 MAPLR 방식에서 사용되는 변환행렬의 사전분포는 화자독립모델을 구성하는 훈련 화자들로부터 추정되어 모든 화자들에게 동등하게 적용된다. 본 논문에서는 새로운 화자에게 보다 더 적합한 사전분포를 적용하고자 적응 데이터를 이용하여 새로운 화자의 음향특성과 가까운 참조화자 집단을 선택한 후 참조화자 집단으로부터 사전분포를 추정하는 방법을 제안한다. 또한, 블록 대각 형태의 변환행렬의 사전분포를 추정하는 경우 사전분포의 평균행렬과 공분산행렬을 동일한 훈련 화자들로부터 얻어진 두 가지 형태의 변환행렬집단으로부터 각각 추정하는 방법을 제안한다. 제안된 방법의 성능 평가를 위하여 고립단어 인식실험을 통해 적응 단어의 개수에 따른 단어 인식률을 평가한다. 실험결과, 적응 단어 수가 매우 적을 때 기존의 MAPLR 방식에 비하여 통계적으로 유의미한 성능향상이 얻어짐을 보여준다.

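The per-row MAPLR update with a Gaussian prior, w = (G + C^-1)^-1 (k + C^-1 m), can be sketched in plain Python (standard library only). The sketch makes the two proposed ideas concrete: estimating the prior from a selected group of reference speakers, and taking the prior mean and covariance from two different sets of transforms. Everything here is illustrative and should not be read as the paper's implementation. The selection criterion is an assumption: training speakers are ranked by a log-likelihood score of the new speaker's adaptation data. The function names, the variance floor, and the "mean group" / "covariance group" arguments are also hypothetical.

```python
"""Sketch: MAPLR row update with a prior estimated from reference speakers.

All names, the variance floor and the selection criterion are hypothetical.
"""


def mean_vec(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]


def cov_mat(vectors, floor=1e-3):
    """Maximum-likelihood covariance of the vectors plus a diagonal floor.

    The floor (an assumed value) keeps the matrix invertible when only a
    few reference speakers are available.
    """
    m = mean_vec(vectors)
    n, d = len(vectors), len(m)
    c = [[sum((v[a] - m[a]) * (v[b] - m[b]) for v in vectors) / n
          for b in range(d)] for a in range(d)]
    for a in range(d):
        c[a][a] += floor
    return c


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def inverse(A):
    """Matrix inverse, built column by column from unit right-hand sides."""
    n = len(A)
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def select_reference_speakers(scores, k):
    """Return the k training speakers with the highest scores.

    `scores` maps speaker id -> log-likelihood of the new speaker's
    adaptation data under that speaker's model (an assumed criterion).
    """
    return sorted(scores, key=scores.get, reverse=True)[:k]


def hybrid_row_prior(mean_group, cov_group, floor=1e-3):
    """Prior mean from one set of row vectors, covariance from another."""
    return mean_vec(mean_group), cov_mat(cov_group, floor)


def maplr_row(G, k, prior_mean, prior_cov):
    """MAP estimate w = (G + C^-1)^-1 (k + C^-1 m) for one transform row."""
    P = inverse(prior_cov)
    d = len(k)
    A = [[G[a][b] + P[a][b] for b in range(d)] for a in range(d)]
    rhs = [k[a] + sum(P[a][b] * prior_mean[b] for b in range(d)) for a in range(d)]
    return solve(A, rhs)
```

For a prior estimated from selected reference speakers, the same rows would be passed to both arguments of `hybrid_row_prior` (for example `rows = [row_i[s] for s in select_reference_speakers(scores, 20)]`, where `row_i` and `scores` are hypothetical per-speaker data). For the hybrid variant, the two arguments would come from the two different types of transforms. With zero statistics the estimate equals the prior mean, and as G grows it approaches the MLLR solution G^-1 k.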