http://dx.doi.org/10.7776/ASK.2011.30.6.315

Rapid Speaker Adaptation Based on MAPLR with Adaptive Hybrid Priors Estimated from Reference Speakers  

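Song, Young-Rok (Department of Electronics and Electrical Engineering, Pusan National University)
Kim, Hyung-Soon (Department of Electronics and Electrical Engineering, Pusan National University)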
Abstract
This paper proposes two methods of estimating the prior distribution to improve the performance of rapid speaker adaptation based on maximum a posteriori linear regression (MAPLR). In conventional MAPLR adaptation, the prior distribution of the transformation matrix is estimated from all of the training speakers used to construct the speaker-independent model, and the same prior is applied to every new speaker. In the first method, the prior distribution is instead estimated from a group of reference speakers selected using the adaptation data, so that the acoustic characteristics of the selected reference speakers are similar to those of the new speaker. In the second method, which applies to MAPLR adaptation with a block-diagonal transformation matrix, the mean matrix and the covariance matrix of the prior distribution are estimated separately, each from a different group of transformation matrices obtained from the same training speakers. To evaluate the proposed methods, we measure word accuracy as a function of the number of adaptation words in an isolated-word recognition task. Experimental results show that, for very limited adaptation data, the proposed methods yield a statistically significant improvement over conventional MAPLR adaptation.
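The two prior-estimation ideas described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the selection criterion or the form of the covariance, so the sketch assumes that reference speakers are ranked by a per-speaker score of the adaptation data (for example a log-likelihood) and that the covariance is reduced to an element-wise variance. All function names (`select_reference_speakers`, `adaptive_prior`, `hybrid_prior`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def select_reference_speakers(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k training speakers with the highest scores.

    Assumption: scores[i] measures how well training speaker i matches
    the adaptation data (higher is better), e.g. a log-likelihood.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def elementwise_mean(mats: Sequence[Matrix]) -> Matrix:
    """Average a set of equally shaped matrices element by element."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[r][c] for m in mats) / n for c in range(cols)]
            for r in range(rows)]


def elementwise_variance(mats: Sequence[Matrix]) -> Matrix:
    """Population variance of each matrix element (a diagonal simplification)."""
    mean = elementwise_mean(mats)
    n = len(mats)
    rows, cols = len(mean), len(mean[0])
    return [[sum((m[r][c] - mean[r][c]) ** 2 for m in mats) / n
             for c in range(cols)] for r in range(rows)]


def adaptive_prior(transforms: Sequence[Matrix], scores: Sequence[float],
                   k: int) -> Tuple[Matrix, Matrix]:
    """Prior mean and variance from the k selected reference speakers only."""
    chosen = [transforms[i] for i in select_reference_speakers(scores, k)]
    return elementwise_mean(chosen), elementwise_variance(chosen)


def hybrid_prior(group_for_mean: Sequence[Matrix],
                 group_for_cov: Sequence[Matrix]) -> Tuple[Matrix, Matrix]:
    """Prior mean from one group of transforms, variance from another."""
    return elementwise_mean(group_for_mean), elementwise_variance(group_for_cov)


# Toy data: three training speakers, each with a 1x2 transformation matrix.
transforms = [[[1.0, 0.0]], [[3.0, 2.0]], [[5.0, 4.0]]]
scores = [-10.0, -2.0, -3.0]  # hypothetical scores of the adaptation data
prior_mean, prior_var = adaptive_prior(transforms, scores, k=2)
```

With the toy data, speakers 1 and 2 are selected, so the prior is centred on their transforms rather than on the average over all three training speakers. In `hybrid_prior`, which transforms go into each group is left to the caller, since the abstract does not specify how the two groups are formed.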
Keywords
Speaker adaptation; MAPLR; reference speaker; adaptive priors; hybrid priors