
A Study on Regression Class Generation of MLLR Adaptation Using State Level Sharing  

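Se-Jin Oh (KVN Project Division, Korea Astronomy and Space Science Institute)
Woo-Chang Sung (School of Electronic and Information Engineering, Yeungnam University)
Kwang-Dong Kim (KVN Project Division, Korea Astronomy and Space Science Institute)
Duk-Gyoo Roh (KVN Project Division, Korea Astronomy and Space Science Institute)
Min-Gyu Song (KVN Project Division, Korea Astronomy and Space Science Institute)
Hyun-Yeol Chung (School of Electronic and Information Engineering, Yeungnam University)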
Abstract
In this paper, we propose a method for generating regression classes for speaker adaptation in the HM-Net (Hidden Markov Network) system. MLLR (Maximum Likelihood Linear Regression) adaptation is applied to the HM-Net speech recognition system so that speaker characteristics are represented effectively and HM-Net can be used in a variety of tasks. For state-level sharing, we adopt the contextual-domain state splitting of the PDT-SSS (Phonetic Decision Tree-based Successive State Splitting) algorithm, which clusters in both the contextual and time domains. In each contextual-domain state, the desired phoneme classes are determined by splitting the context information (classes) that covers the target speaker's speech data. The number of adaptation parameters, such as means and variances, is controlled automatically by the contextual-domain state splitting of PDT-SSS, depending on the context information and the amount of adaptation utterances from a new speaker. Experiments were performed on the KLE (Center for Korean Language Engineering) 452 data and the YNU (Yeungnam Univ.) 200 data to verify the effectiveness of the proposed method. The results show that phone, word, and sentence recognition accuracy increased by 34 to 37%, 9%, and 20%, respectively. When performance is compared across adaptation-utterance lengths, it also improves significantly even with short adaptation utterances. We therefore argue that the proposed regression class method is well suited to an HM-Net speech recognition system employing MLLR speaker adaptation.
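The idea that the number of adaptation parameters follows the amount of adaptation data can be illustrated with a small sketch. This is not the PDT-SSS procedure from the paper: it is a generic threshold-based descent of a class tree, combined with the standard MLLR mean transform (adapted mean = A·mean + b, shared by all Gaussians in one regression class). The tree, the class names, the frame counts, and the `min_frames` threshold are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Node of a hypothetical class tree (names and counts are made up)."""
    name: str
    frames: int = 0  # adaptation frames observed at a leaf
    children: List["Node"] = field(default_factory=list)


def total_frames(node: Node) -> int:
    """Frames under a node: its own count for a leaf, else the sum over children."""
    if not node.children:
        return node.frames
    return sum(total_frames(child) for child in node.children)


def select_classes(node: Node, min_frames: int) -> List[str]:
    """Split a node only if every child has at least min_frames of data;
    otherwise keep the node itself as a single regression class."""
    if node.children and all(total_frames(c) >= min_frames for c in node.children):
        selected: List[str] = []
        for child in node.children:
            selected.extend(select_classes(child, min_frames))
        return selected
    return [node.name]


def adapt_mean(mean: List[float], A: List[List[float]], b: List[float]) -> List[float]:
    """Standard MLLR mean update: mu_hat = A @ mu + b."""
    return [sum(a * m for a, m in zip(row, mean)) + bias for row, bias in zip(A, b)]


def example_tree() -> Node:
    """Toy tree: vowels split into front/back, consonants left as a leaf."""
    vowels = Node("vowels", children=[Node("front", 300), Node("back", 50)])
    return Node("all", children=[vowels, Node("consonants", 400)])


if __name__ == "__main__":
    tree = example_tree()
    for threshold in (500, 100, 50):
        print(threshold, select_classes(tree, threshold))
    print(adapt_mean([1.0, 2.0], [[1.0, 0.0], [0.0, 2.0]], [0.5, -1.0]))
```

With little data per class (a high threshold relative to the counts), the sketch falls back to one global class and therefore one transform; as more data becomes available, more classes, and so more transform parameters, are used.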
Keywords
State level sharing; PDT-SSS algorithm; Regression class; MLLR adaptation