
A Study on Phoneme Likely Units to Improve the Performance of Context-dependent Acoustic Models in Speech Recognition  

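임영춘 (Jamoba Co., Ltd.)
오세진 (Korea Astronomy and Space Science Institute, KVN Project Division)
김광동 (Korea Astronomy and Space Science Institute, KVN Project Division)
노덕규 (Korea Astronomy and Space Science Institute, KVN Project Division)
송민규 (Korea Astronomy and Space Science Institute, KVN Project Division)
정현열 (Yeungnam University, School of Electronic and Information Engineering)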
Abstract
In this paper, we carried out word, four-continuous-digit, continuous speech, and task-independent word recognition experiments to verify the effectiveness of re-defined phoneme-likely units (PLUs) for phonetic decision tree based HM-Net (Hidden Markov Network) context-dependent (CD) acoustic modeling in Korean. In the 48 PLUs, the phonemes /ㅂ/, /ㄷ/, /ㄱ/ are each separated into initial sound, medial vowel, and final consonant units, and the consonants /ㄹ/, /ㅈ/, /ㅎ/ are each separated into initial sound and final consonant units, according to their position in the syllable, word, and sentence. We therefore re-define 39 PLUs by unifying the separated initial sound, medial vowel, and final consonant units of the 48 PLUs into one phoneme each, so that the CD acoustic models can be constructed effectively. In word recognition experiments with context-independent (CI) acoustic models, the 48 PLUs achieved on average 7.06% higher recognition accuracy than the 39 PLUs. In speaker-independent word recognition experiments with CD acoustic models, however, the 39 PLUs achieved on average 0.61% higher recognition accuracy than the 48 PLUs. In the four-continuous-digit recognition experiments, which involve liaison phenomena, the 39 PLUs also achieved on average 6.55% higher recognition accuracy. In continuous speech recognition experiments, the 39 PLUs likewise achieved on average 15.08% higher recognition accuracy than the 48 PLUs. Finally, in task-independent word recognition experiments, both the 48 and 39 PLUs showed lower recognition accuracy because of unknown contextual factors, but the 39 PLUs still achieved on average 1.17% higher recognition accuracy than the 48 PLUs. These experiments verify the effectiveness of the re-defined 39 PLUs, compared with the 48 PLUs, for constructing CD acoustic models.
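The reduction from 48 to 39 units described in the abstract can be checked with a small sketch. The unit labels below (`b_i`, `b_m`, `b_f`, and so on) are hypothetical names chosen for illustration and are not the labels used in the paper. The sketch assumes only what the abstract states: three phonemes have three positional units each, and three consonants have two positional units each.

```python
# Illustrative sketch only: labels such as "b_i" are hypothetical,
# not the unit names used in the paper.

# Phonemes split three ways in the 48-PLU set: /ㅂ/, /ㄷ/, /ㄱ/
THREE_WAY = ["b", "d", "g"]
# Consonants split two ways in the 48-PLU set: /ㄹ/, /ㅈ/, /ㅎ/
TWO_WAY = ["l", "j", "h"]


def build_merge_map():
    """Map each position-specific unit to its unified phoneme."""
    merge = {}
    for phoneme in THREE_WAY:
        for pos in ("i", "m", "f"):  # initial, medial, final
            merge[f"{phoneme}_{pos}"] = phoneme
    for phoneme in TWO_WAY:
        for pos in ("i", "f"):  # initial, final
            merge[f"{phoneme}_{pos}"] = phoneme
    return merge


def relabel(units, merge):
    """Rewrite a unit sequence from the split set to the unified set."""
    return [merge.get(u, u) for u in units]


MERGE = build_merge_map()
split_units = len(MERGE)                  # 3*3 + 3*2 position-specific units
merged_units = len(set(MERGE.values()))   # unified phonemes replacing them
reduced_inventory = 48 - (split_units - merged_units)

print(split_units, merged_units, reduced_inventory)
print(relabel(["g_i", "a", "g_f"], MERGE))
```

The 15 position-specific units collapse into 6 phonemes, which removes 9 units and is consistent with the 39 PLUs stated in the abstract. Units not listed in the map, such as the placeholder `a`, pass through unchanged.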
Keywords
48 and 39 phoneme-likely units; HM-Net (hidden Markov network); PDT-SSS algorithm; context-dependent acoustic models