http://dx.doi.org/10.7776/ASK.2006.25.7.312

Lip-Synch System Optimization Using Class Dependent SCHMM  

Lee, Sung-Hee (Department of Electronics and Computer Engineering, Korea University)
Park, Jun-Ho (Department of Electronics and Computer Engineering, Korea University)
Ko, Han-Seok (Department of Electronics and Computer Engineering, Korea University)
Abstract
The conventional lip-synch system involves a two-step process: speech segmentation followed by recognition. However, the difficulty of the speech segmentation procedure and the inaccuracy of the training data set caused by segmentation errors lead to significant performance degradation in the system. To cope with this, a connected-vowel recognition method using the Head-Body-Tail (HBT) model is proposed. The HBT model, which is well suited to relatively small-vocabulary tasks, captures the co-articulation effect efficiently. Moreover, the seven vowels are merged into three classes of similar lip shape, and the system is optimized by employing a class-dependent SCHMM structure. Additionally, at both ends of each word, where variation is large, an 8-component Gaussian mixture model is used directly to improve representational ability. Although the proposed method shows performance comparable to that of the CHMM based on the HBT structure, the number of parameters is reduced by 33.92%. This reduction makes the method computationally efficient and enables real-time operation.
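The parameter saving described above comes from the semi-continuous tying: in a CHMM every state owns its full set of Gaussians, whereas in a class-dependent SCHMM all states of a vowel class share one Gaussian codebook and each state keeps only mixture weights. The sketch below illustrates that accounting; the state count, feature dimension, and mixture size are hypothetical placeholders (the paper itself reports a 33.92% reduction, not these figures).

```python
# Illustrative parameter count: CHMM (per-state Gaussians) vs.
# class-dependent SCHMM (per-class shared codebook, per-state weights).
# All sizes are assumed for illustration only.

DIM = 13                 # assumed feature dimension (e.g. MFCC order)
MIX = 8                  # Gaussians per state / per class codebook
GAUSS_PARAMS = 2 * DIM   # diagonal-covariance Gaussian: mean + variance


def chmm_params(n_states: int) -> int:
    """Every state carries its own mixture weights and Gaussians."""
    return n_states * (MIX + MIX * GAUSS_PARAMS)


def schmm_params(n_states: int, n_classes: int) -> int:
    """States share one codebook per class; each state keeps only weights."""
    return n_classes * MIX * GAUSS_PARAMS + n_states * MIX


# e.g. 7 vowels x 3 emitting states each, merged into 3 lip-shape classes
states, classes = 21, 3
print(chmm_params(states))            # 4536
print(schmm_params(states, classes))  # 792
```

With these toy numbers the tied model needs far fewer Gaussian parameters, which is the same mechanism behind the real-time capability claimed for the proposed system.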
Keywords
Head-Body-Tail (HBT); Context dependent; Connected vowel recognition; Lip-synch;