http://dx.doi.org/10.5391/JKIIS.2010.20.3.348

Robust Feature Extraction Based on Image-based Approach for Visual Speech Recognition  

Song, Min-Gyu (Dept. of Electronic Engineering, Chonnam National University)
Pham, Thanh Trung (Dept. of Electronic Engineering, Chonnam National University)
Min, So-Hee (Dept. of Electronic Engineering, Chonnam National University)
Kim, Jin-Young (Dept. of Electronic Engineering, Chonnam National University)
Na, Seung-You (Dept. of Electronic Engineering, Chonnam National University)
Hwang, Sung-Taek (Telecommunication R&D Center, Samsung Electronics)
Publication Information
Journal of the Korean Institute of Intelligent Systems / v.20, no.3, 2010, pp. 348-355
Abstract
In spite of advances in speech recognition technology, recognition in noisy environments remains a difficult task. To address this problem, researchers have proposed methods that use visual information in addition to audio information. However, the visual channel carries its own noise, and this visual noise degrades visual speech recognition performance. How to extract visual feature parameters that improve recognition performance is therefore a question of active interest. In this paper, we propose a visual feature extraction method based on an image-based approach for enhancing the recognition performance of an HMM-based visual speech recognizer. For the experiments, we constructed an audio-visual database of 105 speakers, each of whom uttered 62 words. We applied histogram matching, lip folding, RASTA filtering, a linear mask, DCT, and PCA. The experimental results show that the recognition performance of the proposed method is about 21% higher than that of the baseline method.
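The processing chain named in the abstract (histogram normalization of the lip region, folding the image about its vertical symmetry axis, a 2-D DCT, and a PCA projection) can be sketched as below. This is a minimal illustration, not the authors' implementation: the ROI size, the number of retained DCT and PCA coefficients, and all function names are assumptions, histogram matching is stood in for by plain histogram equalization, and the temporal RASTA filtering and linear-mask steps are omitted for brevity.

```python
import numpy as np
from scipy.fftpack import dct

def histogram_equalize(img):
    """Illumination normalization of an 8-bit grayscale lip ROI
    (a stand-in for histogram matching against a reference distribution)."""
    hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min() + 1e-12)
    lut = np.round(cdf * 255).astype(np.uint8)
    return lut[img]

def lip_fold(img):
    """Exploit left-right lip symmetry: average the ROI with its mirror
    image and keep one half, halving the feature dimension."""
    folded = 0.5 * (img.astype(float) + img[:, ::-1].astype(float))
    return folded[:, : img.shape[1] // 2]

def dct_features(img, k=8):
    """2-D DCT of the folded ROI; keep the top-left k x k block of
    low-frequency coefficients as the frame feature vector."""
    c = dct(dct(img, axis=0, norm='ortho'), axis=1, norm='ortho')
    return c[:k, :k].ravel()

def pca_fit(X, n_components=12):
    """Fit PCA on a (frames x dims) feature matrix via SVD;
    return the mean and the principal basis."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def extract_sequence_features(frames, k=8, n_components=12):
    """Per frame: equalize -> fold -> DCT; then project the whole
    utterance onto a PCA basis fitted on its frames."""
    feats = np.stack([dct_features(lip_fold(histogram_equalize(f)), k)
                      for f in frames])
    mean, basis = pca_fit(feats, n_components)
    return (feats - mean) @ basis.T  # (frames x n_components)

# Tiny usage example: random 32x64 "lip ROIs" for a 20-frame utterance.
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(20, 32, 64), dtype=np.uint8)
features = extract_sequence_features(frames)
print(features.shape)  # (20, 12)
```

The resulting low-dimensional feature sequence is the kind of observation stream that would then be fed to an HMM-based recognizer.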
Keywords
Visual speech recognition; Histogram matching; Lip folding; RASTA filter; PCA
Citations & Related Records
Times Cited By KSCI : 2
1 G. Potamianos, H. P. Graf, E. Cosatto, "An image transform approach for HMM based automatic lipreading," Proceedings of the International Conference on Image Processing, vol. 3, pp. 173-177, Chicago, U.S.A., July 1998.
2 C. C. Chibelushi, F. Deravi, and J. S. Mason, "A review of speech-based bimodal recognition," IEEE Trans. Multimedia, vol. 4, no. 1, pp. 23-37, Mar. 2002.
3 P. Scanlon and R. Reilly, "Feature analysis for automatic speechreading," in Proc. Int. Conf. Multimedia and Expo, pp. 625-630, 2001.
4 T. T. Pham, J. Y. Kim, S. Y. Na, S. T. Hwang, "Robust Eye Localization for Lip Reading in Mobile Environment," Proceedings of SCIS&ISIS, Japan, pp. 385-388, 2008.
5 MacQueen, J. B. "Some Methods for Classification and Analysis of Multivariate Observations," In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281-297. 1967.
6 Andrew W. Moore, "K-means and Hierarchical Clustering," tutorial slides, School of Computer Science, Carnegie Mellon University, http://www.cs.cmu.edu/~awm, http://www.autonlab.org/tutorials/kmeans11.pdf
7 T. T. Pham, M. G. Song, J. Y. Kim, S. Y. Na, S. T. Hwang, "A Robust Lip Center Detection in Cell Phone Environment," Proceedings of IEEE Symposium on Signal Processing and Information Technology, pp. 390-395, Sarajevo, December 2008.
8 M. G. Song, J. Y. Kim, T. T. Pham, S. T. Hwang, "A Study on Eye Localization Based Lip Detection for Visual Speech Recognition in Mobile Environments," Journal of the Korean Institute of Intelligent Systems, vol. 19, no. 4, pp. 478-484.
9 J. B. Kim, J. Y. Kim, "An Efficient Lipreading Method Based on Lip Symmetry," Journal of the Institute of Electronics Engineers of Korea, vol. 37, no. 5, pp. 105-114, 2000.
10 D. S. Shin, J. Y. Kim, S. H. Choi, "A Study on Lipreading Performance Improvement Using Time-Domain Filters," Journal of the Acoustical Society of Korea, vol. 22, no. 5, pp. 375-382, 2003.
11 J. N. Gowdy, A. Subramanya, C. Bartels, J. Bilmes, "DBN-based multi-stream models for audio-visual speech recognition," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 993-996, 2004.
12 Pedro J. Moreno, "Speech Recognition in Noisy Environment," Ph.D. Thesis, ECE Department, CMU, May 1996.
13 McGurk, Harry and MacDonald, John, "Hearing lips and seeing voices," Nature, Vol. 264(5588), pp. 746–748, 1976.
14 S. Dupont and J. Luettin, "Audio-Visual Speech Modelling for Continuous Speech Recognition," IEEE Transactions on Multimedia, pp. 141-151, 2000.
15 Jeff A. Bilmes and Chris Bartels, "Graphical Model Architectures for Speech Recognition," IEEE Signal Processing Magazine, vol.22, pp.89-100, 2005.
16 Jean-Luc Schwartz, Frederic Berthommier, and Christophe Savariaux, "Seeing to Hear Better: Evidence for Early Audio-Visual Interactions in Speech Identification," Cognition, vol. 93, no. 2, pp. 69-78, Sep. 2004.