Automatic Lipreading Using Color Lip Images and Principal Component Analysis


  • Jong-Seok Lee (School of Electrical Engineering and Computer Science, KAIST) ;
  • Cheol Hoon Park (School of Electrical Engineering and Computer Science, KAIST)
  • Published : 2008.06.30

Abstract

This paper examines the effectiveness of using color images instead of grayscale ones for automatic lipreading. First, we show the effect of color information on human lipreading performance. Then, we compare the performance of automatic lipreading using features obtained by applying principal component analysis to grayscale and color images. Experiments with various color representations show that color information is useful for improving automatic lipreading performance; the best performance is obtained with the RGB color components, for which the average relative error reductions under clean and noisy conditions are 4.7% and 13.0%, respectively.

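The abstract describes extracting visual speech features by applying principal component analysis to flattened lip images, comparing grayscale with color (e.g., RGB) input. A minimal sketch of this kind of pipeline is shown below, using plain NumPy; the function name, component count, and random example data are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def pca_lip_features(frames, n_components=10):
    """Project flattened lip-image frames onto their top principal components.

    frames: array of shape (T, H, W, C) -- T frames of H x W lip images with
            C color channels (C=3 for RGB input, C=1 for grayscale).
    Returns an array of shape (T, n_components) of PCA features; with C=3 the
    feature vectors summarize all three color planes jointly.
    """
    T = frames.shape[0]
    X = frames.reshape(T, -1).astype(np.float64)  # each frame -> one long vector
    X -= X.mean(axis=0)                           # center the data
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T                # project onto top components

# Illustrative usage: 20 frames of 16x16 RGB lip images (random stand-in data)
rng = np.random.default_rng(0)
frames = rng.random((20, 16, 16, 3))
feats = pca_lip_features(frames, n_components=10)
print(feats.shape)  # (20, 10)
```

In the paper's setting, such per-frame feature vectors would then serve as the observation sequence for a recognizer (e.g., hidden Markov models, as in reference 10).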

Keywords

References

  1. J.-S. Lee and C. H. Park, “Noise-Robust Audio-Visual Speech Recognition,” iCROS, Vol.13, No.3, pp.28-34, Sep. 2007
  2. P. Scanlon and R. Reilly, “Feature Analysis for Automatic Speechreading,” Proc. Int. Conf. Multimedia and Expo, Tokyo, Japan, pp.625-630, Apr., 2001
  3. P. Daubias, “Is Color Information Really Useful for Lip-reading? (Or What Is Lost When Color Is Not Used),” Proc. Interspeech, Lisbon, Portugal, pp.1193-1196, 2005
  4. S. L. Wang, W. H. Lau, and S. H. Leung, “Automatic Lip Contour Extraction from Color Images,” Pattern Recognit., Vol.37, No.12, pp.2375-2387, 2004 https://doi.org/10.1016/S0031-3203(04)00196-7
  5. G. I. Chiou and J. N. Hwang, “Lipreading from Color Video,” IEEE Trans. Image Processing, Vol.6, No.8, pp.1192-1195, 1997 https://doi.org/10.1109/83.605417
  6. S. Lucey, “An Evaluation of Visual Speech Features for the Tasks of Speech and Speaker Recognition,” Proc. Int. Conf. Audio-Video-based Biometric Person Authentication, Guildford, UK, pp. 260-267, 2003 https://doi.org/10.1007/3-540-44887-X_31
  7. K. Saenko, T. Darrell, and J. Glass, “Articulatory Features for Robust Visual Speech Recognition,” Proc. Int. Conf. Multimodal Interfaces, State College, PA, USA, pp. 152-158, Oct., 2004
  8. L. A. Ross, D. Saint-Amour, V. M. Leavitt, D. C. Javitt, and J. J. Foxe, “Do You See What I Am Saying? Exploring Visual Enhancement of Speech Comprehension in Noisy Environments,” Cerebral Cortex, Vol.17, No.5, pp.1147-1153, 2007 https://doi.org/10.1093/cercor/bhl024
  9. M. V. McCotter and T. R. Jordan, “The Role of Facial Colour and Luminance in Visual and Audiovisual Speech Perception,” Perception, Vol.32, No.8, pp.921-936, 2003 https://doi.org/10.1068/p3316
  10. J.-S. Lee and C. H. Park, “Training Hidden Markov Models by Hybrid Simulated Annealing for Visual Speech Recognition,” Proc. Int. Conf. Systems, Man, Cybernetics, Taipei, Taiwan, pp.198-202, Oct., 2006
  11. A. R. Weeks Jr., “Fundamentals of Electronic Image Processing,” SPIE/IEEE Press, 1995
  12. H. Park, L. Gopishankar, and Y. Kim, “Adaptive Filtering for Noise Reduction in Hue Saturation Intensity Color Space,” Opt. Eng., Vol.41, No.6, pp.1232-1239, 2002 https://doi.org/10.1117/1.1475996
  13. L. Rabiner and B.-H. Juang, “Fundamentals of Speech Recognition,” Prentice-Hall, 1993
  14. R. C. Gonzalez and R. E. Woods, “Digital Image Processing,” Prentice-Hall, 2002
  15. N. Eveno, A. Caplier, and P.-Y. Coulon, “A New Color Transformation for Lips Segmentation,” Proc. Multimedia Signal Processing, Cannes, France, pp.3-8, 2001
  16. L. Gillick and S. J. Cox, “Some Statistical Issues in the Comparison of Speech Recognition Algorithms,” Proc. Int. Conf. Acoustics, Speech, Signal Processing, Glasgow, UK, pp.532-535, 1989

Cited by

  1. Real Time Lip Reading System Implementation in Embedded Environment vol.17B, pp.3, 2010, https://doi.org/10.3745/KIPSTB.2010.17B.3.227