Automatic Lipreading Using Color Lip Images and Principal Component Analysis


  • Jong-Seok Lee (School of Electrical Engineering and Computer Science, KAIST) ;
  • Cheol Hoon Park (School of Electrical Engineering and Computer Science, KAIST)
  • Published : 2008.06.30

Abstract

This paper examines the effectiveness of using color images instead of grayscale ones for automatic lipreading. First, we show the effect of color information on human lipreading performance. Then, we compare the performance of automatic lipreading using features obtained by applying principal component analysis to grayscale and color images. Experiments with various color representations show that color information is useful for improving automatic lipreading performance; the best performance is obtained with the RGB color components, for which the average relative error reductions under clean and noisy conditions are 4.7% and 13.0%, respectively.

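The abstract describes extracting visual speech features by applying principal component analysis to flattened lip images, comparing grayscale with color (e.g., RGB) input. A minimal sketch of this kind of pipeline is shown below, using plain NumPy; the function name, component count, and random example data are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def pca_lip_features(frames, n_components=10):
    """Project flattened lip-image frames onto their top principal components.

    frames: array of shape (T, H, W, C) -- T frames of H x W lip images with
            C color channels (C=3 for RGB input, C=1 for grayscale).
    Returns an array of shape (T, n_components) of PCA features; with C=3 the
    feature vectors summarize all three color planes jointly.
    """
    T = frames.shape[0]
    X = frames.reshape(T, -1).astype(np.float64)  # each frame -> one long vector
    X -= X.mean(axis=0)                           # center the data
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T                # project onto top components

# Illustrative usage: 20 frames of 16x16 RGB lip images (random stand-in data)
rng = np.random.default_rng(0)
frames = rng.random((20, 16, 16, 3))
feats = pca_lip_features(frames, n_components=10)
print(feats.shape)  # (20, 10)
```

In the paper's setting, such per-frame feature vectors would then serve as the observation sequence for a recognizer (e.g., hidden Markov models, as in reference 10).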

Keywords

References

  1. J.-S. Lee and C. H. Park, “Noise-Robust Audio-Visual Speech Recognition,” iCROS, Vol.13, No.3, pp.28-34, Sep. 2007
  2. P. Scanlon and R. Reilly, “Feature Analysis for Automatic Speechreading,” Proc. Int. Conf. Multimedia and Expo, Tokyo, Japan, pp.625-630, Apr., 2001
  3. P. Daubias, “Is Color Information Really Useful for Lip-reading? (Or What Is Lost When Color Is Not Used),” Proc. Interspeech, Lisbon, Portugal, pp.1193-1196, 2005
  4. S. L. Wang, W. H. Lau, and S. H. Leung, “Automatic Lip Contour Extraction from Color Images,” Pattern Recognit., Vol.37, No.12, pp.2375-2387, 2004 https://doi.org/10.1016/S0031-3203(04)00196-7
  5. G. I. Chiou and J. N. Hwang, “Lipreading from Color Video,” IEEE Trans. Image Processing, Vol.6, No.8, pp.1192-1195, 1997 https://doi.org/10.1109/83.605417
  6. S. Lucey, “An Evaluation of Visual Speech Features for the Tasks of Speech and Speaker Recognition,” Proc. Int. Conf. Audio-Video-based Biometric Person Authentication, Guildford, UK, pp. 260-267, 2003 https://doi.org/10.1007/3-540-44887-X_31
  7. K. Saenko, T. Darrell, and J. Glass, “Articulatory Features for Robust Visual Speech Recognition,” Proc. Int. Conf. Multimodal Interfaces, State College, PA, USA, pp. 152-158, Oct., 2004
  8. L. A. Ross, D. Saint-Amour, V. M. Leavitt, D. C. Javitt, and J. J. Foxe, “Do You See What I Am Saying? Exploring Visual Enhancement of Speech Comprehension in Noisy Environments,” Cerebral Cortex, Vol.17, No.5, pp.1147-1153, 2007 https://doi.org/10.1093/cercor/bhl024
  9. M. V. McCotter and T. R. Jordan, “The Role of Facial Colour and Luminance in Visual and Audiovisual Speech Perception,” Perception, Vol.32, No.8, pp.921-936, 2003 https://doi.org/10.1068/p3316
  10. J.-S. Lee and C. H. Park, “Training Hidden Markov Models by Hybrid Simulated Annealing for Visual Speech Recognition,” Proc. Int. Conf. Systems, Man, Cybernetics, Taipei, Taiwan, pp.198-202, Oct., 2006
  11. A. R. Weeks Jr., “Fundamentals of Electronic Image Processing,” SPIE/IEEE Press, 1995
  12. H. Park, L. Gopishankar, and Y. Kim, “Adaptive Filtering for Noise Reduction in Hue Saturation Intensity Color Space,” Opt. Eng., Vol.41, No.6, pp.1232-1239, 2002 https://doi.org/10.1117/1.1475996
  13. L. Rabiner and B.-H. Juang, “Fundamentals of Speech Recognition,” Prentice-Hall, 1993
  14. R. C. Gonzalez and R. E. Woods, “Digital Image Processing,” Prentice-Hall, 2002
  15. N. Eveno, A. Caplier, and P.-Y. Coulon, “A New Color Transformation for Lips Segmentation,” Proc. Multimedia Signal Processing, Cannes, France, pp.3-8, 2001
  16. L. Gillick and S. J. Cox, “Some Statistical Issues in the Comparison of Speech Recognition Algorithms,” Proc. Int. Conf. Acoustics, Speech, Signal Processing, Glasgow, UK, pp.532-535, 1989

Cited by

  1. Real Time Lip Reading System Implementation in Embedded Environment vol.17B, pp.3, 2010, https://doi.org/10.3745/KIPSTB.2010.17B.3.227