Lip Detection using Color Distribution and Support Vector Machine for Visual Feature Extraction of Bimodal Speech Recognition System

Lip Position Detection Using Color Analysis and SVM for Visual Feature Extraction in a Bimodal Speech Recognizer

  • 정지년 (Department of Computer Science, KAIST)
  • 양현승 (Department of Computer Science, KAIST)
  • Published : 2004.04.01

Abstract

Bimodal speech recognition systems have been proposed to improve the recognition rate of automatic speech recognition (ASR) in noisy environments. Visual feature extraction is central to these systems, and it requires detecting the exact lip position. This paper proposes a method that detects the lip position using a color similarity model and a support vector machine (SVM). The face and lip color distributions are learned and used to find the initial lip position. The exact lip position is then detected by scanning the neighboring area with the SVM. Experiments show that the method detects the lip position accurately and quickly.

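The bimodal speech recognizer was devised to improve speech recognition performance in noisy environments. In a bimodal speech recognizer, extracting visual features from images plays a very important role, and detecting the lip position is an important prerequisite for that extraction. This paper proposes a lip position detection method for visual feature extraction that uses color distribution and an SVM. The proposed method learns the face and lip color distributions, quickly finds the initial lip position from them, and then uses the SVM to find the exact lip position, so that the lip is located both accurately and quickly. Experiments showed that the method is suitable for use in a bimodal recognizer.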
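The two-stage search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the color space, the form of the color model, the window size, or the SVM features, so everything below is an assumption. The sketch assumes a per-channel Gaussian similarity score on two chromaticity values, takes the centroid of lip-colored pixels as the initial position, and represents the SVM by an arbitrary scoring callable (`toy_score`, a hypothetical placeholder for a trained SVM decision function evaluated on the window at each position).

```python
import math

def lip_likelihood(pixel, mean, var):
    """Color-similarity score in (0, 1]: an unnormalized Gaussian per channel."""
    return math.exp(-sum((p - m) ** 2 / (2.0 * v)
                         for p, m, v in zip(pixel, mean, var)))

def initial_position(image, mean, var, threshold=0.5):
    """Stage 1: centroid (row, col) of pixels similar to the lip color model."""
    row_sum = col_sum = count = 0
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if lip_likelihood(pixel, mean, var) >= threshold:
                row_sum += y
                col_sum += x
                count += 1
    if count == 0:
        return None  # no lip-colored pixel found
    return (round(row_sum / count), round(col_sum / count))

def refine_position(center, score_fn, radius, bounds):
    """Stage 2: scan positions around `center`, keep the best classifier score."""
    height, width = bounds
    cy, cx = center
    best, best_score = center, score_fn(cy, cx)
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            score = score_fn(y, x)
            if score > best_score:
                best, best_score = (y, x), score
    return best

# Toy 8x8 "image" of assumed (r, g) chromaticity values: face color
# everywhere, with a lip-colored patch on rows 4-6, columns 2-6.
FACE, LIP = (0.40, 0.32), (0.50, 0.28)
image = [[FACE] * 8 for _ in range(8)]
for y in range(4, 7):
    for x in range(2, 7):
        image[y][x] = LIP

start = initial_position(image, mean=LIP, var=(0.001, 0.001))

def toy_score(y, x):
    # Hypothetical stand-in for an SVM decision value on the window at (y, x);
    # it simply peaks at the centre of the toy lip patch.
    return -((y - 5) ** 2 + (x - 4) ** 2)

lip = refine_position(start, toy_score, radius=2, bounds=(8, 8))
```

Restricting the classifier scan to a small neighborhood of the color-based estimate is what keeps the second stage cheap: the scorer is evaluated on at most (2 * radius + 1)^2 positions instead of on every window in the frame.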
