Speech Activity Detection using Lip Movement Image Signals  

Kim, Eung-Kyeu (Hanbat National University)
Publication Information
Journal of the Institute of Convergence Signal Processing, v.11, no.4, 2010, pp. 289-297
Abstract
This paper presents a method that prevents external acoustic noise from being misrecognized as speech during the speech activity detection stage of speech recognition. In addition to acoustic energy, the method checks lip movement image signals. First, successive images are captured with a PC camera, and the presence or absence of lip movement is determined. Next, the lip movement data are stored in shared memory and shared with the speech recognition process. Meanwhile, the speech activity detection process, which is the preprocessing phase of speech recognition, checks the data stored in shared memory to verify whether the acoustic energy was produced by the speaker's utterance. Finally, in an experiment linking the speech recognition processor and the image processor, the system proceeded normally to output a recognition result when the speaker faced the camera while speaking, and it output no recognition result when the speaker spoke without facing the camera. In addition, the initial feature values set off-line are replaced by values obtained on-line. Similarly, the initial template image captured off-line is replaced with a template image captured on-line, which improves discrimination in lip movement tracking. An image processing test bed was implemented to confirm the lip movement tracking process visually and to analyze the related parameters in real time. With the speech and image processing systems linked, the interworking rate was 99.3% under various illumination conditions.
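The gating idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: lip movement is approximated by simple frame differencing over a lip region (the abstract mentions template-based tracking, whose details are not given here), a one-byte `bytearray` stands in for the shared memory, and the thresholds and all function names are hypothetical.

```python
# Hypothetical thresholds; real values would be tuned per camera and microphone.
ENERGY_THRESHOLD = 0.01   # mean squared amplitude per audio frame
MOTION_THRESHOLD = 8.0    # mean absolute gray-level change per pixel


def frame_energy(samples):
    """Mean squared amplitude of one audio frame (list of floats)."""
    return sum(s * s for s in samples) / len(samples)


def lip_moving(prev_roi, curr_roi, threshold=MOTION_THRESHOLD):
    """Frame differencing over two lip-region images given as flat lists
    of gray levels. This stands in for the paper's lip tracking step."""
    diff = sum(abs(a - b) for a, b in zip(prev_roi, curr_roi)) / len(curr_roi)
    return diff > threshold


def is_speech(samples, lip_flag, threshold=ENERGY_THRESHOLD):
    """Accept a frame as speech only if acoustic energy is present
    AND the image side has reported lip movement."""
    return bool(lip_flag) and frame_energy(samples) > threshold


def demo():
    # One byte standing in for the shared memory that links the image
    # process and the speech process (assumption: a single flag suffices).
    shared = bytearray(1)
    loud_frame = [0.5, -0.5, 0.4, -0.4]

    # Speaker faces the camera: the lip region changes a lot between frames.
    shared[0] = 1 if lip_moving([10] * 16, [40] * 16) else 0
    facing = is_speech(loud_frame, shared[0])

    # Speaker is not facing the camera: almost no change in the lip region,
    # so the same loud audio is treated as external noise.
    shared[0] = 1 if lip_moving([10] * 16, [11] * 16) else 0
    away = is_speech(loud_frame, shared[0])
    return facing, away
```

Calling `demo()` evaluates the same loud audio frame twice, once with the lip-movement flag set and once with it cleared, which mirrors the two experimental conditions in the abstract. In a real two-process setup the flag would live in memory visible to both processes, for example via Python's `multiprocessing.shared_memory`, rather than in a local `bytearray`.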
Keywords
Lip movement image signals; Acoustic noise; Speech and image processing system; Speech recognition