
Robust Endpoint Detection for Bimodal System in Noisy Environments  

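H.-H. Oh (오현화) (School of Electronic and Electrical Engineering, Kyungpook National University)
H.-S. Kwon (권홍석) (School of Electronic and Electrical Engineering, Kyungpook National University)
J.-M. Son (손종목) (School of Electronic and Electrical Engineering, Kyungpook National University)
S.-I. Chien (진성일) (School of Electronic and Electrical Engineering, Kyungpook National University)
K.-S. Bae (배건성) (School of Electronic and Electrical Engineering, Kyungpook National University)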
Abstract
The performance of a bimodal system depends on the accuracy of endpoint detection in the input signal as well as on the performance of the speech recognition or lipreading system. In this paper, we propose an endpoint detection method that detects endpoints separately from the audio and video signals and uses the signal-to-noise ratio (SNR) estimated from the input audio signal to select the endpoints that are more reliable under acoustic noise. In other words, the endpoints are taken from the audio signal when the SNR is high and from the video signal when the SNR is low. Experimental results show that the bimodal system using the proposed endpoint detector achieves satisfactory recognition rates, especially when the acoustic environment is quite noisy.
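The selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 10 dB threshold, and the SNR estimator (which assumes the first `noise_len` samples contain noise only) are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import math


def estimate_snr_db(samples, noise_len):
    """Rough SNR estimate in dB.

    Assumption (not from the paper): the first `noise_len` samples
    contain background noise only, and the rest contain speech plus noise.
    """
    if not 0 < noise_len < len(samples):
        raise ValueError("noise_len must be between 1 and len(samples) - 1")
    noise_power = sum(x * x for x in samples[:noise_len]) / noise_len
    rest = samples[noise_len:]
    total_power = sum(x * x for x in rest) / len(rest)
    # The remainder holds signal + noise, so subtract the noise estimate.
    signal_power = max(total_power - noise_power, 1e-12)
    return 10.0 * math.log10(signal_power / max(noise_power, 1e-12))


def select_endpoints(audio_endpoints, video_endpoints, snr_db, threshold_db=10.0):
    """Use the audio endpoints at high SNR and the video endpoints otherwise.

    `threshold_db` is a hypothetical value, not one reported in the paper.
    """
    return audio_endpoints if snr_db >= threshold_db else video_endpoints
```

Here each endpoint pair would be a (start, end) tuple produced by separate audio and video endpoint detectors, which are outside the scope of this sketch.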
Keywords
Endpoint detection; Bimodal system; Lipreading; Audio/video database