
Noise-Robust Audio-Visual Speech Recognition

Lee, Jong-Seok (Department of Electrical Engineering and Computer Science, KAIST)
Park, Cheol-Hun (Department of Electrical Engineering and Computer Science, KAIST)
Publication Information
ICROS, vol. 13, no. 3, pp. 28-34, 2007
  • Reference
1 L. R. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, New Jersey, 1993
2 S. Dupont and J. Luettin, 'Audio-visual speech modeling for continuous speech recognition,' IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, Sept. 2000
3 J.-S. Lee, S.-H. Shim, S.-Y. Kim, and C.-H. Park, 'Bimodal speech recognition using robust feature extraction of lip movement under uncontrolled illumination conditions,' Telecommunications Review, vol. 14, no. 1, pp. 123-134, Feb. 2004
4 B. Conrey and D. B. Pisoni, 'Auditory-visual speech perception and synchrony detection for speech and nonspeech signals,' Journal of the Acoustical Society of America, vol. 119, no. 6, pp. 4065-4073, June 2006
5 J.-S. Lee and C.-H. Park, 'Information fusion for audio-visual speech recognition: comparison of reliability measures and a neural network-based fusion method,' Telecommunications Review, vol. 17, no. 3, pp. 538-550, June 2007
6 S. Nakamura, 'Statistical multimodal integration for audio-visual speech processing,' IEEE Trans. Neural Networks, vol. 13, no. 4, pp. 854-866, Jul. 2002
7 S. M. Chu and T. S. Huang, 'Audio-visual speech modeling using coupled hidden Markov models,' in Proc. Int. Conf. Acoustics, Speech and Signal Processing, vol. 2, Orlando, FL, pp. 2009-2012, May 2002
8 http://voice.etri.re.kr/DBSearch/Voice.asp
9 L. A. Ross, D. Saint-Amour, V. M. Leavitt, D. C. Javitt, and J. J. Foxe, 'Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments,' Cerebral Cortex, vol. 17, no. 5, pp. 1147-1153, 2007
10 S. Pigeon and L. Vandendorpe, 'The M2VTS multimodal face database,' in Proc. Int. Conf. Audio- and Video-based Biometric Person Authentication, pp. 403-409, 1997
11 A. V. Nefian, L. Liang, X. Pi, X. Liu, and K. Murphy, 'Dynamic Bayesian networks for audio-visual speech recognition,' EURASIP J. Applied Signal Processing, vol. 11, pp. 1-15, 2002
12 R. M. Stern, B. Raj, and P. J. Moreno, 'Compensation for environmental degradation in automatic speech recognition,' in Proc. ESCA-NATO Tutorial and Research Workshop on Robust Speech Recognition using Unknown Communication Channels, Pont-à-Mousson, France, pp. 33-42, Apr. 1997
13 T. Coianiz, L. Torresani, and B. Caprile, '2D deformable models for visual speech analysis,' in D. G. Stork and M. E. Hennecke, eds., Speechreading by Humans and Machines: Models, Systems and Applications, pp. 391-398, Springer-Verlag, Berlin, Germany, 1996
14 H. McGurk and J. MacDonald, 'Hearing lips and seeing voices,' Nature, vol. 264, pp. 746-748, Dec. 1976
15 H. P. Graf, E. Cosatto, and G. Potamianos, 'Robust recognition of faces and facial features with a multi-modal system,' in Proc. Int. Conf. Systems, Man and Cybernetics, pp. 2034-2039, 1997
16 M. N. Kaynak, Q. Zhi, A. D. Cheok, K. Sengupta, Z. Jian, and K. C. Chung, 'Lip geometric features for human-computer interaction using bimodal speech recognition: comparison and analysis,' Speech Communication, vol. 43, no. 1-2, pp. 1-16, Jan. 2004
17 M. S. Gray, J. R. Movellan, and T. J. Sejnowski, 'Dynamic features for visual speechreading: a systematic comparison,' Advances in Neural Information Processing Systems, vol. 9, pp. 751-757, 1997
18 T. J. Hazen, 'Visual model structures and synchrony constraints for audio-visual speech recognition,' IEEE Trans. Audio, Speech, Language Processing, vol. 14, no. 3, pp. 1082-1089, May 2006
19 K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, 'XM2VTS: the extended M2VTS database,' in Proc. Int. Conf. Audio- and Video-based Biometric Person Authentication, pp. 72-76, 1999
20 G. Potamianos, H. P. Graf, and E. Cosatto, 'An image transform approach for HMM based automatic lipreading,' in Proc. Int. Conf. Image Processing, vol. 3, Chicago, pp. 173-177, 1998
21 C. Benoit, 'The intrinsic bimodality of speech communication and the synthesis of talking faces,' in M. M. Taylor, F. Nel, and D. Bouwhuis, eds., The Structure of Multimodal Dialogue II, John Benjamins, Amsterdam, The Netherlands, pp. 485-502, 2000
22 G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, 'Recent advances in the automatic recognition of audiovisual speech,' Proc. IEEE, vol. 91, no. 9, pp. 1306-1326, Sept. 2003
23 S. Tamura, K. Iwano, and S. Furui, 'A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization,' in Proc. Int. Conf. Acoustics, Speech and Signal Processing, vol. 1, pp. 469-472, 2005
24 A. Q. Summerfield, 'Some preliminaries to a comprehensive account of audio-visual speech perception,' in B. Dodd and R. Campbell, eds., Hearing by Eye: The Psychology of Lip-reading, pp. 3-51, Lawrence Erlbaum, London, 1987
25 H. Hermansky and N. Morgan, 'RASTA processing of speech,' IEEE Trans. Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, 1994
26 S. Bengio, 'Multimodal speech processing using asynchronous hidden Markov models,' Information Fusion, vol. 5, pp. 81-89, 2004