
Noise-Robust Audio-Visual Speech Recognition

Lee, Jong-Seok (Department of Electrical Engineering and Computer Science, KAIST)
Park, Cheol-Hun (Department of Electrical Engineering and Computer Science, KAIST)
Publication Information
ICROS, vol. 13, no. 3, pp. 28-34, 2007
  • Reference
1 L. R. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, New Jersey, 1993
2 S. Dupont and J. Luettin, 'Audio-visual speech modeling for continuous speech recognition,' IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, Sept. 2000
3 J.-S. Lee, S.-H. Shim, S.-Y. Kim, and C.-H. Park, 'Bimodal speech recognition using robust feature extraction of lip movement under uncontrolled illumination conditions,' Telecommunications Review, vol. 14, no. 1, pp. 123-134, Feb. 2004
4 B. Conrey and D. B. Pisoni, 'Auditory-visual speech perception and synchrony detection for speech and nonspeech signals,' Journal of the Acoustical Society of America, vol. 119, no. 6, pp. 4065-4073, June 2006
5 J.-S. Lee and C.-H. Park, 'Information fusion for audio-visual speech recognition: comparison of reliability measures and a neural network-based fusion method,' Telecommunications Review, vol. 17, no. 3, pp. 538-550, June 2007
6 S. Nakamura, 'Statistical multimodal integration for audio-visual speech processing,' IEEE Trans. Neural Networks, vol. 13, no. 4, pp. 854-866, Jul. 2002
7 S. M. Chu and T. S. Huang, 'Audio-visual speech modeling using coupled hidden Markov models,' in Proc. Int. Conf. Acoustics, Speech and Signal Processing, vol. 2, Orlando, FL, pp. 2009-2012, May 2002
8 http://voice.etri.re.kr/DBSearch/Voice.asp
9 L. A. Ross, D. Saint-Amour, V. M. Leavitt, D. C. Javitt, and J. J. Foxe, 'Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments,' Cerebral Cortex, vol. 17, no. 5, pp. 1147-1153, 2007
10 S. Pigeon and L. Vandendorpe, 'The M2VTS multimodal face database,' in Proc. Int. Conf. Audio- and Video-based Biometric Person Authentication, pp. 403-409, 1997
11 A. V. Nefian, L. Liang, X. Pi, X. Liu, and K. Murphy, 'Dynamic Bayesian networks for audio-visual speech recognition,' EURASIP J. Applied Signal Processing, vol. 11, pp. 1-15, 2002
12 R. M. Stern, B. Raj, and P. J. Moreno, 'Compensation for environmental degradation in automatic speech recognition,' in Proc. ESCA-NATO Tutorial and Research Workshop on Robust Speech Recognition using Unknown Communication Channels, Pont-à-Mousson, France, pp. 33-42, Apr. 1997
13 T. Coianiz, L. Torresani, and B. Caprile, '2D deformable models for visual speech analysis,' in D. G. Stork and M. E. Hennecke, eds., Speechreading by Humans and Machines: Models, Systems and Applications, pp. 391-398, Springer-Verlag, Berlin, Germany, 1996
14 H. McGurk and J. MacDonald, 'Hearing lips and seeing voices,' Nature, vol. 264, pp. 746-748, Dec. 1976
15 H. P. Graf, E. Cosatto, and G. Potamianos, 'Robust recognition of faces and facial features with a multi-modal system,' in Proc. Int. Conf. Systems, Man and Cybernetics, pp. 2034-2039, 1997
16 M. N. Kaynak, Q. Zhi, A. D. Cheok, K. Sengupta, Z. Jian, and K. C. Chung, 'Lip geometric features for human-computer interaction using bimodal speech recognition: comparison and analysis,' Speech Communication, vol. 43, no. 1-2, pp. 1-16, Jan. 2004
17 M. S. Gray, J. R. Movellan, and T. J. Sejnowski, 'Dynamic features for visual speechreading: a systematic comparison,' Advances in Neural Information Processing Systems, vol. 9, pp. 751-757, 1997
18 T. J. Hazen, 'Visual model structures and synchrony constraints for audio-visual speech recognition,' IEEE Trans. Audio, Speech, Language Processing, vol. 14, no. 3, pp. 1082-1089, May 2006
19 K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, 'XM2VTS: the extended M2VTS database,' in Proc. Int. Conf. Audio- and Video-based Biometric Person Authentication, pp. 72-76, 1999
20 G. Potamianos, H. P. Graf, and E. Cosatto, 'An image transform approach for HMM based automatic lipreading,' in Proc. Int. Conf. Image Processing, vol. 3, Chicago, pp. 173-177, 1998
21 C. Benoit, 'The intrinsic bimodality of speech communication and the synthesis of talking faces,' in M. M. Taylor, F. Nel, and D. Bouwhuis, eds., The Structure of Multimodal Dialogue II, John Benjamins, Amsterdam, The Netherlands, pp. 485-502, 2000
22 G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, 'Recent advances in the automatic recognition of audiovisual speech,' Proc. IEEE, vol. 91, no. 9, pp. 1306-1326, Sept. 2003
23 S. Tamura, K. Iwano, and S. Furui, 'A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization,' in Proc. Int. Conf. Acoustics, Speech and Signal Processing, vol. 1, pp. 469-472, 2005
24 A. Q. Summerfield, 'Some preliminaries to a comprehensive account of audio-visual speech perception,' in B. Dodd and R. Campbell, eds., Hearing by Eye: The Psychology of Lip-reading, pp. 3-51, Lawrence Erlbaum, London, 1987
25 H. Hermansky and N. Morgan, 'RASTA processing of speech,' IEEE Trans. Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, 1994
26 S. Bengio, 'Multimodal speech processing using asynchronous hidden Markov models,' Information Fusion, vol. 5, pp. 81-89, 2004