A Novel Integration Scheme for Audio Visual Speech Recognition

Pham, Than Trung;Kim, Jin-Young;Na, Seung-You;

doi:10.7776/ASK.2009.28.8.832

The Journal of the Acoustical Society of Korea (한국음향학회지), Vol. 28, No. 8, pp. 832-842, 2009. pISSN 1225-4428, eISSN 2287-3775.

Publisher: The Acoustical Society of Korea (한국음향학회)

Pham, Than Trung (School of Electronics & Computer Engineering, Chonnam National University);
Kim, Jin-Young (School of Electronics & Computer Engineering, Chonnam National University);
Na, Seung-You (School of Electronics & Computer Engineering, Chonnam National University)

Published: 2009.11.30

https://doi.org/10.7776/ASK.2009.28.8.832

Abstract

Automatic speech recognition (ASR) has been applied successfully to many real human-computer interaction (HCI) applications; however, its performance tends to degrade significantly in noisy environments. Audio-visual speech recognition (AVSR), which uses both the acoustic signal and lip motion, has recently attracted more attention because of its robustness to noise. In this paper, we describe a novel integration scheme for AVSR based on a late integration approach. First, we introduce a robust reliability measurement for the audio and visual modalities that uses both model-based and signal-based information. The model-based sources measure the confusability of the vocabulary, while the signal-based source is used to estimate the noise level. Second, the output probabilities of the audio and visual speech recognizers are each normalized before the final integration step, which uses the normalized output space and the estimated weights. We evaluate the proposed method on a Korean isolated word recognition system. The experimental results demonstrate the effectiveness and feasibility of the proposed system compared with conventional systems.
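To make the idea of late integration concrete, the following is a minimal sketch of one common way to combine two recognizers at the decision level. It is not the authors' scheme: the abstract does not give the normalization or the weight estimation formulas, so the log-softmax normalization, the single audio weight in [0, 1], the function names, and the toy scores below are all assumptions chosen for illustration. In the paper the weights are estimated from reliability measures; here the weight is simply passed in.

```python
import math


def normalize_log_scores(log_likelihoods):
    """Map per-word log-likelihoods to log-posteriors (log-softmax).

    Assumed normalization for illustration; the paper's own
    normalization is not specified in the abstract.
    """
    m = max(log_likelihoods.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in log_likelihoods.values()))
    return {word: v - log_z for word, v in log_likelihoods.items()}


def late_fusion(audio_ll, visual_ll, audio_weight):
    """Weighted log-linear combination of two recognizers' outputs.

    audio_ll, visual_ll: dicts mapping each vocabulary word to a
    log-likelihood from the audio and visual recognizer.
    audio_weight: hypothetical reliability weight in [0, 1]; the
    visual stream receives 1 - audio_weight.
    Returns the best word and the fused score of every word.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_ll.keys() != visual_ll.keys():
        raise ValueError("both recognizers must score the same vocabulary")
    a = normalize_log_scores(audio_ll)
    v = normalize_log_scores(visual_ll)
    fused = {w: audio_weight * a[w] + (1.0 - audio_weight) * v[w] for w in a}
    best = max(fused, key=fused.get)
    return best, fused


# Toy scores (made up): the audio stream prefers "two", the visual stream "one".
audio = {"one": -120.0, "two": -118.0, "three": -125.0}
visual = {"one": -40.0, "two": -43.0, "three": -44.0}

print(late_fusion(audio, visual, audio_weight=0.9)[0])  # trusts audio: "two"
print(late_fusion(audio, visual, audio_weight=0.1)[0])  # trusts video: "one"
```

Normalizing each stream first keeps the two recognizers on a comparable scale, so the weight controls how much each modality is trusted and is not swamped by differences in raw likelihood magnitude.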
