
Robust Person Identification Using Optimal Reliability in Audio-Visual Information Fusion

Tariquzzaman, Md. (School of Electronics & Computer Engineering, Chonnam National University)
Kim, Jin-Young (School of Electronics & Computer Engineering, Chonnam National University)
Na, Seung-You (School of Electronics & Computer Engineering, Chonnam National University)
Choi, Seung-Ho (Dept. of Computer Eng, Dongshin University)

Publication Information

The Journal of the Acoustical Society of Korea / v.28, no.3E, 2009, pp. 109-117

Abstract

Reliable identity recognition in real environments is a key issue in human-computer interaction (HCI). In this paper, we present a robust person identification system that uses a score-based optimal reliability measure of the audio and visual modalities. We propose an extension of the modified reliability function that introduces optimizing parameters for both the audio and the visual modality. To degrade the visual signals, we applied JPEG compression to the test images. In addition, to create a mismatch between the enrollment and test sessions, acoustic babble noise and artificial illumination were added to the test audio and visual signals, respectively. Local PCA was applied to both modalities to reduce the dimension of the feature vectors. We used a swarm intelligence algorithm, particle swarm optimization, to optimize the parameters of the modified convection function. All person identification experiments were performed on the VidTIMIT database. Experimental results show that the proposed optimal reliability measures improved identification accuracy by 7.73% and 8.18%, under different illumination directions applied to the visual signal and under babble noise applied to the audio signal, respectively, compared with the best classifier in the fusion system. The measures also preserved the modality reliability statistics in terms of performance, which verifies the consistency of the proposed extension.
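
The abstract does not give the reliability function itself, so the following is only a minimal sketch of the general idea: score-level audio-visual fusion whose parameter is tuned by particle swarm optimization. It assumes a plain weighted sum with a single audio weight `w` as a stand-in for the paper's reliability measure. The function names, the PSO constants (inertia 0.7, acceleration 1.5), and the toy scores in `SAMPLES` are all hypothetical and chosen for illustration; they are not taken from the paper or its experiments.

```python
import random

def fuse(audio_scores, visual_scores, w):
    """Weighted-sum fusion of per-identity scores; w is the audio weight in [0, 1]."""
    return [w * a + (1.0 - w) * v for a, v in zip(audio_scores, visual_scores)]

def accuracy(samples, w):
    """Fraction of samples whose highest fused score belongs to the true identity."""
    correct = 0
    for audio, visual, label in samples:
        fused = fuse(audio, visual, w)
        if max(range(len(fused)), key=fused.__getitem__) == label:
            correct += 1
    return correct / len(samples)

def pso_maximize(objective, lo, hi, n_particles=10, n_iters=30, seed=0):
    """One-dimensional particle swarm search for the maximum of objective on [lo, hi]."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                           # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = max(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g], best_val[g]     # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Inertia term plus pulls toward the personal and global bests.
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (best_pos[i] - pos[i])
                      + 1.5 * r2 * (g_pos - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # clamp to the search range
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val > g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val

# Synthetic toy scores for three identities: (audio scores, visual scores, true identity).
SAMPLES = [
    ([0.3, 0.4, 0.3], [0.8, 0.1, 0.1], 0),  # audio misleading, visual reliable
    ([0.1, 0.8, 0.1], [0.4, 0.3, 0.3], 1),  # visual misleading, audio reliable
    ([0.1, 0.2, 0.7], [0.2, 0.1, 0.7], 2),  # both modalities agree
    ([0.6, 0.2, 0.2], [0.7, 0.2, 0.1], 0),  # both modalities agree
]

best_w, best_acc = pso_maximize(lambda w: accuracy(SAMPLES, w), 0.0, 1.0)
print(f"audio weight = {best_w:.3f}, accuracy = {best_acc:.2f}")
```

On this toy data either modality alone misclassifies one of the four samples, while an intermediate weight classifies all of them correctly, which is the kind of gain over the best single classifier that fusion aims for. A system along the lines described in the abstract would optimize the parameters of its reliability function rather than one global weight.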

Keywords

Person Identification; Local PCA; Reliability Measures; Particle Swarm Optimization
