Robust Person Identification Using Optimal Reliability in Audio-Visual Information Fusion

  • Tariquzzaman, Md. (School of Electronics & Computer Engineering, Chonnam National University) ;
  • Kim, Jin-Young (School of Electronics & Computer Engineering, Chonnam National University) ;
  • Na, Seung-You (School of Electronics & Computer Engineering, Chonnam National University) ;
  • Choi, Seung-Ho (Dept. of Computer Engineering, Dongshin University)
  • Published: 2009.09.30

Abstract

Reliable identity recognition in real-world environments is a key issue in human-computer interaction (HCI). In this paper, we present a robust person identification system that uses a score-based optimal reliability measure for the audio and visual modalities. We propose an extension of the modified reliability function that introduces optimization parameters for both the audio and visual modalities. To degrade the visual signals, we applied JPEG compression to the test images. In addition, to create a mismatch between the enrollment and test sessions, acoustic babble noise and artificial illumination were added to the test audio and visual signals, respectively. Local PCA was applied to both modalities to reduce the dimension of the feature vectors. We used a swarm intelligence algorithm, particle swarm optimization (PSO), to optimize the parameters of the modified reliability function. All person identification experiments were performed on the VidTIMIT database. Experimental results show that the proposed optimal reliability measures improved identification accuracy by 7.73% and 8.18% under varying illumination directions in the visual signal and babble noise in the audio signal, respectively, compared with the best individual classifier in the fusion system. The proposed measures also preserved the reliability statistics of each modality in line with its performance, which verifies the consistency of the proposed extension.
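The abstract does not specify the SNR levels, the babble-noise source, or how the artificial illumination was generated. The minimal NumPy sketch below works within those gaps and shows two common ways to build mismatched test data. The first scales a noise recording so that it is added at a chosen signal-to-noise ratio. The second multiplies a grey-level image by a linear brightness ramp whose direction stands in for the light angle. The function names, the ramp model, and the `strength` parameter are hypothetical, and the signals are synthetic stand-ins. JPEG compression is left out because it needs an image codec library.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` at the requested SNR in dB.

    The noise is repeated or trimmed to the speech length, then scaled so that
    10 * log10(P_speech / P_noise) equals snr_db. Returns (noisy, scaled_noise).
    """
    noise = np.resize(noise, speech.shape)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = scale * noise
    return speech + scaled, scaled


def directional_illumination(image, angle_deg, strength=0.5):
    """Hypothetical lighting model: multiply a grey-level image by a linear ramp.

    The gain is 1 + strength * u, where u is a normalised coordinate along the
    direction angle_deg (u goes from -1 to 1 for axis-aligned angles; 0 degrees
    brightens the right-hand side). The result is clipped to [0, 255].
    """
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    x_norm = 2.0 * xx / max(w - 1, 1) - 1.0
    y_norm = 2.0 * yy / max(h - 1, 1) - 1.0
    angle = np.radians(angle_deg)
    u = x_norm * np.cos(angle) + y_norm * np.sin(angle)
    return np.clip(image * (1.0 + strength * u), 0.0, 255.0)


rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200 * np.arange(16000) / 16000)  # stand-in utterance
babble = rng.standard_normal(8000)                          # stand-in babble recording
noisy, scaled = add_noise_at_snr(speech, babble, snr_db=5.0)
print(10 * np.log10(np.mean(speech ** 2) / np.mean(scaled ** 2)))  # approximately 5.0

face = np.full((64, 64), 128.0)                              # stand-in face image
lit = directional_illumination(face, angle_deg=0.0, strength=0.5)
```

Only the test-session signals would be degraded this way. The enrollment data stay clean, which produces the enrollment/test mismatch the abstract describes.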
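The abstract names local PCA but does not describe its configuration. One common form is the clustering-based local PCA of Kambhatla and Leen, which partitions the feature space into regions and fits a separate PCA basis in each. The NumPy sketch below assumes that form and uses Euclidean k-means for the partition. The region count, the component count, and the class and function names are illustrative choices, not the paper's settings.

```python
import numpy as np


def nearest(x, centroids):
    """Index of the closest centroid (squared Euclidean distance) for each row."""
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(x, k, n_iters=50, seed=0):
    """Lloyd's k-means with farthest-point initialisation; returns centroids."""
    rng = np.random.default_rng(seed)
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])      # farthest point from current centres
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iters):
        labels = nearest(x, centroids)
        for j in range(k):
            if np.any(labels == j):          # keep the old centre if a cluster empties
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids


class LocalPCA:
    """Local PCA: k-means partition of the feature space, one PCA basis per region."""

    def __init__(self, n_regions, n_components, seed=0):
        self.k, self.q, self.seed = n_regions, n_components, seed

    def fit(self, x):
        self.centroids = kmeans(x, self.k, seed=self.seed)
        labels = nearest(x, self.centroids)
        self.means, self.bases = [], []
        for j in range(self.k):
            xj = x[labels == j]
            if len(xj) <= self.q:
                raise ValueError(f"region {j} has too few samples for {self.q} components")
            mu = xj.mean(axis=0)
            # rows of vt are principal directions, ordered by decreasing variance
            _, _, vt = np.linalg.svd(xj - mu, full_matrices=False)
            self.means.append(mu)
            self.bases.append(vt[: self.q])
        return self

    def transform(self, x):
        """Project each vector onto its own region's basis -> (codes, region labels)."""
        labels = nearest(x, self.centroids)
        z = np.empty((len(x), self.q))
        for j in range(self.k):
            idx = labels == j
            z[idx] = (x[idx] - self.means[j]) @ self.bases[j].T
        return z, labels

    def inverse_transform(self, z, labels):
        """Map q-dimensional codes back to the original feature space."""
        x_hat = np.empty((len(z), self.means[0].size))
        for j in range(self.k):
            idx = labels == j
            x_hat[idx] = self.means[j] + z[idx] @ self.bases[j]
        return x_hat


# Toy data: two clusters, each lying on a different 1-D line in 3-D space.
rng = np.random.default_rng(0)
t = rng.standard_normal((200, 1))
x = np.vstack([
    np.array([10.0, 0.0, 0.0]) + t[:100] * np.array([0.0, 1.0, 0.0]),
    np.array([-10.0, 0.0, 0.0]) + t[100:] * np.array([0.0, 0.0, 1.0]),
])

lpca = LocalPCA(n_regions=2, n_components=1).fit(x)
z, regions = lpca.transform(x)
local_err = np.mean(np.sum((x - lpca.inverse_transform(z, regions)) ** 2, axis=1))

# Global PCA with the same single component, for comparison.
mu = x.mean(axis=0)
_, _, vt = np.linalg.svd(x - mu, full_matrices=False)
global_err = np.mean(np.sum((x - (mu + (x - mu) @ vt[:1].T @ vt[:1])) ** 2, axis=1))
print(f"local PCA error {local_err:.2e}, global PCA error {global_err:.2f}")
```

Each region in this toy data is exactly one-dimensional. A single local component therefore reconstructs the vectors almost perfectly, whereas one global component cannot capture both clusters' directions. In practice, real audio or face-image feature vectors would replace `x`.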
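The abstract gives neither the form of the modified reliability function nor the form of its optimization parameters. The sketch below therefore makes three assumptions. First, it uses a widely used dispersion-style reliability: the mean gap between a modality's top score and its other scores. Second, it maps that reliability through a hypothetical parameterized function g_m(r) = c_m * r**p_m to obtain per-trial fusion weights. Third, it tunes theta = (c_a, p_a, c_v, p_v) with a plain global-best PSO that minimizes identification error. The functional form, parameter bounds, PSO settings, and synthetic scores are all illustrative, not the paper's method.

```python
import numpy as np


def minmax_rows(scores):
    """Rescale each trial's score vector to [0, 1] so the two modalities are comparable."""
    lo = scores.min(axis=1, keepdims=True)
    hi = scores.max(axis=1, keepdims=True)
    return (scores - lo) / np.maximum(hi - lo, 1e-12)


def reliability(scores):
    """Mean gap between the top score and every other score, per trial.

    A peaked score vector (one clear winner) yields a large value; a flat one,
    as produced by a degraded modality, yields a value near 0.
    """
    s = np.sort(scores, axis=1)[:, ::-1]          # descending order per trial
    return (s[:, :1] - s[:, 1:]).mean(axis=1)


def fusion_weights(r_a, r_v, theta):
    """Hypothetical mapping g_m(r) = c_m * r**p_m, normalised so w_a + w_v = 1."""
    c_a, p_a, c_v, p_v = theta
    g_a = c_a * np.power(r_a, p_a)
    g_v = c_v * np.power(r_v, p_v)
    total = g_a + g_v
    # fall back to equal weights when both modalities look completely unreliable
    w_a = np.divide(g_a, total, out=np.full_like(total, 0.5), where=total > 0)
    return w_a, 1.0 - w_a


def identification_error(theta, s_a, s_v, labels):
    """Closed-set identification error of the reliability-weighted sum rule."""
    s_a, s_v = minmax_rows(s_a), minmax_rows(s_v)
    w_a, w_v = fusion_weights(reliability(s_a), reliability(s_v), theta)
    fused = w_a[:, None] * s_a + w_v[:, None] * s_v
    return float(np.mean(fused.argmax(axis=1) != labels))


def pso(objective, lower, upper, n_particles=30, n_iters=100, seed=0,
        inertia=0.7, c1=1.5, c2=1.5):
    """Global-best particle swarm optimisation (minimisation) inside box bounds."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = rng.uniform(lower, upper, size=(n_particles, lower.size))
    v = np.zeros_like(x)
    pbest_x, pbest_f = x.copy(), np.array([objective(p) for p in x])
    g = pbest_f.argmin()
    gbest_x, gbest_f = pbest_x[g].copy(), pbest_f[g]
    for _ in range(n_iters):
        r1, r2 = rng.random((2,) + x.shape)
        # inertia + pull toward each particle's best + pull toward the swarm's best
        v = inertia * v + c1 * r1 * (pbest_x - x) + c2 * r2 * (gbest_x - x)
        x = np.clip(x + v, lower, upper)
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest_x[better], pbest_f[better] = x[better], f[better]
        g = pbest_f.argmin()
        if pbest_f[g] < gbest_f:
            gbest_x, gbest_f = pbest_x[g].copy(), pbest_f[g]
    return gbest_x, gbest_f


# Synthetic scores (not VidTIMIT): 10 identities, 400 trials, and a random
# per-trial noise level per modality standing in for babble noise / lighting.
rng = np.random.default_rng(1)
n_trials, n_ids = 400, 10
labels = rng.integers(0, n_ids, n_trials)
onehot = np.eye(n_ids)[labels]
s_a = onehot + rng.uniform(0.2, 2.0, (n_trials, 1)) * rng.standard_normal((n_trials, n_ids))
s_v = onehot + rng.uniform(0.2, 2.0, (n_trials, 1)) * rng.standard_normal((n_trials, n_ids))

theta, err = pso(lambda t: identification_error(t, s_a, s_v, labels),
                 lower=[0.01, 0.1, 0.01, 0.1], upper=[10.0, 5.0, 10.0, 5.0])
print("theta =", np.round(theta, 2), "fused error =", err)
```

The identification error is piecewise constant in theta, so it has no useful gradient. A derivative-free search such as PSO fits that situation. In a real setup, theta would be tuned on held-out development data and then fixed for the test sessions.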
