Training Method and Speaker Verification Measures for Recurrent Neural Network based Speaker Verification System

  • Published : 2009.03.31

Abstract

This paper presents a training method for neural networks and proposes the use of mean square error (MSE) values as the basis for deciding on a speaker's identity claim in a recurrent neural network (RNN) based speaker verification system. RNNs are employed to capture the temporally dynamic characteristics of the speech signal. During supervised learning of the RNNs, target outputs are generated automatically so that they represent the temporal variation of the input speech sounds. To increase the capability of discriminating between the true speaker and an impostor, a discriminative training method for RNNs is presented. The paper demonstrates the use and effectiveness of the MSE value, obtained from the Euclidean distance between the target outputs and the network outputs for a speaker's test speech sounds, as the basis of speaker verification. Experiments on the Korean speech database show that, in terms of equal error rate, the proposed speaker verification system outperforms a conventional hidden Markov model based speaker verification system.
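The MSE-based decision and the equal error rate mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network, the target generation, the exact normalization of the MSE, or how the threshold is chosen. The function names (`mse_score`, `accept_claim`, `equal_error_rate`), the per-frame averaging, and the "accept when the score is at or below the threshold" rule are all assumptions made for illustration.

```python
import numpy as np


def mse_score(targets, outputs):
    """Average, over frames, of the squared Euclidean distance between
    target vectors and network output vectors (assumed normalization)."""
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    if targets.shape != outputs.shape:
        raise ValueError("targets and outputs must have the same shape")
    # Squared Euclidean distance per frame, then the mean over frames.
    return float(np.mean(np.sum((targets - outputs) ** 2, axis=-1)))


def accept_claim(score, threshold):
    """Assumed decision rule: a low MSE means the test speech matches
    the claimed speaker's model, so the claim is accepted."""
    return score <= threshold


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping the threshold over all observed
    scores and taking the point where false acceptance and false
    rejection rates are closest."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    best_gap, eer = np.inf, None
    for t in np.unique(np.concatenate([genuine, impostor])):
        frr = np.mean(genuine > t)    # true speaker rejected
        far = np.mean(impostor <= t)  # impostor accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return float(eer)


if __name__ == "__main__":
    # Toy data: two frames, two output units (hypothetical values).
    targets = [[1.0, 0.0], [0.0, 1.0]]
    outputs = [[1.0, 0.0], [0.0, 0.0]]
    score = mse_score(targets, outputs)
    print(score, accept_claim(score, threshold=0.6))
    print(equal_error_rate([0.1, 0.4], [0.3, 0.9]))
```

In this sketch a single global threshold is swept to estimate the equal error rate; a speaker-dependent threshold would need one sweep per speaker.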
