Training Method and Speaker Verification Measures for Recurrent Neural Network based Speaker Verification System  

Kim, Tae-Hyung (Agency for Defense Development)
Abstract
This paper presents a training method for neural networks and the use of mean square error (MSE) values as the basis for deciding whether to accept a speaker's identity claim in a speaker verification system based on recurrent neural networks. Recurrent neural networks (RNNs) are employed to capture the temporally dynamic characteristics of the speech signal. During supervised learning of the RNNs, target outputs are generated automatically so that they represent the temporal variation of the input speech sounds. To improve discrimination between the true speaker and an impostor, a discriminative training method for RNNs is presented. The paper demonstrates the use and effectiveness of the MSE value, obtained from the Euclidean distance between the target outputs and the network outputs for a speaker's test speech sounds, as the basis for speaker verification. In experiments on a Korean speech database, the proposed system outperforms a conventional hidden Markov model (HMM) based speaker verification system in terms of equal error rate.
Keywords
Speaker verification; RNN; Neural networks learning; Discriminative training; HMM
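
As an illustration of the MSE-based decision and the equal error rate mentioned in the abstract, the following is a minimal sketch rather than the paper's implementation. The abstract does not give the exact normalization of the MSE or the threshold selection procedure, so the per-frame averaging and the threshold sweep below are assumptions, and the names `mse_score`, `accept`, and `equal_error_rate` are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one output vector per speech frame


def mse_score(targets: Sequence[Frame], outputs: Sequence[Frame]) -> float:
    """Average, over frames, of the squared Euclidean distance between
    the target output vector and the network output vector.
    (Assumed normalization: divide by the number of frames.)"""
    if not targets or len(targets) != len(outputs):
        raise ValueError("targets and outputs need the same non-zero length")
    total = 0.0
    for t, o in zip(targets, outputs):
        total += sum((ti - oi) ** 2 for ti, oi in zip(t, o))
    return total / len(targets)


def accept(score: float, threshold: float) -> bool:
    """Accept the identity claim when the MSE is at or below the threshold
    (a small error means the network fits the claimed speaker well)."""
    return score <= threshold


def equal_error_rate(genuine: List[float],
                     impostor: List[float]) -> Tuple[float, float]:
    """Sweep the observed scores as candidate thresholds and return
    (eer, threshold) at the point where the false acceptance rate and
    the false rejection rate are closest."""
    best = None
    for th in sorted(set(genuine) | set(impostor)):
        frr = sum(s > th for s in genuine) / len(genuine)     # true speaker rejected
        far = sum(s <= th for s in impostor) / len(impostor)  # impostor accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, th)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy values, invented purely for illustration.
    targets = [[1.0, 0.0], [0.0, 1.0]]
    outputs = [[0.9, 0.1], [0.2, 0.7]]
    score = mse_score(targets, outputs)
    eer, th = equal_error_rate([0.1, 0.2, 0.3, 0.6], [0.4, 0.5, 0.7, 0.9])
    print(score, eer, th, accept(score, th))
```

In this sketch a lower MSE counts as stronger evidence for the claimed identity, so the threshold acts as an upper bound on the acceptable error.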