[KSCI] Korea Science Citation Index Service

Training Method and Speaker Verification Measures for Recurrent Neural Network based Speaker Verification System

Kim, Tae-Hyung (Agency for Defense Development)

Publication Information

The Journal of Korean Institute of Communications and Information Sciences / v.34, no.3C, 2009, pp. 257-267

Abstract

This paper presents a training method for neural networks and proposes the mean square error (MSE) as the basis for deciding whether to accept a speaker's identity claim in a recurrent neural network based speaker verification system. Recurrent neural networks (RNNs) are employed to capture the temporal dynamics of the speech signal. During supervised learning of the RNNs, target outputs are generated automatically so that they represent the temporal variation of the input speech sounds. To improve discrimination between the true speaker and an impostor, a discriminative training method for RNNs is presented. The paper demonstrates the use and effectiveness of the MSE value, obtained from the Euclidean distance between the target outputs and the network outputs for a speaker's test speech sounds, as the basis for speaker verification. Experiments on the Korean speech database show that, in terms of equal error rate, the proposed system outperforms a conventional hidden Markov model based speaker verification system.
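The MSE-based decision and the equal error rate mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact normalization, the threshold selection procedure, or the network itself, so the function names (`mse_score`, `accept_claim`, `equal_error_rate`), the per-element averaging, and the threshold sweep below are assumptions made for illustration only, not the paper's implementation.

```python
def mse_score(targets, outputs):
    """Mean of squared differences between target and network outputs.

    Both arguments are sequences of frames; each frame is a sequence of
    floats. Averaging over all elements is an assumed normalization.
    """
    if not targets or len(targets) != len(outputs):
        raise ValueError("sequences must be non-empty and equally long")
    total = 0.0
    count = 0
    for t_frame, o_frame in zip(targets, outputs):
        if len(t_frame) != len(o_frame):
            raise ValueError("frames must have the same dimension")
        for t, o in zip(t_frame, o_frame):
            total += (t - o) ** 2  # squared Euclidean distance, per element
            count += 1
    return total / count


def accept_claim(mse, threshold):
    """Accept the identity claim when the MSE is at or below the threshold."""
    return mse <= threshold


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Scores are MSE values, so a lower score means a better match.
    Returns the mean of the false acceptance and false rejection rates
    at the threshold where the two rates are closest.
    """
    best_gap = None
    eer = None
    for th in sorted(set(genuine_scores) | set(impostor_scores)):
        # False rejection: a true speaker scores above the threshold.
        frr = sum(s > th for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor scores at or below the threshold.
        far = sum(s <= th for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer
```

In this sketch a test utterance is scored with `mse_score` against the target outputs of the claimed speaker's network, and `accept_claim` compares that score with a threshold. `equal_error_rate` summarizes a set of true-speaker and impostor scores as a single figure, which is the kind of comparison the abstract reports against the HMM baseline.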

Keywords

Speaker verification; RNN; Neural network learning; Discriminative training; HMM
