Training Method and Speaker Verification Measures for Recurrent Neural Network based Speaker Verification System

Kim, Tae-Hyung

The Journal of Korean Institute of Communications and Information Sciences (한국통신학회논문지)

Volume 34, Issue 3C / Pages 257-267 / 2009 / 1226-4717 (pISSN) / 2287-3880 (eISSN)

The Korean Institute of Communications and Information Sciences (한국통신학회)

Kim, Tae-Hyung (Agency for Defense Development)

Published: 2009.03.31


Abstract

This paper presents a training method for neural networks and the use of mean square error (MSE) values as the basis for deciding on a speaker's identity claim in a recurrent neural network based speaker verification system. Recurrent neural networks (RNNs) are employed to capture the temporally dynamic characteristics of the speech signal. During supervised learning of the RNNs, target outputs are generated automatically so that they represent the temporal variation of the input speech sounds. To improve discrimination between the true speaker and an impostor, a discriminative training method for RNNs is presented. The paper demonstrates the use and effectiveness of the MSE value, obtained from the Euclidean distance between the target outputs and the network outputs for a speaker's test speech sounds, as the basis of speaker verification. Experiments on the Korean speech database show that, in terms of equal error rate, the proposed system outperforms a conventional hidden Markov model based speaker verification system.
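As an illustration of the decision rule the abstract describes, the following minimal Python sketch scores a test utterance by the MSE between target outputs and network outputs, accepts or rejects the claim against a threshold, and estimates an equal error rate from genuine and impostor scores. It is not the paper's implementation: the function names, the per-frame averaging of the squared Euclidean distance, the "accept when the score is at or below the threshold" convention, and the threshold sweep are all assumptions made for this example, since the abstract does not specify them.

```python
from typing import Sequence

Frame = Sequence[float]  # one output vector per speech frame (assumed layout)


def mse_score(targets: Sequence[Frame], outputs: Sequence[Frame]) -> float:
    """Average, over frames, of the squared Euclidean distance between
    the target vector and the network output vector (assumed normalization)."""
    if not targets or len(targets) != len(outputs):
        raise ValueError("targets and outputs must be non-empty and equally long")
    total = 0.0
    for t, o in zip(targets, outputs):
        if len(t) != len(o):
            raise ValueError("frame dimensions differ")
        total += sum((ti - oi) ** 2 for ti, oi in zip(t, o))
    return total / len(targets)


def accept_claim(score: float, threshold: float) -> bool:
    """Accept the identity claim when the error is at or below the threshold
    (assumed convention: a lower MSE means a closer match to the claimed speaker)."""
    return score <= threshold


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Sweep every observed score as a threshold and return the error rate at
    the point where false rejection and false acceptance rates are closest."""
    best_gap, best_rate = float("inf"), 1.0
    for th in sorted(set(genuine) | set(impostor)):
        frr = sum(s > th for s in genuine) / len(genuine)     # true speaker rejected
        far = sum(s <= th for s in impostor) / len(impostor)  # impostor accepted
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, best_rate = gap, (frr + far) / 2
    return best_rate


# Hypothetical toy data, not taken from the paper.
targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[1.0, 0.0], [0.0, 0.0]]
score = mse_score(targets, outputs)
decision = accept_claim(score, threshold=0.5)
eer = equal_error_rate([0.1, 0.2, 0.3, 0.6], [0.4, 0.7, 0.8, 0.9])
```

In this sketch a single global threshold is used; a deployed system would typically choose the threshold on held-out data, and that choice is outside what the abstract states.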


