[KSCI] Korea Science Citation Index Service

A New Power Spectrum Warping Approach to Speaker Warping

유일수 (Sungkyunkwan University)
김동주 (Sungkyunkwan University)
노용완 (Sungkyunkwan University)
홍광석 (Sungkyunkwan University)

Publication Information

Journal of the Institute of Electronics Engineers of Korea SP / v.41, no.4, 2004, pp. 103-111

Abstract

Speaker normalization is known to be an effective way to improve recognition accuracy in speaker-independent speech recognition systems. Frequency warping, based on maximum likelihood, is a widely used approach to speaker normalization. This paper proposes a new power spectrum warping approach that aims to improve on frequency warping. Power spectrum warping operates within Mel-frequency cepstral coefficient (MFCC) analysis and is a simple mechanism that performs speaker normalization by modifying the power spectrum of the Mel filter bank. The paper also proposes a hybrid VTN (vocal tract normalization) that combines power spectrum warping with frequency warping. The experiments compare recognition performance on the SKKU PBW DB when each speaker normalization approach is applied to the baseline system. Relative to the word recognition performance of the baseline system, the word error rate was reduced by 2.06% with frequency warping, 3.06% with power spectrum warping, and 4.07% with the hybrid VTN.
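The abstract gives no formulas, so the following is only a minimal sketch of the frequency-warping baseline it mentions, not of the proposed power spectrum warping, whose details are not stated here. It assumes a piecewise-linear warp of the Mel filter bank centre frequencies and a grid search over the warping factor. The function names, the 0.85 knee position, and the `score` callback (standing in for an acoustic-model log-likelihood) are all hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Common Mel-scale formula (assumed; the abstract does not specify one)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def warp_frequency(f_hz, alpha, f_max):
    """Piecewise-linear warp that maps 0 -> 0 and f_max -> f_max.

    Below the knee the frequency is scaled by alpha; above it a second
    linear segment brings the warped axis back to f_max so the analysis
    bandwidth is unchanged. The 0.85 knee position is an assumption.
    """
    knee = 0.85 * f_max / max(alpha, 1.0)
    if f_hz <= knee:
        return alpha * f_hz
    slope = (f_max - alpha * knee) / (f_max - knee)
    return alpha * knee + slope * (f_hz - knee)


def warped_mel_centers(n_filters, f_max, alpha):
    """Centre frequencies (Hz) of a Mel filter bank after warping."""
    mel_max = hz_to_mel(f_max)
    centers = [mel_to_hz(mel_max * (i + 1) / (n_filters + 1))
               for i in range(n_filters)]
    return [warp_frequency(c, alpha, f_max) for c in centers]


def select_alpha(score, alphas):
    """Grid search: return the warping factor with the highest score.

    In a maximum-likelihood setup, score(alpha) would be the log-likelihood
    of the speaker's warped features under the acoustic model; here it is
    any callable supplied by the caller.
    """
    return max(alphas, key=score)
```

For example, `select_alpha(score, [0.88 + 0.02 * i for i in range(13)])` would pick one factor per speaker from a 0.88 to 1.12 grid; that range is illustrative and is not taken from the paper.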

Keywords

Speaker normalization; Power spectrum warping; Frequency warping; MFCC

Citations & Related Records

References

1	J.S. Youn, K.W. Chung and K.S. Hong, 'A Continuous Digit Speech Recognition Applied Vowel Sequence and VCCV Unit HMM', Proceedings of the Acoustical Society of Korea, Vol. 20, No. 2, 2001
2	T.D. Rossing, P. Wheeler and F.R. Moore, 'The Science of Sound', Addison-Wesley, 2002
3	R. Roth et al., 'Dragon Systems' 1994 Large Vocabulary Continuous Speech Recognizer', in Proc. Spoken Language Systems Technology Workshop, 1995
4	L. Lee and R.C. Rose, 'A Frequency Warping Approach to Speaker Normalization', IEEE Transactions on Speech and Audio Processing, Vol. 6, No. 1, pp. 49-60, January 1998
5	L. Welling, H. Ney and S. Kanthak, 'Speaker Adaptive Modeling by Vocal Tract Normalization', IEEE Transactions on Speech and Audio Processing, Vol. 10, No. 6, September 2002
6	A. Andreou, T. Kamm and J. Cohen, 'Experiments in Vocal Tract Normalization', in Proc. CAIP Workshop: Frontiers in Speech Recognition II, 1994
7	M. Seltzer, 'SPHINX III Signal Processing Front End Specification', CMU Speech Group, August 1999
8	Y. Linde, A. Buzo and R.M. Gray, 'An Algorithm for Vector Quantizer Design', IEEE Transactions on Communications, Vol. 28, January 1980

Korean title: 화자 정규화를 위한 새로운 파워 스펙트럼 Warping 방법 (A New Power Spectrum Warping Method for Speaker Normalization)
