
GMM-Based Gender Identification Employing Group Delay  

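Lee, Kye-Hwan (School of Electronic and Electrical Engineering, Inha University)
Lim, Woo-Hyung (School of Electrical and Computer Engineering, Seoul National University)
Kim, Nam-Soo (School of Electrical and Computer Engineering, Seoul National University)
Chang, Joon-Hyuk (School of Electronic and Electrical Engineering, Inha University)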
Abstract
We propose an effective voice-based gender identification method using group delay (GD). Features for speech recognition are generally derived from magnitude information rather than phase information. Our approach exploits the difference between male and female speech in GD, which is the negative derivative of the Fourier transform phase with respect to frequency. We also propose a feature fusion scheme that combines GD with magnitude-based features such as mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC) coefficients, reflection coefficients, and formants. Experimental results indicate that GD is effective in discriminating gender and that performance improves significantly when the proposed feature fusion technique is applied.
Keywords
Gender identification; Group delay; MFCC; GMM;
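
The abstract does not say how GD is computed, so the following is only an illustrative sketch, not the authors' implementation. It assumes the standard identity that avoids phase unwrapping: if X is the DFT of a frame x[n] and Y is the DFT of n·x[n], the group delay at each bin is Re(X·conj(Y)) / |X|². The names `dft`, `group_delay`, and `eps` are hypothetical and chosen for this example; any windowing, smoothing, or cepstral post-processing the paper may use is not shown.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def group_delay(frame, eps=1e-12):
    """Group delay, in samples, of one frame at each DFT bin.

    Assumption: GD is taken as the negative derivative of the Fourier
    phase, evaluated without phase unwrapping as
        tau[k] = Re(X[k] * conj(Y[k])) / |X[k]|^2,
    where X = DFT(x[n]) and Y = DFT(n * x[n]).
    `eps` is a hypothetical guard against division by zero at spectral
    nulls; values at such bins are not meaningful.
    """
    spectrum = dft(frame)
    weighted = dft([n * sample for n, sample in enumerate(frame)])
    return [
        (x_k * y_k.conjugate()).real / (abs(x_k) ** 2 + eps)
        for x_k, y_k in zip(spectrum, weighted)
    ]


if __name__ == "__main__":
    # Sanity check: a unit impulse delayed by 3 samples has a phase that is
    # linear in frequency, so its group delay is constant across all bins.
    frame = [0.0] * 16
    frame[3] = 1.0
    print([round(t, 6) for t in group_delay(frame)])
```

For a unit impulse delayed by 3 samples, the function returns approximately 3 at every bin, which matches the definition of GD as the slope of the phase. In practice an FFT routine would replace the naive `dft`, which is kept here only to make the sketch self-contained.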