Quantitative Measure of Speaker Specific Information in Human Voice: From the Perspective of Information Theoretic Approach

A Study on a Method for Quantitatively Measuring Speaker-Specific Information in Speech Signals from an Information-Theoretic Perspective

  • Kim Samuel (Department of Electrical and Electronic Eng., Yonsei University) ;
  • Seo Jung Tae (Information Control Engineering, Chungju University) ;
  • Kang Hong Goo (Department of Electrical and Electronic Eng., Yonsei University)
  • Published : 2005.03.01

Abstract

A novel scheme for measuring the speaker information in a speech signal is proposed. We develop a theory for quantitatively measuring speaker characteristics from an information-theoretic point of view and connect it to the classification error rate. Features based on homomorphic analysis, such as mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), and linear-frequency cepstral coefficients (LFCC), are studied, and the speaker-specific information contained in each feature set is measured by computing mutual information. The theory and the experimental results together provide a quantitative measure of the speaker information in a speech signal.
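
As a rough illustration of the idea, the sketch below computes the mutual information I(S;X) between a speaker label S and a quantized feature symbol X from a joint probability table, then turns it into a lower bound on the classification error rate. The link to the error rate used here is the weak form of Fano's inequality with a uniform speaker prior, which is an assumption of this sketch; the paper's own derivation may differ. The function names and the toy probability tables are hypothetical and are not taken from the paper.

```python
import math


def mutual_information_bits(joint):
    """Return I(S;X) in bits for a joint probability table joint[s][x].

    Rows index speakers S, columns index discrete feature symbols X.
    The table is assumed to sum to 1 (illustrative assumption).
    """
    p_s = [sum(row) for row in joint]          # marginal over speakers
    p_x = [sum(col) for col in zip(*joint)]    # marginal over feature symbols
    mi = 0.0
    for s, row in enumerate(joint):
        for x, p in enumerate(row):
            if p > 0.0:                        # 0 * log(0) is taken as 0
                mi += p * math.log2(p / (p_s[s] * p_x[x]))
    return mi


def fano_error_lower_bound(mi_bits, num_speakers):
    """Weak Fano bound: Pe >= (H(S) - I(S;X) - 1) / log2(N).

    Assumes N >= 2 equally likely speakers, so H(S) = log2(N).
    The result is clipped at 0 because the bound can be vacuous.
    """
    h_s = math.log2(num_speakers)
    return max(0.0, (h_s - mi_bits - 1.0) / h_s)


# Toy tables for 4 speakers and 4 feature symbols (made-up numbers).
uninformative = [[1 / 16] * 4 for _ in range(4)]   # X independent of S
perfect = [[0.25 if s == x else 0.0 for x in range(4)] for s in range(4)]

mi_uninformative = mutual_information_bits(uninformative)
mi_perfect = mutual_information_bits(perfect)
bound_perfect = fano_error_lower_bound(mi_perfect, 4)
bound_no_info_16 = fano_error_lower_bound(0.0, 16)
```

In this toy setting a feature that is independent of the speaker carries no speaker information, while a feature that identifies the speaker exactly carries the full log2(N) bits. Real cepstral features are continuous, so in practice the mutual information would have to be estimated from data rather than read off a known table.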

References

  1. D. A. Reynolds, 'Speaker identification and verification using Gaussian mixture models,' Speech Communication, 17, 91-108, 1995 https://doi.org/10.1016/0167-6393(95)00009-D
  2. S. Furui, 'Cepstral analysis technique for automatic speaker verification,' IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-29 (2), 254-272, Apr. 1981
  3. R. Battiti, 'Using Mutual Information for Selecting Features in Supervised Neural Net Learning,' IEEE Transactions on Neural Networks, 5 (4), 537-550, 1994 https://doi.org/10.1109/72.298224
  4. N. Kwak and C.-H. Choi, 'Input Feature Selection by Mutual Information Based on Parzen Windows,' IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (12), 1667-1671, 2002 https://doi.org/10.1109/TPAMI.2002.1114861
  5. B. J. Lee, S. Kim, and H. G. Kang, 'Speaker recognition based on transformed line spectral frequencies,' submitted to International Symposium on Intelligent Signal Processing and Communication System, 2004
  6. T. F. Quatieri, Discrete time speech signal processing, Prentice Hall, 2002
  7. D. A. Reynolds and R. C. Rose, 'Robust text independent speaker identification using Gaussian mixture models,' IEEE Trans. Speech and Audio Processing, 3, 72-83, 1995 https://doi.org/10.1109/89.365379
  8. B.-H. Juang, L. R. Rabiner, and J. G. Wilpon, 'On the use of bandpass liftering in speech recognition,' IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-35 (7), 947-954, July 1987
  9. W. Verhelst and O. Steenhaut, 'A new model for the short-time complex cepstrum of voiced speech,' IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-34, 43-51, Feb. 1986
  10. D. A. Reynolds et al., 'The SuperSID project: exploiting high-level information for high-accuracy speaker recognition,' Proc. Internat. Conf. Acoust. Speech Signal Process., 784-787, 2003
  11. T. M. Cover and J. A. Thomas, Elements of information theory, Wiley, 1991
  12. T. Eriksson, S. Kim, and H.-G. Kang, 'Theory for speaker recognition over IP,' submitted to Interspeech 2004 - ICSLP, Apr. 2004
  13. J. P. Campbell, 'Testing with the YOHO CD-ROM voice verification,' Proc. Internat. Conf. Acoust. Speech Signal Process., 341-344, May 1995