Comparison of Male/Female Speech Features and Improvement of Recognition Performance by Gender-Specific Speech Recognition

남성과 여성의 음성 특징 비교 및 성별 음성인식에 의한 인식 성능의 향상

  • 이창영 (Division of Information Systems Engineering, Dongseo University)
  • Received : 2010.10.01
  • Accepted : 2010.12.10
  • Published : 2010.12.31

Abstract

To improve speech recognition rates, we compared the performance of speaker-independent and gender-specific speech recognition. For this purpose, 20 male and 20 female speakers each pronounced 300 isolated Korean words, and the speech data were divided into four groups: female, male, and two mixed-gender groups. To examine the validity of gender-specific speech recognition, we examined Fourier spectra and MFCC feature vectors averaged separately over the male and female speakers. The results showed a clear distinction between the two genders, which supports the motivation for gender-specific speech recognition. In the recognition experiments, the error rate for the gender-specific case was less than 50% of that for the speaker-independent case. These results suggest that a hierarchical approach, in which gender is recognized first and gender-specific speech recognition follows, might yield better performance than conventional speaker-independent recognition.

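As part of an effort to improve the recognition rate in speech recognition, this paper compares the performance of conventional speaker-independent speech recognition, which does not distinguish gender, with that of gender-specific speech recognition. For the experiments, 20 male and 20 female speakers each uttered 300 words, and the speech data were divided into four groups: female, male, mixed A, and mixed B. First, to assess the validity of the rationale for gender-specific speech recognition, we investigated gender differences in the frequency analysis of the speech signals and in the MFCC feature vectors. The results confirmed gender differences pronounced enough to support the motivation for gender-specific speech recognition. In the recognition experiments, the error rate for gender-specific speech recognition fell to less than half of that for conventional speaker-independent recognition. From this, we conclude that performing gender recognition and gender-specific speech recognition hierarchically could raise the speaker-independent recognition rate.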
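
The hierarchical idea suggested in the abstract (recognize the gender first, then hand the utterance to a gender-specific recognizer) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the nearest-mean gender classifier, the three-dimensional toy vectors standing in for MFCC features, and the names `mean_vector`, `HierarchicalRecognizer`, and the placeholder recognizers are all hypothetical.

```python
from statistics import fmean


def mean_vector(vectors):
    """Average equal-length feature vectors component-wise."""
    return [fmean(component) for component in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class HierarchicalRecognizer:
    """Two-stage sketch: pick a gender, then call that gender's recognizer."""

    def __init__(self, centroids, recognizers):
        self.centroids = centroids      # gender label -> mean feature vector
        self.recognizers = recognizers  # gender label -> callable(features)

    def classify_gender(self, features):
        # Assumption: nearest averaged feature vector decides the gender.
        return min(
            self.centroids,
            key=lambda g: squared_distance(features, self.centroids[g]),
        )

    def recognize(self, features):
        gender = self.classify_gender(features)
        return gender, self.recognizers[gender](features)


# Toy stand-ins for per-speaker averaged feature vectors (hypothetical data).
female_train = [[1.0, 2.0, 3.0], [1.2, 2.2, 2.8]]
male_train = [[-1.0, 0.0, 1.0], [-0.8, 0.2, 1.2]]

pipeline = HierarchicalRecognizer(
    centroids={
        "female": mean_vector(female_train),
        "male": mean_vector(male_train),
    },
    # Placeholders for real gender-specific word recognizers.
    recognizers={
        "female": lambda features: "female-model-output",
        "male": lambda features: "male-model-output",
    },
)

print(pipeline.recognize([-1.0, 0.1, 1.0]))
```

In a real system the placeholder callables would be replaced by recognizers trained only on the corresponding gender's data, and any error made in the gender stage would carry over to the word-recognition stage.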
