Voice-Based Gender Identification Employing Support Vector Machines

Application of Support Vector Machines to Voice Signal-Based Gender Identification

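  • 이계환 (School of Electronic and Electrical Engineering, Inha University);
  • 강상익 (School of Electronic and Electrical Engineering, Inha University);
  • 김덕환 (School of Electronic and Electrical Engineering, Inha University);
  • 장준혁 (School of Electronic and Electrical Engineering, Inha University)
  • Published: 2007.02.28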

Abstract

We propose an effective voice-based gender identification method using a support vector machine (SVM). The SVM is a binary classifier that separates two classes by finding an arbitrary nonlinear boundary in a feature space, and it is known to yield high classification performance. In the present work, we compare the identification performance of the SVM with that of a Gaussian mixture model (GMM) using mel-frequency cepstral coefficients (MFCCs). To further improve SVM-based gender identification, we propose a feature fusion scheme that combines the MFCCs with pitch. Experimental results indicate that the SVM identifies gender significantly better than the GMM. Moreover, performance improves substantially when the proposed feature fusion technique is applied.

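This paper proposes an effective voice-signal-based gender identification system using support vector machines (SVMs). The SVM, a discriminative binary pattern classifier, classifies by finding a nonlinear boundary in the feature space and is known to perform well. Using mel-frequency cepstral coefficients (MFCCs), which are widely used in existing gender identification work, we compare the gender identification performance of the SVM with that of the conventional Gaussian mixture model (GMM) algorithm. In particular, to further improve SVM-based gender identification, we apply a combined feature vector built from the MFCCs and pitch. In the experiments, the proposed SVM outperformed the GMM when MFCC parameters were used, and it also performed well with the proposed combined feature vector.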
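
As a rough illustration of the combined MFCC and pitch feature vector, the sketch below appends a per-frame pitch value to each MFCC vector and then standardizes every dimension. The abstract does not say how the two features are combined, so frame-wise concatenation, the standardization step, the function names `fuse_features` and `standardize`, and all numeric values are assumptions made for illustration only.

```python
import math


def fuse_features(mfcc_frames, pitch_values):
    """Append each frame's pitch value (Hz) to its MFCC vector."""
    if len(mfcc_frames) != len(pitch_values):
        raise ValueError("need exactly one pitch value per MFCC frame")
    return [list(mfcc) + [pitch] for mfcc, pitch in zip(mfcc_frames, pitch_values)]


def standardize(frames):
    """Scale every dimension to zero mean and unit variance.

    Assumption: without scaling, pitch in Hz would dominate the much
    smaller MFCC values in any distance-based classifier such as an SVM.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        stds.append(math.sqrt(var) or 1.0)  # constant dimension: avoid dividing by zero
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in frames]


# Made-up numbers: three frames of 3 MFCCs each, plus one pitch estimate per frame.
mfcc_frames = [[1.0, -0.5, 0.2], [0.8, -0.4, 0.1], [1.2, -0.6, 0.3]]
pitch_values = [120.0, 125.0, 210.0]

fused = standardize(fuse_features(mfcc_frames, pitch_values))
print(len(fused), len(fused[0]))  # prints "3 4": 3 frames, 4 dimensions each
```

The fused vectors would then be passed to the classifier in place of the MFCC-only vectors. A real system would compute the scaling statistics on the training set only and reuse them at test time.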
