Speaker Recognition Performance Improvement by Voiced/Unvoiced Classification and Heterogeneous Feature Combination

  • Kang, Jihoon (Department of Electronics Engineering, Gyeongsang National University)
  • Jeong, Sangbae (Department of Electronics Engineering, Gyeongsang National University)
  • Received: 2014.04.28
  • Accepted: 2014.06.09
  • Published: 2014.06.30

Abstract

In this paper, separate probability distribution models for voiced and unvoiced speech are estimated and used to improve speaker recognition performance. In addition to conventional mel-frequency cepstral coefficients (MFCCs), skewness, kurtosis, and harmonic-to-noise ratio (HNR) are extracted from voiced speech intervals and used as additional features. The scores obtained from the voiced and unvoiced models are linearly fused using an optimal weight found by exhaustive search. The proposed speaker recognizer is compared with a conventional recognizer that uses MFCCs and a single Gaussian mixture model (GMM) per speaker. Experimental results show that the fewer the Gaussian mixture components, the greater the performance improvement achieved by the proposed method.
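The abstract's two main ingredients, extra voiced-frame features and linear score fusion, can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation, and several details are assumptions: skewness and kurtosis are taken as the third and fourth standardized moments of the samples in a voiced frame; HNR is estimated from the peak normalized autocorrelation over an assumed 60 to 400 Hz pitch range (a common autocorrelation-based estimator, not necessarily the paper's); and the fusion is assumed to have the form `w * s_v + (1 - w) * s_u`, with `w` chosen by a grid search over [0, 1] that maximizes closed-set identification accuracy. The per-speaker voiced and unvoiced scores (for example, average frame log-likelihoods under each speaker's voiced and unvoiced GMMs) are taken as given, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _central_moments(frame: Sequence[float]) -> Tuple[float, float, float]:
    """Second, third and fourth central moments of the samples in one frame."""
    n = len(frame)
    mean = sum(frame) / n
    m2 = sum((x - mean) ** 2 for x in frame) / n
    m3 = sum((x - mean) ** 3 for x in frame) / n
    m4 = sum((x - mean) ** 4 for x in frame) / n
    return m2, m3, m4


def skewness(frame: Sequence[float]) -> float:
    """Third standardized moment (assumed definition of the skewness feature)."""
    m2, m3, _ = _central_moments(frame)
    return m3 / m2 ** 1.5


def kurtosis(frame: Sequence[float]) -> float:
    """Fourth standardized moment (not excess kurtosis; a Gaussian gives about 3)."""
    m2, _, m4 = _central_moments(frame)
    return m4 / m2 ** 2


def hnr_db(frame: Sequence[float], fs: int,
           f0_min: float = 60.0, f0_max: float = 400.0) -> float:
    """Autocorrelation-based HNR estimate in dB (assumed estimator)."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    lag_min = max(1, int(fs / f0_max))
    lag_max = min(int(fs / f0_min), n - 1)
    r_max = 0.0
    for lag in range(lag_min, lag_max + 1):
        a, b = x[: n - lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0.0:
            # Normalized correlation between the frame and its lagged copy
            r_max = max(r_max, num / den)
    r_max = min(max(r_max, 1e-6), 1.0 - 1e-6)  # keep the log argument finite
    # Periodic (harmonic) energy fraction r versus aperiodic (noise) fraction 1 - r
    return 10.0 * math.log10(r_max / (1.0 - r_max))


def fused_score(s_voiced: float, s_unvoiced: float, w: float) -> float:
    """Assumed linear fusion: w weights the voiced score, 1 - w the unvoiced one."""
    return w * s_voiced + (1.0 - w) * s_unvoiced


def search_fusion_weight(
    voiced: Sequence[Sequence[float]],    # voiced[t][k]: utterance t vs. speaker k
    unvoiced: Sequence[Sequence[float]],  # same layout for the unvoiced models
    labels: Sequence[int],                # true speaker index of each utterance
    step: float = 0.01,
) -> Tuple[float, float]:
    """Exhaustive grid search for the weight with the best identification accuracy."""
    best_w, best_acc = 0.0, -1.0
    for j in range(int(round(1.0 / step)) + 1):
        w = j * step
        correct = 0
        for sv, su, y in zip(voiced, unvoiced, labels):
            fused = [fused_score(a, b, w) for a, b in zip(sv, su)]
            # Identified speaker = index of the highest fused score
            if max(range(len(fused)), key=fused.__getitem__) == y:
                correct += 1
        acc = correct / len(labels)
        if acc > best_acc:  # keep the smallest weight reaching the best accuracy
            best_w, best_acc = w, acc
    return best_w, best_acc
```

The grid step trades search time for resolution; with a single scalar weight, a step of 0.01 means only 101 candidates, so an exhaustive search stays cheap even over a large development set.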
