Performance Improvement of Speaker Recognition by MCE-based Score Combination of Multiple Feature Parameters

  • Received : 2020.04.10
  • Reviewed : 2020.06.05
  • Published : 2020.06.30

Abstract

This paper proposes an improved method for extracting features from the voice source signal, together with a score combination method that uses minimum classification error (MCE)-based weight estimation over the scores of multiple feature vectors, to improve the performance of speaker recognition systems. The proposed feature vector consists of perceptual linear predictive cepstral coefficients, skewness, and kurtosis, extracted from the glottal flow signal after low-pass filtering. The filtering removes the flat spectral region, which carries no meaningful information. The proposed feature is used to improve a conventional speaker recognition system that models mel-frequency cepstral coefficients and perceptual linear predictive cepstral coefficients, extracted from the speech signal, with Gaussian mixture models. In addition, to make the score estimation more reliable, the weights are not estimated from the probability distribution of the scores, as in the conventional approach. Instead, the scores obtained with the conventional feature vectors and those obtained with the proposed feature vector are combined using weights estimated by the MCE method, and the optimal speaker is selected from the combined score. The experimental results confirmed that the proposed feature vector contains information that is useful for recognizing speakers. Furthermore, speaker recognition with the MCE-based combination of multiple feature parameter scores outperformed the conventional system, with the largest gains when the number of Gaussian mixtures was low.
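The abstract does not give the loss function or the update rule used for the weight estimation, so the following is only a minimal sketch of one common MCE formulation (a sigmoid loss over a soft-maximum misclassification measure), not the authors' implementation. It assumes two score streams, for example log-likelihood scores from models of the conventional features and from models of the proposed feature, fused as a convex combination with a single weight `w`. The function names, the parameters `eta`, `gamma`, `lr`, and `n_iter`, and the two-stream setup are all hypothetical choices made for illustration, and at least two enrolled speakers are assumed.

```python
import numpy as np

def mce_fusion_weight(scores_a, scores_b, labels,
                      eta=2.0, gamma=1.0, lr=0.5, n_iter=200):
    """Estimate w in (0, 1) for fused = w*scores_a + (1-w)*scores_b.

    scores_a, scores_b: arrays of shape (n_utterances, n_speakers).
    labels: integer array with the true speaker index of each utterance.
    All hyperparameters are illustrative assumptions.
    """
    n, k = scores_a.shape
    rows = np.arange(n)
    competitors = np.ones((n, k), dtype=bool)
    competitors[rows, labels] = False      # True for every wrong speaker
    diff = scores_a - scores_b             # derivative of fused score w.r.t. w
    theta = 0.0                            # w = sigmoid(theta), so w starts at 0.5

    for _ in range(n_iter):
        w = 1.0 / (1.0 + np.exp(-theta))
        fused = w * scores_a + (1.0 - w) * scores_b

        # Soft maximum of the competing speakers' scores (log-mean-exp).
        comp = np.where(competitors, eta * fused, -np.inf)
        comp_max = comp.max(axis=1, keepdims=True)
        expc = np.exp(comp - comp_max)
        soft = expc / expc.sum(axis=1, keepdims=True)
        anti = (comp_max[:, 0] + np.log(expc.sum(axis=1) / (k - 1))) / eta

        # Misclassification measure: positive when a competitor wins.
        d = -fused[rows, labels] + anti
        loss = 1.0 / (1.0 + np.exp(-gamma * d))   # smoothed 0/1 error

        # Chain rule: dloss/dd * dd/dw * dw/dtheta, averaged over utterances.
        dd_dw = -diff[rows, labels] + (soft * diff).sum(axis=1)
        grad_w = np.mean(gamma * loss * (1.0 - loss) * dd_dw)
        theta -= lr * grad_w * w * (1.0 - w)

    return 1.0 / (1.0 + np.exp(-theta))

def identify(scores_a, scores_b, w):
    """Return the index of the best-scoring speaker for each utterance."""
    return np.argmax(w * scores_a + (1.0 - w) * scores_b, axis=1)
```

In use, the weight would be estimated on held-out training scores with `mce_fusion_weight` and then applied to test scores with `identify`. Parameterizing the weight through a sigmoid keeps it between 0 and 1 without a constrained optimizer. With more than two feature streams, a softmax over one parameter per stream would be a natural extension of the same idea.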
