A New Speech Quality Measure for Speech Database Verification System

Korean title (translated): A New Speech Recognition Performance Measure for a Speech Recognition Database Verification System

  • Ji, Seung-eun (Department of Computer Science & Engineering, Incheon National University)
  • Kim, Wooil (Department of Computer Science & Engineering, Incheon National University)
  • Received : 2016.01.20
  • Accepted : 2016.03.03
  • Published : 2016.03.31

Abstract

This paper presents a verification system for speech recognition databases that is based on speech measures, and describes the speech measure extraction algorithm used in the system. In our previous study, to produce an effective speech quality measure for the system, we proposed a combination of several speech measures that are highly correlated with word error rate (WER). The new combination of speech quality measures predicts speech recognition performance more effectively than any individual speech measure alone. In this paper, we increase the independence of the system by employing a GMM (Gaussian mixture model) acoustic score in place of the HMM (hidden Markov model) score, which must be obtained from a secondary speech recognition system. The combination using the GMM score shows a slightly lower correlation with WER than the combination using the HMM score; however, it yields a higher relative improvement in correlation with WER, measured against the correlation of each speech measure alone.

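This paper introduces the development of a verification system for speech recognition databases that uses speech measures, and explains the speech measure extraction algorithm that is the core technology of the system. In our previous work, to generate an effective speech recognition performance measure for the system, we created a new measure by combining several speech measures that are highly correlated with word error rate (WER), the representative measure of speech recognition performance. Across various noise environments, the resulting measure showed a higher correlation with WER than any individual speech measure used alone, demonstrating that it is effective for predicting speech recognition performance. In the present experiments, the acoustic model probability extracted from a secondary speech recognizer, which was used in the combination in the previous work, is replaced with a GMM (Gaussian mixture model) acoustic model probability, which reduces the dependence on another speech recognizer when the system is built.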
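
The abstract evaluates each speech measure, and a combination of measures, by its correlation with WER, and reports a relative improvement of the combination over each single measure. The Python sketch below illustrates that kind of evaluation under stated assumptions. The abstract does not say how the measures are combined, so a least-squares linear combination is assumed here. The function names, the two example measures, and all data are hypothetical and synthetic; none of the numbers come from the paper.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def fit_combination(measures, wer):
    """Least-squares weights (plus a bias term) mapping measures to WER.

    Assumption: a linear combination. The abstract does not specify
    how the speech measures are actually combined.
    """
    X = np.column_stack([measures, np.ones(len(measures))])
    weights, *_ = np.linalg.lstsq(X, wer, rcond=None)
    return weights


def combine(measures, weights):
    """Apply the fitted weights to get one combined measure per item."""
    X = np.column_stack([measures, np.ones(len(measures))])
    return X @ weights


def relative_improvement(r_combined, r_single):
    """Relative gain (%) in |correlation| over a single measure."""
    return 100.0 * (abs(r_combined) - abs(r_single)) / abs(r_single)


# Synthetic data for illustration only (not taken from the paper).
rng = np.random.default_rng(0)
n = 200
wer = rng.uniform(5.0, 60.0, size=n)             # WER per database subset, in %
snr_like = -0.5 * wer + rng.normal(0.0, 6.0, n)  # hypothetical SNR-like measure
gmm_like = -0.1 * wer + rng.normal(0.0, 2.0, n)  # hypothetical GMM acoustic score
measures = np.column_stack([snr_like, gmm_like])

weights = fit_combination(measures, wer)
combined = combine(measures, weights)

r_combined = pearson(combined, wer)
singles = [pearson(measures[:, j], wer) for j in range(measures.shape[1])]
gains = [relative_improvement(r_combined, r) for r in singles]

for name, r, gain in zip(["snr_like", "gmm_like"], singles, gains):
    print(f"{name}: r = {r:+.3f}, relative improvement = {gain:.1f}%")
print(f"combined: r = {r_combined:+.3f}")
```

Because the weights are fitted and evaluated on the same data, the combined correlation here can never fall below that of any single measure. A fair comparison would fit the weights on one set of utterances and measure the correlation on a held-out set.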
