Noise Robust Speech Recognition Based on Parallel Model Combination Adaptation Using Frequency-Variant

  • Choi, Sook-Nam (Department of Information and Communication Engineering, Yeongnam University) ;
  • Chung, Hyun-Yeol (Department of Information and Communication Engineering, Yeongnam University)
  • Received : 2013.02.18
  • Reviewed : 2013.04.30
  • Published : 2013.05.31

Abstract

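Conventional speech recognition systems perform well in quiet environments, but their performance drops sharply in real environments where noise is present. To build a speech recognizer that remains robust across diverse noise environments, this paper proposes Parallel Model Combination adaptation using frequency-variant based on environment-awareness (FV-PMC), which uses the degree of frequency variation to obtain information about the recognition environment and applies it to improve the recognition models. In advance, the average frequency variation among the pre-classified noise groups is computed and set as each group's threshold. When speech containing unknown noise is input, its frequency variation with respect to each noise group is computed again; if it is higher than a group's threshold, the input is regarded as speech corrupted by that group's noise, and recognition is performed with the model built from speech containing that group's noise. Noise classification with the proposed FV-PMC method achieved an average accuracy of 56%. In the subsequent recognition experiments, the average recognition rates were 79.05% for Set A, 79.43% for Set B, and 83.37% for Set C. The overall average recognition rate of 80.62% is 5.69 percentage points higher than the 74.93% obtained by conventional PMC with a clean model, confirming the effectiveness of the proposed method.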
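The threshold-and-select procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the frequency-variation measure, so `frequency_variation_score` is a hypothetical stand-in (cosine similarity between magnitude spectra, where higher means more alike). Each group's threshold is taken as the average score between that group's reference spectrum and the other groups' references, which is one reading of "average frequency variation among noise groups". Choosing the largest margin when several groups qualify, and falling back to the clean model when none does, are also assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Spectrum = Sequence[float]  # non-negative magnitude per frequency bin


def frequency_variation_score(x: Spectrum, ref: Spectrum) -> float:
    """HYPOTHETICAL stand-in for the paper's frequency-variation measure:
    cosine similarity of two magnitude spectra (1.0 = identical shape)."""
    dot = sum(a * b for a, b in zip(x, ref))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in ref))
    return dot / norm if norm > 0 else 0.0


def mean_spectrum(samples: List[Spectrum]) -> List[float]:
    """Bin-wise average of one noise group's training spectra."""
    return [sum(col) / len(samples) for col in zip(*samples)]


def fit_thresholds(
    groups: Dict[str, List[Spectrum]],
) -> Tuple[Dict[str, List[float]], Dict[str, float]]:
    """Offline step: one reference spectrum per pre-classified group, and a
    threshold equal to the average score between that group and every other
    group (ASSUMED reading of the abstract; needs at least two groups)."""
    refs = {g: mean_spectrum(s) for g, s in groups.items()}
    thresholds: Dict[str, float] = {}
    for g, ref in refs.items():
        others = [frequency_variation_score(refs[h], ref) for h in refs if h != g]
        thresholds[g] = sum(others) / len(others)
    return refs, thresholds


def select_model(
    x: Spectrum,
    refs: Dict[str, List[float]],
    thresholds: Dict[str, float],
    models: Dict[str, object],
    clean_model: object,
) -> object:
    """Online step: score the input against every group. Only groups whose
    threshold is exceeded (positive margin) are candidates; the largest margin
    wins (ASSUMPTION). With no candidate, use the clean model (ASSUMPTION)."""
    best: Optional[str] = None
    best_margin = 0.0
    for g, ref in refs.items():
        margin = frequency_variation_score(x, ref) - thresholds[g]
        if margin > best_margin:
            best, best_margin = g, margin
    return models[best] if best is not None else clean_model
```

For example, with three toy groups whose spectra peak in the low, mid, and high bins, an input such as `[10, 5, 1, 1]` clears the low-band threshold by the widest margin and is routed to that group's (noise-specific) model. Ranking by margin over each group's own threshold, rather than by raw score, keeps a group that is naturally similar to the others from winning by default. All group names, spectra, and model labels here are illustrative only.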

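The baseline in the abstract is conventional PMC applied to a clean-speech model. As background, the sketch below shows the widely used log-normal approximation for combining one clean-speech Gaussian with one noise Gaussian, channel by channel. It is a generic textbook version, not the authors' FV-PMC code, and it simplifies on purpose: parameters are assumed to be already in the log filter-bank domain (the DCT between cepstra and log spectra is omitted), covariances are diagonal, only static parameters are handled, and `gain` is a hypothetical scalar speech-level term.

```python
import math
from typing import List, Sequence, Tuple


def lognormal_pmc(
    mu_s: Sequence[float], var_s: Sequence[float],
    mu_n: Sequence[float], var_n: Sequence[float],
    gain: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Combine per-channel speech and noise Gaussians (log filter-bank domain,
    diagonal covariance) into a noisy-speech Gaussian using the log-normal
    approximation. Returns (means, variances) in the log domain."""
    mu_out: List[float] = []
    var_out: List[float] = []
    for ms, vs, mn, vn in zip(mu_s, var_s, mu_n, var_n):
        # Log -> linear: mean and variance of a log-normal variable.
        lin_ms = math.exp(ms + vs / 2)
        lin_vs = lin_ms ** 2 * (math.exp(vs) - 1)
        lin_mn = math.exp(mn + vn / 2)
        lin_vn = lin_mn ** 2 * (math.exp(vn) - 1)
        # Speech and noise assumed independent and additive in the linear domain.
        m = gain * lin_ms + lin_mn
        v = gain ** 2 * lin_vs + lin_vn
        # Linear -> log: refit a log-normal with the same first two moments.
        var_l = math.log(v / m ** 2 + 1)
        mu_out.append(math.log(m) - var_l / 2)
        var_out.append(var_l)
    return mu_out, var_out
```

With zero variances the combination reduces to the log-add rule, log(exp(mu_s) + exp(mu_n)), and a noise channel far below the speech level leaves the speech Gaussian essentially unchanged.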

Cited by

  1. Performance Improvement Methods of a Spoken Chatting System Using SVM vol.4, pp.6, 2015, https://doi.org/10.3745/KTSDE.2015.4.6.261