A Corpus Selection Based Approach to Language Modeling for Large Vocabulary Continuous Speech Recognition

  • Oh, Yoo-Rhee (Dept. of Information and Communications, Gwangju Institute of Science and Technology);
  • Yoon, Jae-Sam (Dept. of Information and Communications, Gwangju Institute of Science and Technology);
  • Kim, Hong-Kook (Dept. of Information and Communications, Gwangju Institute of Science and Technology)
  • Published: 2005.11.17

Abstract

In this paper, we propose a language modeling approach that improves the performance of a large vocabulary continuous speech recognition system. The approach is based on an active learning framework that selects the text corpus required for language modeling from a large amount of text data, using perplexity as the selection criterion. In recognition experiments on a continuous Korean speech task, a speech recognition system using a language model built with the proposed approach reduces the word error rate by about 6.6%, with lower computational complexity, compared with a system using a language model built from randomly selected text.
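The abstract says only that perplexity drives corpus selection inside an active learning loop and gives no further details. The following minimal Python sketch illustrates the general idea under assumptions that do not come from the paper. A unigram model with add-one smoothing stands in for the real n-gram language model. In each round, the pool sentences with the highest perplexity under the current model (the ones it predicts worst) are moved into the training set. All function names and parameters (`select_corpus`, `batch_size`, `rounds`) are hypothetical and are not the authors' code.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count word frequencies over a list of tokenized sentences."""
    counts = Counter(w for s in sentences for w in s)
    return counts, sum(counts.values())


def perplexity(sentence, counts, total, vocab_size):
    """Unigram perplexity with add-one (Laplace) smoothing.

    Assumption: a unigram model stands in for the paper's (unspecified) LM.
    """
    log_prob = 0.0
    for w in sentence:
        p = (counts[w] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(sentence))


def select_corpus(seed, pool, batch_size, rounds):
    """Hypothetical active-selection loop.

    Each round retrains on the selected set and moves the `batch_size`
    pool sentences with the HIGHEST perplexity (an assumed criterion)
    into the selected set.
    """
    selected = list(seed)
    remaining = [s for s in pool if s]  # drop empty sentences (avoids division by zero)
    # Assumption: the smoothing vocabulary is every word type in seed and pool.
    vocab_size = len({w for s in selected + remaining for w in s})
    for _ in range(rounds):
        if not remaining:
            break
        counts, total = train_unigram(selected)
        # Stable sort, so ties keep their original pool order.
        remaining.sort(key=lambda s: perplexity(s, counts, total, vocab_size),
                       reverse=True)
        selected.extend(remaining[:batch_size])
        remaining = remaining[batch_size:]
    return selected


if __name__ == "__main__":
    seed = [["the", "cat", "sat"]]
    pool = [["the", "cat"], ["a", "dog", "ran"], ["the", "cat", "sat"]]
    print(select_corpus(seed, pool, batch_size=1, rounds=1))
```

With the toy data above, the sentence made entirely of unseen words (`a dog ran`, perplexity 9) is chosen over the two familiar ones (perplexity 4.5). A selection scheme that instead favored *low* perplexity against an in-domain model would pick the opposite sentences. The abstract does not say which direction the authors used.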

Keywords