N-gram Adaptation Using Information Retrieval and Dynamic Interpolation Coefficient

Adaptation of N-gram Language Models Using Information Retrieval Techniques and Dynamic Interpolation Coefficients

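  • 최준기 (Speech Interface Laboratory, Department of Computer Science, Korea Advanced Institute of Science and Technology) ;
  • 오영환 (Speech Interface Laboratory, Department of Computer Science, Korea Advanced Institute of Science and Technology)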
  • Published : 2005.12.01

Abstract

The goal of language model adaptation is to improve a background language model using a relatively small adaptation corpus. This study presents a language model adaptation technique for the case where no additional text data are available for adaptation. We propose using an information retrieval (IR) technique with N-gram language modeling to collect the adaptation corpus from the baseline text data. We also propose a dynamic interpolation coefficient for combining the background language model with the adapted language model. The interpolation coefficient is estimated from the word hypotheses obtained from segmented input speech, which serve as held-out validation data. This allows the final adapted model to improve consistently on the performance of the background model. The proposed approach reduces the word error rate by $13.6\%$ relative to the baseline 4-gram model on a two-hour broadcast news speech recognition task.
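As an illustration of the interpolation step, the following is a minimal sketch of how a coefficient for combining two language models could be estimated from held-out text. The abstract does not state which estimation procedure the authors use; the sketch assumes the common expectation-maximization (EM) update for a two-component linear mixture, and the function names `estimate_lambda` and `interpolate` are hypothetical. The inputs are assumed to be smoothed (strictly positive) per-word probabilities that each model assigns to the same held-out word sequence, for example the word hypotheses of one speech segment.

```python
from typing import Sequence


def estimate_lambda(p_bg: Sequence[float], p_ad: Sequence[float],
                    iters: int = 50, lam: float = 0.5) -> float:
    """Estimate the weight on the adapted model with EM (an assumed method).

    p_bg[i] and p_ad[i] are the probabilities that the background and the
    adapted model assign to the i-th held-out word given its history.
    Both are assumed to be strictly positive (smoothed models).
    """
    if len(p_bg) != len(p_ad) or len(p_bg) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    for _ in range(iters):
        # E-step: posterior probability that each word came from the adapted model.
        post = [lam * a / (lam * a + (1.0 - lam) * b)
                for a, b in zip(p_ad, p_bg)]
        # M-step: the new weight is the average posterior.
        lam = sum(post) / len(post)
    return lam


def interpolate(p_bg: float, p_ad: float, lam: float) -> float:
    """Linear interpolation: lam * adapted + (1 - lam) * background."""
    return lam * p_ad + (1.0 - lam) * p_bg
```

Re-estimating the weight per segment, rather than fixing it once, is one way to make the coefficient "dynamic" in the sense described above: when the adapted model explains the hypotheses no better than the background model, the weight stays near its starting value, and it grows only when the adapted model fits better.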

Keywords