Efficient Language Model based on VCCV unit for Sentence Speech Recognition

  • 박선희 (School of Information and Communication Engineering, Sungkyunkwan University);
  • 노용완 (School of Information and Communication Engineering, Sungkyunkwan University);
  • 홍광석 (School of Information and Communication Engineering, Sungkyunkwan University)
  • Published: 2003.11.21

Abstract

In this paper, we implement a bigram language model and evaluate which smoothing technique is appropriate for a unit with low perplexity. Word, morpheme, and clause units are widely used as the processing units of a language model. We propose VCCV units, which have a smaller vocabulary than morpheme or clause units, and compare the VCCV units with the clause and morpheme units in terms of perplexity. The most common metric for evaluating a language model is the probability that the model assigns to test data, together with the measure of perplexity derived from it. Smoothing is used to estimate probabilities when there is insufficient data to estimate them accurately. We construct N-grams over the VCCV units with low perplexity and test the language model using Katz, Witten-Bell, absolute, and modified Kneser-Ney smoothing. The experimental results show that modified Kneser-Ney smoothing is the appropriate smoothing technique for VCCV units.
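The abstract refers to bigram language models, perplexity as the evaluation measure, and several smoothing techniques (Katz, Witten-Bell, absolute, modified Kneser-Ney). As a rough illustration of this kind of evaluation, the sketch below builds a bigram model with interpolated absolute discounting, one of the methods named above, and scores a held-out set by perplexity. The toy corpus, the discount value d = 0.75, and all function names are assumptions made purely for demonstration; this is not the authors' implementation and it does not use VCCV units.

```python
import math
from collections import Counter, defaultdict

def train_bigram_absolute(sentences, d=0.75):
    """Return P(w | prev) for a bigram model with interpolated absolute
    discounting (one of the smoothing methods named in the abstract)."""
    unigrams, bigrams, history = Counter(), Counter(), Counter()
    followers = defaultdict(set)        # distinct continuations of each history
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        for prev, w in zip(tokens, tokens[1:]):
            bigrams[(prev, w)] += 1
            history[prev] += 1
            followers[prev].add(w)

    total = sum(unigrams.values())
    vocab_size = len(unigrams)

    def prob(w, prev):
        # Add-one unigram distribution as the fallback / interpolation target.
        p_uni = (unigrams[w] + 1) / (total + vocab_size)
        c_prev = history[prev]
        if c_prev == 0:                  # unseen history: back off entirely
            return p_uni
        discounted = max(bigrams[(prev, w)] - d, 0) / c_prev
        # Probability mass freed by discounting, redistributed via the unigram model.
        lam = d * len(followers[prev]) / c_prev
        return discounted + lam * p_uni

    return prob

def perplexity(prob, sentences):
    """exp(-mean log P) over every predicted token, including </s>."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            log_sum += math.log(prob(w, prev))
            n += 1
    return math.exp(-log_sum / n)

if __name__ == "__main__":
    train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
    test = [["the", "cat", "ran"]]
    print("perplexity:", perplexity(train_bigram_absolute(train), test))
```

The same perplexity routine can be reused to compare smoothing variants or processing units: a unit inventory with a smaller vocabulary, as claimed for VCCV units in the abstract, generally yields fewer unseen events and therefore relies less heavily on the smoothing fallback.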

Keywords