Proceedings of the KIEE Conference (Korean Institute of Electrical Engineers)
- Proceedings of the KIEE 2003 Conference, Information and Control Section B
- Pages 836-839
- 2003
Efficient Language Model based on VCCV unit for Sentence Speech Recognition
- Published: 2003.11.21
Abstract
In this paper, we implement a bigram language model and evaluate which smoothing technique is appropriate for a processing unit with low perplexity. Words, morphemes, and clauses are widely used as the processing units of a language model. We propose VCCV units, which have a smaller vocabulary than morpheme and clause units, and we compare the VCCV units with the clause and morpheme units in terms of perplexity. The most common metric for evaluating a language model is the probability that the model assigns to test data, or the derived measure of perplexity. Smoothing is used to estimate probabilities when there are insufficient data to estimate them accurately. We constructed N-grams of the low-perplexity VCCV units and tested the language model with Katz, Witten-Bell, absolute, and modified Kneser-Ney smoothing, among others. In the experimental results, modified Kneser-Ney smoothing proved to be the appropriate smoothing technique for VCCV units.
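To make the evaluation procedure concrete, the following is a minimal sketch of a smoothed bigram model and its perplexity. It is not the authors' implementation: it uses interpolated absolute discounting (one of the techniques named above, chosen here only because it is short) with an add-one unigram fallback, and the class name `BigramLM`, the discount value 0.75, and the toy token sequences are all assumptions made for illustration. The tokens stand in for whatever unit is being compared (VCCV, morpheme, or clause); the segmentation step itself is not shown.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"


class BigramLM:
    """Bigram model with interpolated absolute discounting (illustrative sketch)."""

    def __init__(self, sentences, discount=0.75):
        self.d = discount            # assumed discount, not taken from the paper
        self.bigram = Counter()      # c(v, w)
        self.context = Counter()     # c(v): how often v occurs as a history
        self.followers = Counter()   # number of distinct units seen after v
        self.unigram = Counter()     # counts of predicted units
        for sent in sentences:
            toks = [BOS] + list(sent) + [EOS]
            for v, w in zip(toks, toks[1:]):
                if self.bigram[(v, w)] == 0:
                    self.followers[v] += 1
                self.bigram[(v, w)] += 1
                self.context[v] += 1
                self.unigram[w] += 1
        # Closed vocabulary: every predicted unit plus one unknown symbol.
        self.vocab = set(self.unigram) | {UNK}
        self.total = sum(self.unigram.values())

    def p_unigram(self, w):
        # Add-one smoothed unigram, so every vocabulary entry has nonzero mass.
        return (self.unigram[w] + 1) / (self.total + len(self.vocab))

    def prob(self, v, w):
        """P(w | v): discounted bigram estimate interpolated with the unigram."""
        if w not in self.vocab:
            w = UNK
        c_v = self.context[v]
        if c_v == 0:                 # unseen history: fall back to the unigram
            return self.p_unigram(w)
        discounted = max(self.bigram[(v, w)] - self.d, 0.0) / c_v
        backoff_mass = self.d * self.followers[v] / c_v
        return discounted + backoff_mass * self.p_unigram(w)

    def perplexity(self, sentences):
        """exp of the negative mean log-probability per predicted unit."""
        log_sum, n = 0.0, 0
        for sent in sentences:
            toks = [BOS] + list(sent) + [EOS]
            for v, w in zip(toks, toks[1:]):
                log_sum += math.log(self.prob(v, w))
                n += 1
        return math.exp(-log_sum / n)


# Hypothetical toy data: each inner list is one sentence, already split into units.
train = [["a", "b", "c"], ["a", "b", "d"], ["b", "c", "a"]]
held_out = [["a", "b", "a"]]

lm = BigramLM(train)
print("held-out perplexity:", lm.perplexity(held_out))
```

Comparing units or smoothing techniques in this setting amounts to training one such model per configuration and comparing the held-out perplexities, keeping in mind that perplexities measured over different units are per-unit figures and are not directly comparable without normalization.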