• Title/Summary/Keyword: statistics-based spelling error correction

Context-sensitive Spelling Error Correction using Eojeol N-gram (어절 N-gram을 이용한 문맥의존 철자오류 교정)

  • Kim, Minho; Kwon, Hyuk-Chul; Choi, Sungki
    • Journal of KIISE / v.41 no.12 / pp.1081-1089 / 2014
  • Context-sensitive spelling-error correction methods fall broadly into rule-based methods and statistical methods, and research tends to favor the latter. Statistical methods treat context-sensitive spelling errors as a word-sense disambiguation problem: given a correction vocabulary pair, consisting of a correction target word and a replacement candidate word, the method chooses between the two according to the context. This paper proposes integrating an eojeol (word-phrase) n-gram model into the conventional probability model built on correction vocabulary pairs, which came out of the research team's previous study, in order to improve its performance. Two integration schemes are presented: one interpolates the sentence probabilities calculated by each model, and the other applies the two models sequentially. Both integrated models achieve relatively high accuracy and recall compared with the conventional model or a model that uses only n-grams.
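
The interpolation scheme mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no formula, so this assumes a simple linear interpolation with a single weight `lam`. The function names, the English confusion pair, and the toy probabilities are all hypothetical and stand in for the paper's Korean data and models.

```python
def interpolate(p_pair_model: float, p_ngram_model: float, lam: float = 0.5) -> float:
    """Linearly interpolate two sentence probabilities (assumed scheme)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_pair_model + (1.0 - lam) * p_ngram_model


def choose_word(candidates, pair_scores, ngram_scores, lam=0.5):
    """Pick the candidate whose sentence has the highest interpolated score."""
    return max(
        candidates,
        key=lambda w: interpolate(pair_scores[w], ngram_scores[w], lam),
    )


# Toy sentence probabilities for one confusion pair (made-up numbers).
# pair_scores plays the role of the vocabulary-pair model,
# ngram_scores the role of the n-gram model.
pair_scores = {"there": 0.6, "their": 0.4}
ngram_scores = {"there": 0.1, "their": 0.9}

best_mixed = choose_word(["there", "their"], pair_scores, ngram_scores, lam=0.5)
best_pair_only = choose_word(["there", "their"], pair_scores, ngram_scores, lam=1.0)
```

With `lam=1.0` only the vocabulary-pair model is consulted, and with `lam=0.0` only the n-gram model, so the weight controls how much each model contributes to the decision.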

Improving Recall for Context-Sensitive Spelling Correction Rules using Conditional Probability Model with Dynamic Window Sizes (동적 윈도우를 갖는 조건부확률 모델을 이용한 한국어 문맥의존 철자오류 교정 규칙의 재현율 향상)

  • Choi, Hyunsoo; Kwon, Hyukchul; Yoon, Aesun
    • Journal of KIISE / v.42 no.5 / pp.629-636 / 2015
  • The errors corrected by a Korean spelling and grammar checker can be classified into isolated-term spelling errors and context-sensitive spelling errors (CSSEs). CSSEs are difficult to detect and correct because each erroneous word is a valid word when examined alone. They can therefore be corrected only by considering their semantic and syntactic relations to the surrounding context. CSSEs, which even expert writers make frequently, significantly affect the reliability of spelling and grammar checkers. An existing Korean spelling and grammar checker developed by P University (KSGC 4.5) uses hand-made rules to correct CSSEs. KSGC 4.5 is designed for very high precision, which results in extremely low recall. The overall goal of our previous work was to improve recall without considerably lowering precision by generalizing CSSE correction rules that depend mainly on linguistic knowledge. A variety of rule-based methods were proposed in that work, and the best of them achieved an average precision of 95.19% and a recall of 37.56%. To improve recall further, this study proposes a statistics-based method that uses a conditional probability model with dynamic window sizes. The proposed method achieved an average precision of 97.23% and a recall of 50.50%.
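
The abstract does not say how the window size is varied, so the following is only one possible reading of a conditional probability model with dynamic window sizes: estimate P(word | context) from counts, start with the widest context window, and shrink it until a context seen in training gives a confident answer. The function names, the `threshold` parameter, and the English toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def context_key(tokens, i, w):
    """Context of width w around position i: (w, left words, right words)."""
    return (w, tuple(tokens[max(0, i - w):i]), tuple(tokens[i + 1:i + 1 + w]))


def train(sentences, targets, max_window=2):
    """Count which target word appears in each context, for every window size."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in targets:
                for w in range(1, max_window + 1):
                    counts[context_key(tokens, i, w)][tok] += 1
    return counts


def correct(tokens, i, counts, max_window=2, threshold=0.8):
    """Return the word to use at position i, trying wide windows first."""
    for w in range(max_window, 0, -1):
        seen = counts.get(context_key(tokens, i, w))
        if seen:
            word, n = seen.most_common(1)[0]
            # Conditional probability P(word | context) from raw counts.
            if n / sum(seen.values()) >= threshold:
                return word
    return tokens[i]  # no confident evidence: leave the word unchanged


# Toy corpus and confusion set (made up for illustration).
targets = {"to", "too", "two"}
corpus = [
    ["i", "want", "to", "go"],
    ["i", "want", "to", "eat"],
    ["too", "much", "noise"],
]
counts = train(corpus, targets)

fixed = correct(["we", "want", "too", "go"], 2, counts)
kept = correct(["a", "too", "b"], 1, counts)
```

In the first call the width-2 context was never seen in training, so the sketch falls back to width 1, where "want _ go" always co-occurs with "to". Leaving the word unchanged when no context matches favors precision over recall, and a smaller minimum window or lower `threshold` would trade in the other direction.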