Improving Recall for Context-Sensitive Spelling Correction Rules using Conditional Probability Model with Dynamic Window Sizes

Improving the Recall of Korean Context-Sensitive Spelling Error Correction Rules Using a Conditional Probability Model with Dynamic Windows

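  • Hyun Soo Choi (Department of Electrical and Computer Engineering, Pusan National University) ;
  • Hyuk-Chul Kwon (Department of Computer Engineering, Pusan National University) ;
  • Aesun Yoon (Department of French Language and Literature, Pusan National University)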
  • Received : 2015.01.09
  • Accepted : 2015.02.23
  • Published : 2015.05.15

Abstract

The errors corrected by a Korean spelling and grammar checker can be classified into isolated-term spelling errors and context-sensitive spelling errors (CSSEs). CSSEs are difficult to detect and correct because each erroneous word is a valid word when examined alone. They can therefore be corrected only by considering their semantic and syntactic relations to the surrounding context. CSSEs, which are frequently made even by expert writers, significantly affect the reliability of spelling and grammar checkers. An existing Korean spelling and grammar checker developed by P University (KSGC 4.5) uses hand-made rules to correct CSSEs. KSGC 4.5 is designed for very high precision, which results in extremely low recall. The overall goal of our previous work was to improve recall without considerably lowering precision by generalizing CSSE correction rules that depend mainly on linguistic knowledge. A variety of rule-based methods have been proposed in that work, and the best of them achieved an average precision of 95.19% and a recall of 37.56%. To improve recall further, this study proposes a statistics-based method that uses a conditional probability model with dynamic window sizes. The proposed method achieved an average precision of 97.23% and a recall of 50.50%.

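The errors corrected by a Korean spelling checker fall broadly into simple spelling errors and context-sensitive spelling errors. A context-sensitive spelling error is correct when viewed as an isolated eojeol (word) but becomes an error once its context is considered, which makes it very difficult to correct. Because even people who write regularly often make context-sensitive spelling errors, detecting them well and correcting them accurately strongly affects the trust that users place in a spelling checker. Since high precision is essential, most methods for correcting context-sensitive spelling errors are rule-based. The trade-off is very low recall. Methods for raising recall in context-sensitive spelling error correction fall into two broad groups: generalizing rules using linguistic knowledge, and extending the constraints on co-occurring words based on statistical information. Previous studies explored various ways of generalizing rules with linguistic knowledge, but their best performance was an average precision of 95.19% and an average recall of 37.56%. This paper proposes a way of extending the rules based on statistical information. The method uses a conditional probability model with dynamic windows, and its best performance was an average precision of 97.23% and an average recall of 50.50%.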
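
As a rough illustration of the idea named in the abstract, the sketch below chooses among confusable words using co-occurrence counts and widens the context window only when a narrower window is inconclusive. The abstract does not specify the actual model, so everything here is an assumption made for illustration: the function names `train` and `correct`, the toy English confusion set, the `threshold` and `min_evidence` parameters, and the count-share estimate of the conditional probability. It should not be read as the authors' implementation.

```python
from collections import Counter, defaultdict


def context_of(sentence, i, w):
    """Distinct tokens within w positions of index i, excluding position i."""
    return set(sentence[max(0, i - w):i] + sentence[i + 1:i + 1 + w])


def train(corpus, confusion_set, max_window=2):
    """Count context words seen around each candidate, per window size.

    counts[w][candidate][context_word] is the number of training
    sentences in which context_word occurred within w tokens of candidate.
    """
    counts = {w: defaultdict(Counter) for w in range(1, max_window + 1)}
    for sentence in corpus:
        for i, token in enumerate(sentence):
            if token in confusion_set:
                for w in counts:
                    counts[w][token].update(context_of(sentence, i, w))
    return counts


def correct(sentence, i, confusion_set, counts, threshold=0.8, min_evidence=2):
    """Return the preferred candidate for position i, or the original token.

    The window grows from the smallest size upward (hypothetical policy);
    a candidate is accepted only if its share of the co-occurrence counts,
    a crude estimate of P(candidate | context), reaches the threshold.
    """
    for w in sorted(counts):
        context = context_of(sentence, i, w)
        scores = {c: sum(counts[w][c][ctx] for ctx in context)
                  for c in sorted(confusion_set)}
        total = sum(scores.values())
        if total < min_evidence:
            continue  # too little evidence at this size: try a wider window
        best = max(scores, key=scores.get)
        if scores[best] / total >= threshold:
            return best
    return sentence[i]  # stay conservative when no window is decisive


# Toy data, invented for this example only.
corpus = [
    ["a", "piece", "of", "cake"],
    ["a", "piece", "of", "paper"],
    ["world", "peace", "treaty"],
    ["peace", "and", "quiet"],
]
confusion = {"peace", "piece"}
counts = train(corpus, confusion)

suggestion = correct(["a", "peace", "of", "cake"], 1, confusion, counts)
print(suggestion)  # piece
```

Returning the original token when no window is decisive mirrors the precision-first stance described in the abstract: a wider window is consulted only to recover cases the narrow one cannot settle, which is where any gain in recall would come from in this sketch.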

Acknowledgement

Supported by : National Research Foundation of Korea
