Improving Recall for Context-Sensitive Spelling Correction Rules Through Integrated Constraint Loosening Method

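  • Hyunsoo Choi (Department of Electrical and Computer Engineering, Pusan National University) ;
  • Aesun Yoon (Department of French Language and Literature, Pusan National University) ;
  • Hyuk-Chul Kwon (School of Computer Science and Engineering, Pusan National University)
  • Received : 2014.09.02
  • Reviewed : 2015.03.05
  • Published : 2015.06.15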

Abstract

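A context-sensitive spelling error is a word that is correct when used in isolation but erroneous once its context is taken into account. Such errors are very difficult to detect and correct, and how well they are handled largely determines the performance of a high-quality spelling checker. In Korean spelling checkers, context-sensitive spelling errors are most commonly handled with correction rules built by hand by language experts. By its nature, this rule-based approach achieves very high correction precision but very low recall. This paper proposes a method that improves recall while maintaining precision, or lowering it only marginally, by integrating two previously studied techniques: expanding the selectional-constraint nouns in correction rules and relaxing the constraints on case markers. Furthermore, rather than simply combining the two techniques, we propose a stepwise integration that takes into account optional adverb insertion as well as conjugated and adnominal forms, which improves recall by about 13% on average with almost no loss of precision.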

Context-sensitive spelling errors (CSSEs) are hard to correct because they are valid words when analyzed alone. Since they can be identified only by considering the semantic and syntactic relations of their context, CSSEs strongly affect the performance of spelling and grammar checkers. The existing Korean Spelling and Grammar Checker (KSGC 4.5) adopts a rule-based method that uses hand-made correction rules for CSSEs. With this rule-based method, KSGC 4.5 is designed to obtain very high precision, which results in extremely low recall. In this paper, we integrate our previous works on controlling the CSSE correction rules in order to improve recall without sacrificing precision. In addition to the integration, facultative insertion of adverbs and conjugation suffixes of predicates are also considered as constraint-loosening linguistic features.
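The idea of loosening a correction rule's constraints can be illustrated with a minimal Python sketch. It is not the KSGC 4.5 rule format, which this page does not describe: the `Token` and `Rule` classes, the `apply_rule` function, the pre-analyzed token representation, and the example rule (flagging 가르치다 "teach" where 가리키다 "point to" is intended) are all hypothetical, chosen only to show how an expanded noun set, a relaxed case-marker constraint, and an optional adverb slot let one rule match more erroneous contexts.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Token:
    """A pre-analyzed word: stem, attached case marker, coarse part of speech."""
    stem: str
    marker: str = ""      # "" means the case marker was dropped
    pos: str = "noun"     # "noun", "adverb", or "verb"


@dataclass(frozen=True)
class Rule:
    """Hypothetical correction rule: flag error_stem when a context noun precedes it."""
    error_stem: str
    correction: str
    context_nouns: FrozenSet[str]
    markers: Optional[FrozenSet[str]]  # None = accept any marker, including none
    allow_adverb: bool = False         # permit one adverb between noun and verb


def apply_rule(rule: Rule, tokens: List[Token]) -> List[int]:
    """Return the indices of verb tokens that the rule flags as errors."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.pos != "verb" or tok.stem != rule.error_stem:
            continue
        j = i - 1
        # Loosening step: skip over a single optional adverb.
        if rule.allow_adverb and j >= 0 and tokens[j].pos == "adverb":
            j -= 1
        if j < 0:
            continue
        ctx = tokens[j]
        if ctx.pos != "noun" or ctx.stem not in rule.context_nouns:
            continue
        if rule.markers is not None and ctx.marker not in rule.markers:
            continue
        hits.append(i)
    return hits


# Strict rule: only "방향을 가르치-" (direction + object marker) is flagged.
STRICT = Rule("가르치", "가리키", frozenset({"방향"}), frozenset({"을"}))
# Loosened rule: assumed noun expansion, any case marker, optional adverb.
LOOSE = Rule("가르치", "가리키", frozenset({"방향", "위치", "쪽"}), None, True)

VERB = Token("가르치", pos="verb")
SENTENCES = {
    "canonical": [Token("방향", "을"), VERB],
    "other_marker": [Token("방향", "도"), VERB],
    "adverb": [Token("방향", "을"), Token("정확히", pos="adverb"), VERB],
    "expanded_noun": [Token("위치", "를"), VERB],
    "correct_usage": [Token("학생", "을"), VERB],  # "teach students": no error
}

for name, sentence in SENTENCES.items():
    print(name, apply_rule(STRICT, sentence), apply_rule(LOOSE, sentence))
```

In this toy data the strict rule flags only the canonical context, while the loosened rule also flags the three variant contexts and still leaves the correct usage untouched. Real rules would need the stepwise integration described in the abstract to keep loosened constraints from producing false corrections.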

Keywords

Project Information

Research project host institution : National Research Foundation of Korea

References

  1. Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Vol. 1, pp. 151-177, Cambridge: Cambridge University Press, 2008.
  2. "Korean Speller and Grammar Checker 4.5" [Online]. Available: http://speller.cs.pusan.ac.kr (released Oct. 2, 2013)
  3. Minho Kim, Hyuk-Chul Kwon, Hyunsoo Choi, Aesun Yoon, "Generalization of Context-sensitive Spelling Correction Rules using Korean WordNet," Journal of KIISE : Computing Practices and Letters, Vol. 20, No. 2, pp. 106-110, Feb. 2014. (in Korean)
  4. Hyun Soo Choi, Hyuk-Chul Kwon, Aesun Yoon, "Improving Recall for Context-Sensitive Spelling Correction by Weakening Constraints on Case Markers," Journal of KIISE : Software and Applications, Vol. 41, No. 3, pp. 249-256, Mar. 2014. (in Korean)
  5. Hirst, Graeme and Alexander Budanitsky, "Correcting real-word spelling errors by restoring lexical cohesion," Natural Language Engineering, Vol. 11, No. 1, pp. 87-111, Mar. 2005. https://doi.org/10.1017/S1351324904003560
  6. Wilcox-O'Hearn, Amber, Graeme Hirst, and Alexander Budanitsky, "Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model," Proc. of 9th International Conference on Intelligent Text Processing and Computational Linguistics, Vol. 4919, pp. 605-616, Feb. 2008.
  7. Islam, Aminul and Diana Inkpen, "Real-Word Spelling Correction using Google Web 1T 3-grams," Proc. of International Conference on Natural Language Processing and Knowledge Engineering, Vol. 3, pp. 1241-1249, 2009.
  8. Stephen D. Richardson and Lisa C. Braden-Harder, "The experience of developing a large-scale natural language text processing system: CRITIQUE," Proc. of The 2nd Annual Applied Natural Language Conference, pp. 195-202, 1988.
  9. Ralph M. Weischedel and Norman K. Sondheimer, "Meta-rules as a basis for processing ill-formed input," Computational Linguistics, Vol. 9, No. 3-4, pp. 161-177, Jul-Dec. 1983.
  10. Linda Z. Suri, "Language transfer: A foundation for correcting the written English of ASL signers," University of Delaware Technical Report TR-91-19, 1991.
  11. Jaime G. Carbonell and Philip J. Hayes, "Recovery strategies for parsing extragrammatical language," Computational Linguistics, Vol. 9, No. 3-4, Jul-Dec. 1983.
  12. Ki-shim Nam and Young-gun Ko, Syntax.Semantic of Korean Language, TOP publisher, Seoul, 1991. (in Korean)
  13. "Urimal baeumteo" [Online]. Available: http://urimal.cs.pusan.ac.kr