http://dx.doi.org/10.9708/jksci.2013.18.10.173

A Method for Detection and Correction of Pseudo-Semantic Errors Due to Typographical Errors  

Kim, Dong-Joo (College of Liberal Arts, Anyang University)
Abstract
Typographical mistakes made while drafting electronic documents are more common than any other type of error. Most errors caused by mistyping remain simple typographical errors, but a considerable number of them turn into grammatical or semantic errors. Among these, pseudo-semantic errors, that is, semantic errors caused by typographical errors, clash more noticeably with the senses of the surrounding context words in a sentence than pure semantic errors do. Because of this prominent contextual discrepancy, they can be detected and corrected by a simple algorithm based on co-occurrence frequency. I propose a method, based on co-occurrence frequency, for detecting and correcting semantic errors caused by typographical errors. In the proposed method, co-occurrence frequency is counted only for words in an immediate dependency relation, and the cosine similarity measure is used to detect pseudo-semantic errors. Based on the presented experimental results, the proposed method is expected to improve the detection rate of an overall proofreading system by about 2 to 3%.
Keywords
Semantic Error; Pseudo Semantic Error; Typographical Error; Proofreading System
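
The idea summarized in the abstract, counting co-occurrences only between words linked by an immediate dependency relation and using cosine similarity to flag a word that does not fit its context, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy English dependency pairs, the candidate list, and the `threshold` value are all assumptions made for illustration, and the abstract does not specify how candidates are generated or how the similarity is thresholded.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_cooccurrence(dependency_pairs):
    """Count co-occurrences only for (dependent, head) pairs,
    i.e. words linked by an immediate dependency relation."""
    vectors = defaultdict(Counter)
    for dependent, head in dependency_pairs:
        vectors[dependent][head] += 1
        vectors[head][dependent] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def context_score(word, neighbours, vectors):
    """Similarity between a word's co-occurrence profile and the words
    it is directly related to in the sentence being checked."""
    return cosine(vectors.get(word, Counter()), Counter(neighbours))


def detect_and_correct(word, neighbours, candidates, vectors, threshold=0.1):
    """Return (word, score); the word is replaced only when its score is
    below the (assumed) threshold and a candidate fits the context better."""
    score = context_score(word, neighbours, vectors)
    if score >= threshold:
        return word, score
    best = max(candidates,
               key=lambda c: context_score(c, neighbours, vectors),
               default=word)
    best_score = context_score(best, neighbours, vectors)
    return (best, best_score) if best_score > score else (word, score)


# Toy (dependent, head) pairs standing in for a dependency-parsed corpus.
pairs = [("beer", "drink")] * 3 + [("bear", "hunt")] * 2 + [("bear", "see")]
vectors = build_cooccurrence(pairs)

# "drink bear": a valid word, but it never depends on "drink" in the corpus.
print(detect_and_correct("bear", ["drink"], ["beer"], vectors))
# ('beer', 1.0)
```

In this sketch the candidate list is supplied by the caller; in a proofreading system it would typically come from words that are a small edit distance away from the suspicious word, which is what makes the error "pseudo-semantic" rather than purely semantic.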