http://dx.doi.org/10.5626/JOK.2015.42.1.68

Automatic Word Spacing Using Raw Corpus and a Morphological Analyzer  

Shim, Kwangseob (Sungshin Univ.)
Publication Information
Journal of KIISE, v.42, no.1, 2015, pp. 68-75
Abstract
This paper proposes a method for the automatic word spacing of unsegmented Korean sentences. Unlike previous studies, which relied on syllable n-grams, our method uses eojeol monograms for word spacing. A Korean morphological analyzer is used only to correct typical word spacing errors. The method achieves 98.06% syllable accuracy and 94.15% eojeol recall under 10-fold cross-validation on the Sejong corpus, after non-Hangul eojeols are filtered out. The processing rate is 250K eojeols, or 1.8 MB, per second on a typical personal computer. Because syllable accuracy and eojeol recall depend on the size of the eojeol dictionary, better performance is expected with a bigger corpus.
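The abstract does not describe the search procedure, so the following is only a minimal sketch of how eojeol monogram (unigram) counts from a raw corpus could drive word spacing. It assumes a dynamic-programming search for the most probable eojeol sequence, a fixed penalty for unknown single syllables, and position-based definitions of eojeol recall and syllable accuracy. The function names, the `max_len` and `unk_logp` parameters, and the toy counts are all hypothetical, and the morphological-analyzer correction step is omitted.

```python
import math

def segment(text, counts, max_len=8, unk_logp=-20.0):
    """Split an unspaced string into the most probable eojeol sequence.

    counts: eojeol -> frequency in a raw corpus (assumed input).
    unk_logp: assumed log-probability penalty for an unknown single syllable.
    """
    total = sum(counts.values())
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log-prob of text[:i]
    back = [0] * (n + 1)            # back[i]: start of the last eojeol
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if piece in counts:
                score = math.log(counts[piece] / total)
            elif i - j == 1:
                score = unk_logp    # fallback keeps every position reachable
            else:
                continue
            if best[j] + score > best[i]:
                best[i] = best[j] + score
                back[i] = j
    eojeols = []
    i = n
    while i > 0:                    # follow back-pointers from the end
        eojeols.append(text[back[i]:i])
        i = back[i]
    return eojeols[::-1]

def spans(eojeols):
    """Character spans (start, end) of each eojeol in the unspaced string."""
    result, pos = set(), 0
    for e in eojeols:
        result.add((pos, pos + len(e)))
        pos += len(e)
    return result

def eojeol_recall(gold, pred):
    """Fraction of gold eojeols recovered with identical boundaries."""
    g = spans(gold)
    return len(g & spans(pred)) / len(g)

def syllable_accuracy(gold, pred):
    """Fraction of syllables whose space/no-space decision matches gold."""
    n = sum(len(e) for e in gold)
    g_ends = {end for _, end in spans(gold)}
    p_ends = {end for _, end in spans(pred)}
    return sum((i in g_ends) == (i in p_ends) for i in range(1, n + 1)) / n

# Toy counts, invented for illustration only.
toy_counts = {"나는": 5, "학교에": 3, "간다": 4, "나": 2, "는": 1}
print(" ".join(segment("나는학교에간다", toy_counts)))
```

The sketch treats each precomposed Hangul syllable as one character, which holds for NFC-normalized text. A larger eojeol dictionary reduces how often the unknown-syllable fallback is used, which is consistent with the abstract's remark that performance depends on dictionary size.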
Keywords
automatic word spacing; morphological analysis; eojeol dictionary; Sejong corpus