http://dx.doi.org/10.3745/KTSDE.2014.3.10.407

Korean Homograph Tagging Model based on Sub-Word Conditional Probability  

Shin, Joon Choul (Intelligent Computer Laboratory, University of Ulsan)
Ock, Cheol Young (IT Convergence Major, School of Electrical Engineering, University of Ulsan)
Publication Information
KIPS Transactions on Software and Data Engineering / v.3, no.10, 2014, pp. 407-420
Abstract
Korean morphological analysis is generally divided into two steps. The first step, ambiguity generation, analyzes an eojeol (a space-delimited word unit) into multiple candidate morpheme sequences. The second step selects one appropriate candidate using contextual information, typically with a Hidden Markov Model (HMM). This paper proposes the Sub-word Conditional Probability (SCP) model as an alternative algorithm. SCP first uses sub-word information from the adjacent eojeol; if that fails, it falls back to a restricted use of morpheme information. In a comparative test of accuracy and speed, the HMM reached 96.49% accuracy and SCP was only 0.07% lower, while SCP reduced processing time by 53%.
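The two-level lookup described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `BackoffChooser`, the choice of "first two syllables of the neighboring eojeol" as the sub-word, the relative-frequency estimate, and the toy training data are all assumptions made for illustration only. It shows the general idea of trying sub-word evidence first and using morpheme evidence only when the sub-word is unseen.

```python
from collections import Counter


class BackoffChooser:
    """Toy candidate chooser (hypothetical, not the paper's SCP model)."""

    def __init__(self):
        # Each counter maps (context key, candidate) -> observed count.
        self.subword_counts = Counter()
        self.morpheme_counts = Counter()

    def train(self, candidate, neighbor_subword, neighbor_morpheme):
        """Record one tagged example with both kinds of neighbor context."""
        self.subword_counts[(neighbor_subword, candidate)] += 1
        self.morpheme_counts[(neighbor_morpheme, candidate)] += 1

    @staticmethod
    def _best(counts, key, candidates):
        """Return (candidate, P(candidate | key)) or None if key is unseen."""
        scored = [(counts[(key, c)], c) for c in candidates]
        total = sum(n for n, _ in scored)
        if total == 0:
            return None
        n, best = max(scored, key=lambda pair: pair[0])
        return best, n / total

    def choose(self, candidates, neighbor_subword, neighbor_morpheme):
        # Level 1: sub-word evidence from the adjacent eojeol.
        result = self._best(self.subword_counts, neighbor_subword, candidates)
        if result is None:
            # Level 2: fall back to morpheme evidence.
            result = self._best(self.morpheme_counts, neighbor_morpheme, candidates)
        if result is None:
            # No evidence at all: keep the first candidate (assumed default).
            return candidates[0], 0.0
        return result


# Made-up training data for the homograph "배" (ship vs. pear).
chooser = BackoffChooser()
for _ in range(3):
    chooser.train("배__ship", "타고", "타/VV")   # "타고": riding
chooser.train("배__pear", "타고", "타/VV")
for _ in range(2):
    chooser.train("배__pear", "먹고", "먹/VV")   # "먹고": eating

candidates = ["배__ship", "배__pear"]
seen = chooser.choose(candidates, "타고", "타/VV")       # sub-word is known
fallback = chooser.choose(candidates, "먹었", "먹/VV")   # sub-word unseen
print(seen, fallback)
```

In the second call the sub-word "먹었" never occurred in training, so the decision rests on the morpheme "먹/VV" instead. A lookup of this kind needs no sequence decoding, which is consistent with the speed advantage the abstract reports, although the sketch itself makes no claim about the actual accuracy or timing figures.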
Keywords
Korean Morphological Analyzer; HMM; Homograph; Tagging;