http://dx.doi.org/10.3745/KIPSTB.2012.19B.3.195

Morpheme Recovery Based on Naïve Bayes Model  

Kim, Jae-Hoon (Division of IT Engineering, Korea Maritime University)
Jeon, Kil-Ho (Department of Computer Engineering, Korea Maritime University)
Abstract
In Korean morphological analysis, surface forms altered by spelling changes must be recovered into their base forms. Because Korean is agglutinative, part-of-speech (POS) tagging is also difficult without morphological analysis. Morpheme recovery is one of the notorious problems in Korean morphological analysis. It has traditionally been solved with morpheme recovery rules, which generate morphological ambiguity that is then resolved by POS tagging. In this paper, we propose a morpheme recovery scheme based on machine learning methods such as Naïve Bayes models. The input features of the models are the surrounding context of the syllable in which the spelling change occurs, and the categories of the models are the recovered syllables. A POS tagging system with the proposed model achieved an $F_1$-score of 97.5% on the ETRI tree-tagged corpus. We therefore conclude that the proposed model is very useful for handling morpheme recovery in Korean.
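The feature and category design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The class name `SyllableNaiveBayes`, the three-syllable context window (previous, target, next), the add-one smoothing, the `<s>` boundary marker, and the toy training pairs are all assumptions made for illustration; the abstract does not specify the actual feature set or training data.

```python
import math
from collections import Counter, defaultdict


class SyllableNaiveBayes:
    """Categorical Naive Bayes over positional syllable features (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # additive smoothing constant (assumed)
        self.class_counts = Counter()               # label -> number of training samples
        self.feature_counts = defaultdict(Counter)  # (position, label) -> syllable counts
        self.vocab = defaultdict(set)               # position -> syllables seen there

    def fit(self, samples):
        # Each sample is (context, label); every context has the same length.
        for context, label in samples:
            self.class_counts[label] += 1
            for pos, syllable in enumerate(context):
                self.feature_counts[(pos, label)][syllable] += 1
                self.vocab[pos].add(syllable)

    def log_scores(self, context):
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)         # log prior of the recovered form
            for pos, syllable in enumerate(context):
                seen = self.feature_counts[(pos, label)][syllable]
                size = len(self.vocab[pos]) + 1     # +1 reserves mass for unseen syllables
                score += math.log((seen + self.alpha) / (count + self.alpha * size))
            scores[label] = score
        return scores

    def predict(self, context):
        scores = self.log_scores(context)
        return max(scores, key=scores.get)


# Hypothetical toy data: (previous, target, next) syllables -> recovered form.
toy_samples = [
    (("부", "했", "다"), "하였"),
    (("말", "했", "다"), "하였"),
    (("로", "갔", "다"), "가았"),
    (("에", "갔", "어"), "가았"),
    (("가", "나", "는"), "날"),   # e.g. a verb stem that lost its final consonant
    (("가", "나", "는"), "날"),
    (("<s>", "나", "는"), "나"),  # e.g. a pronoun with no spelling change
    (("<s>", "나", "는"), "나"),
]

model = SyllableNaiveBayes()
model.fit(toy_samples)

print(model.predict(("부", "했", "어")))
print(model.predict(("<s>", "나", "는")))
print(model.predict(("가", "나", "는")))
```

In this toy setting the same target syllable can map to different recovered forms depending on its neighbors, which is the kind of ambiguity the surrounding-context features are meant to resolve.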
Keywords
POS Tagging; Naive Bayes Model; Morpheme Recovery; Morphological Recovery;