http://dx.doi.org/10.3745/KIPSTB.2011.18B.1.045

Part-Of-Speech Tagging and the Recognition of the Korean Unknown-words Based on Machine Learning  

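Choi, Maeng-Sik (Department of Computer Information and Communication Engineering, Kangwon National University)
Kim, Hark-Soo (Department of Computer Information and Communication Engineering, Kangwon National University)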
Abstract
Unknown-morpheme errors in Korean morphological analysis fall into two types. In the first, the morphological analyzer fails to return any morpheme sequence at all. In the second, it returns an incorrect combination of known morphemes. Most previous unknown-morpheme estimation techniques have focused only on the first type. This paper proposes an unknown-morpheme estimation method that handles both types. The proposed method first uses an SVM (Support Vector Machine) to detect Eojeols (Korean spacing units) that may contain unknown-morpheme errors. Then, using CRFs (Conditional Random Fields), it segments the detected Eojeols into morphemes and annotates the segmented morphemes with new POS tags. In the experiments, the proposed method outperformed the conventional method based on longest matching of functional words. The experimental results also show that errors of the second type must be handled to improve the performance of Korean morphological analysis.
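The CRF stage segments and tags morphemes in a single pass. A common way to set this up is to label each syllable of an Eojeol with B-<tag> (first syllable of a morpheme) or I-<tag> (continuation syllable). The abstract does not state the label scheme, so this is an assumption. The minimal Python sketch below shows the encoding that would turn segmented training data into syllable labels, and the decoding that turns predicted labels back into tagged morphemes. The function names `to_bio` and `from_bio` and the Sejong-style tags NNP (proper noun) and JKB (adverbial case particle) in the example are illustrative. The sketch also assumes the morphemes concatenate exactly to the Eojeol's surface string, with no contraction or irregular conjugation. The CRF model itself, which would predict the labels from syllable features, is not shown.

```python
from typing import List, Tuple

Morpheme = Tuple[str, str]  # (surface form, POS tag)


def to_bio(morphemes: List[Morpheme]) -> List[Tuple[str, str]]:
    """Turn a segmented Eojeol into (syllable, label) pairs.

    The first syllable of each morpheme gets B-<tag>; the rest get I-<tag>.
    Assumes the morphemes concatenate exactly to the Eojeol's surface form.
    """
    pairs = []
    for form, tag in morphemes:
        for i, syllable in enumerate(form):
            prefix = "B" if i == 0 else "I"
            pairs.append((syllable, f"{prefix}-{tag}"))
    return pairs


def from_bio(syllables: List[str], labels: List[str]) -> List[Morpheme]:
    """Rebuild (morpheme, tag) pairs from per-syllable labels, e.g. a tagger's output."""
    if len(syllables) != len(labels):
        raise ValueError("one label per syllable is required")
    morphemes: List[Morpheme] = []
    for syllable, label in zip(syllables, labels):
        prefix, tag = label.split("-", 1)
        # Start a new morpheme on B-, or on an I- that does not continue the previous tag.
        if prefix == "B" or not morphemes or morphemes[-1][1] != tag:
            morphemes.append((syllable, tag))
        else:
            form, _ = morphemes[-1]
            morphemes[-1] = (form + syllable, tag)
    return morphemes


if __name__ == "__main__":
    # Hypothetical example: an unknown proper noun followed by a known particle.
    gold = [("아이폰", "NNP"), ("으로", "JKB")]
    pairs = to_bio(gold)
    print(pairs)
    # [('아', 'B-NNP'), ('이', 'I-NNP'), ('폰', 'I-NNP'), ('으', 'B-JKB'), ('로', 'I-JKB')]
    print(from_bio([s for s, _ in pairs], [l for _, l in pairs]))
    # [('아이폰', 'NNP'), ('으로', 'JKB')]
```

Decoding treats an I- label that does not continue the previous tag as the start of a new morpheme, so an ill-formed label sequence from the tagger still yields a usable segmentation.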
Keywords
Unknown Morpheme Estimation; Unknown Morpheme Recognition; Unknown Morpheme Tagging
Citations & Related Records
Times cited by KSCI: 1