[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5391/JKIIS.2008.18.4.501

Part-Of-Speech Tagging using multiple sources of statistical data

Cho, Seh-Yeong (Department of Computer Software, Myongji University)

Publication Information

Journal of the Korean Institute of Intelligent Systems / v.18, no.4, 2008, pp. 501-506

Abstract

Statistical POS tagging is prone to error because of the inherent limitations of statistical data, especially data drawn from a single source. It is therefore widely agreed that further improvement depends on exploiting multiple knowledge sources. However, such sources are bound to be inconsistent with one another. This paper explores the use of the maximum entropy (ME) model for Korean part-of-speech tagging. As knowledge sources we use n-gram data and trigger-pair data, and we show how the perplexity measure varies when the two sources are combined using the maximum entropy method. In the experiment, a trigram model achieved 94.9% accuracy with a Hidden Markov Model (HMM); combining it with trigger-pair data using the maximum entropy method raised accuracy to 95.6%. This result shows that further improvement is possible as more knowledge sources are developed and combined using the ME method.
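The abstract describes combining two knowledge sources, tag n-grams and trigger pairs, in a single maximum entropy model and comparing perplexity. The following is a minimal sketch under stated assumptions, not the author's implementation. It uses the conditional form $p(t \mid h) = \exp\big(\sum_i \lambda_i f_i(h,t)\big) / Z(h)$ with binary (predicate, tag) features, fits the weights $\lambda_i$ with Generalized Iterative Scaling (Darroch and Ratcliff), adding a slack feature so that every (context, tag) pair has the same feature total $C$, and reports perplexity $\exp\big(-\frac{1}{N}\sum_j \log p(t_j \mid h_j)\big)$. The feature templates (current word, previous tag, previous two tags, and every word within a hypothetical `window` of preceding words treated as a trigger), the helper names `contexts`, `MaxEntTagger` and `perplexity`, and the tiny romanized corpus with Sejong-style tag names are all assumptions. The abstract does not specify the paper's feature set, its tagset, or how trigger pairs were selected (ranking candidate pairs by mutual information is common in the trigger-pair literature).

```python
import math
from collections import defaultdict

# Toy training data (hypothetical): romanized Korean morphemes with Sejong-style tags.
CORPUS = [
    (["na", "neun", "hakgyo", "e", "ga", "nda"], ["NP", "JX", "NNG", "JKB", "VV", "EF"]),
    (["geu", "neun", "jip", "e", "o", "nda"], ["NP", "JX", "NNG", "JKB", "VV", "EF"]),
    (["na", "neun", "chaek", "eul", "ilk", "nda"], ["NP", "JX", "NNG", "JKO", "VV", "EF"]),
]


def contexts(words, tags, window=3):
    """Turn one tagged sentence into (context predicates, tag) events.

    Hypothetical templates: current word, tag bigram and trigram history (the
    n-gram source), and each word in the preceding `window` positions as a
    trigger (the trigger-pair source). window=0 disables trigger predicates.
    """
    events = []
    for i, tag in enumerate(tags):
        t1 = tags[i - 1] if i >= 1 else "<s>"
        t2 = tags[i - 2] if i >= 2 else "<s>"
        ctx = [f"w={words[i]}", f"t-1={t1}", f"t-2,t-1={t2},{t1}"]
        ctx += [f"trig={w}" for w in sorted(set(words[max(0, i - window):i]))]
        events.append((tuple(ctx), tag))
    return events


class MaxEntTagger:
    """Conditional maximum entropy model p(t | h) trained with GIS."""

    def __init__(self, tags):
        self.tags = sorted(tags)
        self.weights = {}        # lambda for each (predicate, tag) feature
        self.slack_weight = 0.0  # lambda for the GIS correction (slack) feature
        self.C = 1

    def _active(self, context, tag):
        # Binary features firing for (context, tag): only pairs seen in training.
        return [(p, tag) for p in context if (p, tag) in self.weights]

    def prob(self, context):
        scores = {}
        for t in self.tags:
            feats = self._active(context, t)
            scores[t] = (sum(self.weights[f] for f in feats)
                         + self.slack_weight * (self.C - len(feats)))
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {t: math.exp(s - top) for t, s in scores.items()}
        z = sum(exps.values())
        return {t: v / z for t, v in exps.items()}

    def train(self, events, iterations=50):
        n = len(events)
        empirical = defaultdict(float)
        for context, tag in events:
            for p in context:
                empirical[(p, tag)] += 1.0 / n
        self.weights = {f: 0.0 for f in empirical}
        self.slack_weight = 0.0
        # GIS needs every (context, tag) to have the same feature total C; the
        # slack feature fills the gap, and +1 keeps its expectation positive.
        self.C = max(len(c) for c, _ in events) + 1
        emp_slack = sum(self.C - len(self._active(c, t)) for c, t in events) / n
        for _ in range(iterations):
            model = defaultdict(float)
            model_slack = 0.0
            for context, _ in events:
                for t, pt in self.prob(context).items():
                    feats = self._active(context, t)
                    for f in feats:
                        model[f] += pt / n
                    model_slack += pt * (self.C - len(feats)) / n
            # GIS update: lambda += (1/C) * log(empirical / model expectation).
            for f in self.weights:
                self.weights[f] += math.log(empirical[f] / model[f]) / self.C
            self.slack_weight += math.log(emp_slack / model_slack) / self.C


def perplexity(model, events):
    """exp of the average negative log-probability of the gold tags."""
    log_sum = sum(math.log(model.prob(c)[t]) for c, t in events)
    return math.exp(-log_sum / len(events))


if __name__ == "__main__":
    for window in (0, 3):  # 0: n-gram (+ word) features only; 3: add triggers
        events = [e for words, tags in CORPUS for e in contexts(words, tags, window)]
        model = MaxEntTagger({t for _, t in events})
        model.train(events, iterations=50)
        print(f"window={window}: training perplexity {perplexity(model, events):.3f}")
```

Setting `window=0` drops the trigger predicates, so the two printed values mimic, on toy data, the n-gram-only versus combined comparison the abstract describes. Perplexity here is measured on training events with gold previous tags; an actual tagger would also need a decoder over predicted tags (Ratnaparkhi's ME tagger, for example, uses beam search).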

Keywords

Maximum Entropy; Part of Speech; N-gram; Trigger Pair

References

1	Ken Church and Patrick Hanks, "Word Association Norms, Mutual Information, and Lexicography," Computational Linguistics, vol. 16, no. 1, pp. 22-29, March 1990
2	L. E. Baum and T. Petrie, "Statistical Inference for Probabilistic Functions of Finite State Markov Chains," Ann. Math. Stat., vol. 37, pp. 1554-1563, 1966
3	J. Darroch and D. Ratcliff, "Generalized Iterative Scaling for Log-Linear Models," Ann. Math. Stat., vol. 43, pp. 1470-1480, 1972
4	박성배 (Seong-Bae Park) and 장병탁 (Byoung-Tak Zhang), "Text Chunking Using the Maximum Entropy Model" (in Korean), Proceedings of the 13th Conference on Hangul and Korean Language Information Processing, pp. 130-137, 2001
5	Adwait Ratnaparkhi, "A Maximum Entropy Model for Part-of-Speech Tagging," Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 133-142, 1996
6	Sehyeong Cho, "Improvement of Language Models Using Dual-Source Backoff," Lecture Notes in Artificial Intelligence, vol. 3157, pp. 892-900, Springer, 2004
7	Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice-Hall, 2000
8	Adwait Ratnaparkhi, "Maximum Entropy Models for Natural Language Ambiguity Resolution," Ph.D. thesis, University of Pennsylvania, 1998
9	E. T. Jaynes, "Information Theory and Statistical Mechanics," Physical Review, vol. 106, no. 4, pp. 620-630, 1957
10	A. Berger, S. A. Della Pietra, and V. J. Della Pietra, "A Maximum Entropy Approach to Natural Language Processing," Computational Linguistics, vol. 22, no. 1, pp. 39-71, 1996
11	Ronald Rosenfeld, "Adaptive Statistical Language Modeling: A Maximum Entropy Approach," Ph.D. thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, April 19, 1994

Korean title: 이종의 통계정보를 이용한 품사 부착 기법 (Part-of-speech tagging using heterogeneous statistical information)