[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KTSDE.2016.5.11.535

Coreference Resolution for Korean Using Random Forests

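Jeong, Seok-Won (Computer, Information and Communications Engineering Program, Kangwon National University)
Choi, MaengSik (Computer, Information and Communications Engineering Program, Kangwon National University)
Kim, HarkSoo (Computer, Information and Communications Engineering Program, Kangwon National University)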

Publication Information

KIPS Transactions on Software and Data Engineering / v.5, no.11, 2016, pp. 535-540

Abstract

Coreference resolution identifies mentions in a document and groups the mentions that refer to the same entity. It is an essential step for natural language processing applications such as information extraction, event tracking, and question answering. Recently, various coreference resolution models based on machine learning (ML) have been proposed. As is well known, these ML-based models need large amounts of training data manually annotated with coreferent mention tags. Unfortunately, we could not find usable open data for training ML-based models for Korean. Therefore, we propose an efficient coreference resolution model that needs less training data than other ML-based models. The proposed model identifies coreferent mentions using random forests based on sieve-guided features. In experiments with baseball news articles, the proposed model achieved a CoNLL F1-score of 0.6678, higher than that of the other ML-based models compared.
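
For readers unfamiliar with the evaluation measure named in the abstract: the CoNLL F1-score is conventionally the unweighted mean of the MUC, B-cubed, and entity-based CEAF F1-scores. The sketch below is illustrative only and is not the authors' evaluation code. It assumes that the gold (key) and system (response) clusterings cover the same set of mentions, and its CEAF alignment is a brute-force search suited only to toy inputs. All function names and the example clusters are hypothetical.

```python
from itertools import permutations
from typing import List, Set

Cluster = Set[str]

def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0

def _muc_side(a: List[Cluster], b: List[Cluster]) -> float:
    """MUC recall of clustering `a` against `b` (swap arguments for precision)."""
    b_of = {m: i for i, c in enumerate(b) for m in c}
    num = den = 0
    for cluster in a:
        # Number of pieces `b` splits this cluster into; a mention missing
        # from `b` counts as its own piece.
        parts = {b_of.get(m, ("unaligned", m)) for m in cluster}
        num += len(cluster) - len(parts)
        den += len(cluster) - 1
    return num / den if den else 0.0

def muc_f1(key: List[Cluster], response: List[Cluster]) -> float:
    return f1(_muc_side(response, key), _muc_side(key, response))

def b_cubed_f1(key: List[Cluster], response: List[Cluster]) -> float:
    # Assumption: key and response contain exactly the same mentions.
    key_of = {m: c for c in key for m in c}
    resp_of = {m: c for c in response for m in c}
    mentions = list(key_of)
    if not mentions:
        return 0.0
    p = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions)
    r = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions)
    return f1(p / len(mentions), r / len(mentions))

def ceaf_e_f1(key: List[Cluster], response: List[Cluster]) -> float:
    """Entity-based CEAF with phi4 similarity; brute-force alignment (toy sizes only)."""
    if not key or not response:
        return 0.0
    def phi4(k: Cluster, r: Cluster) -> float:
        return 2 * len(k & r) / (len(k) + len(r))
    small, large = sorted([key, response], key=len)
    best = max(
        sum(phi4(s, l) for s, l in zip(small, perm))
        for perm in permutations(large, len(small))
    )
    return f1(best / len(response), best / len(key))

def conll_f1(key: List[Cluster], response: List[Cluster]) -> float:
    """Unweighted mean of the MUC, B-cubed, and CEAF-e F1-scores."""
    return (muc_f1(key, response) + b_cubed_f1(key, response)
            + ceaf_e_f1(key, response)) / 3

# Hypothetical toy example: five mentions, two gold entities.
key = [{"a", "b", "c"}, {"d", "e"}]
response = [{"a", "b"}, {"c", "d", "e"}]
score = conll_f1(key, response)
```

A practical evaluation would normally rely on an established scorer and an optimal assignment algorithm for the CEAF alignment rather than the exhaustive search used here.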

Keywords

Coreference Resolution; Random Forest; Sieve-Guided Features

