http://dx.doi.org/10.3745/KTSDE.2016.5.11.535

Coreference Resolution for Korean Using Random Forests  

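Jeong, Seok-Won (Kangwon National University, Computer and Communications Engineering)
Choi, MaengSik (Kangwon National University, Computer and Communications Engineering)
Kim, HarkSoo (Kangwon National University, Computer and Communications Engineering)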
Publication Information
KIPS Transactions on Software and Data Engineering / v.5, no.11, 2016, pp. 535-540
Abstract
Coreference resolution identifies mentions in documents and groups the mentions that refer to the same entity. It is an essential step for natural language processing applications such as information extraction, event tracking, and question answering. Recently, various coreference resolution models based on machine learning (ML) have been proposed. As is well known, these ML-based models need large amounts of training data manually annotated with coreference tags. Unfortunately, we could not find usable open data for training ML-based models in Korean. Therefore, we propose an efficient coreference resolution model that needs less training data than other ML-based models. The proposed model identifies coreferent mentions using random forests based on sieve-guided features. In experiments with baseball news articles, the proposed model achieved a CoNLL F1-score of 0.6678, which is higher than the scores of the other ML-based models compared.
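
The abstract does not list the actual features or training setup, so the following is only a minimal sketch of the general idea: sieve-style rules are turned into binary features for each mention pair, a classifier decides whether the pair corefers, and positive pairs are merged into groups. Everything named here is an assumption made for illustration: the Mention fields, the three features, the two-sentence window, and the sample mentions are hypothetical, and stand_in_classifier is a simple vote that takes the place of a trained random forest.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Mention:
    """Hypothetical mention record; the fields are illustrative assumptions."""
    text: str      # surface string of the mention
    head: str      # head word of the mention
    sentence: int  # index of the sentence containing the mention


def pair_features(antecedent: Mention, mention: Mention) -> List[int]:
    """Encode sieve-style rules as binary features for one mention pair."""
    return [
        int(antecedent.text == mention.text),               # exact string match
        int(antecedent.head == mention.head),               # head word match
        int(mention.sentence - antecedent.sentence <= 2),   # within 2 sentences
    ]


def stand_in_classifier(features: List[int]) -> bool:
    """Placeholder for a trained random forest: fire if 2+ features are on."""
    return sum(features) >= 2


def cluster_mentions(
    mentions: List[Mention],
    is_coreferent: Callable[[List[int]], bool],
) -> List[List[int]]:
    """Group mention indices by merging every pair judged coreferent."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for j in range(len(mentions)):
        for i in range(j):  # compare each mention with all earlier mentions
            if is_coreferent(pair_features(mentions[i], mentions[j])):
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for idx in range(len(mentions)):
        groups.setdefault(find(idx), []).append(idx)
    return list(groups.values())


# Invented sample mentions, loosely in the style of a baseball news article.
mentions = [
    Mention("the pitcher Kim", "Kim", 0),
    Mention("Kim", "Kim", 1),
    Mention("the home team", "team", 1),
    Mention("the home team", "team", 5),
]
clusters = cluster_mentions(mentions, stand_in_classifier)
print(clusters)  # [[0, 1], [2, 3]]
```

Because the classifier is passed in as a function, a trained model's prediction method could be substituted for stand_in_classifier without changing the grouping step.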
Keywords
Coreference Resolution; Random Forest; Sieve-Guided Features