http://dx.doi.org/10.9708/jksci/2012.17.9.029

Named Entity and Event Annotation Tool for Cultural Heritage Information Corpus Construction  

Choi, Ji-Ye (Dept. of Digital Media, SangMyung University)
Kim, Myung-Keun (Dept. of Digital Media, SangMyung University)
Park, So-Young (Dept. of Game & Mobile Contents, SangMyung University)
Abstract
In this paper, we propose a named entity and event annotation tool for constructing a cultural heritage information corpus. With the proposed tool, annotators mark up named entities and events, focusing on the four categories relevant to cultural heritage information management: time, location, person, and event. To make annotation easier, the tool automatically records position information such as the line number or the word number, and highlights the corresponding string in the raw text in bold italic. To reduce the cost of manual annotation, the tool uses patterns to recognize named entities automatically. Because very little training data is available, the tool extracts only simple rule patterns. To avoid error propagation, these patterns are extracted directly from the raw text, without any additional processing. Experimental results show that the proposed tool reduces manual annotation costs by more than half.
Keywords
Named Entity Annotated Corpus; Event Annotated Corpus; Corpus Construction; Named Entity Recognition
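
The workflow described in the abstract (simple rule patterns applied to the raw text, with line number, word number, and bold-italic highlighting recorded for each match) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the two regular expressions, the `Annotation` record, the functions `annotate` and `highlight`, and the use of HTML-style tags for bold italic are all assumptions made for illustration, since the abstract does not give the actual patterns or output format.

```python
import re
from dataclasses import dataclass

# Hypothetical rule patterns, for illustration only; the abstract does not
# list the patterns the tool actually extracts.
PATTERNS = {
    "TIME": re.compile(r"\b\d{3,4}\b"),  # a bare 3- or 4-digit year
    "LOCATION": re.compile(r"\b[A-Z][a-z]+ (?:Temple|Palace|Fortress)\b"),
}


@dataclass
class Annotation:
    label: str    # entity category, e.g. "TIME"
    line_no: int  # 1-based line number in the raw text
    word_no: int  # 1-based index of the first matched word in that line
    start: int    # character offset of the match within the line
    end: int      # character offset just past the match
    text: str     # the matched string


def annotate(raw_text: str) -> list:
    """Apply every pattern to every line of the raw text, unmodified."""
    found = []
    for line_no, line in enumerate(raw_text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            for m in pattern.finditer(line):
                # Count whitespace-separated words before the match.
                word_no = len(line[:m.start()].split()) + 1
                found.append(
                    Annotation(label, line_no, word_no, m.start(), m.end(), m.group())
                )
    # Return annotations in reading order.
    return sorted(found, key=lambda a: (a.line_no, a.start))


def highlight(line: str, ann: Annotation) -> str:
    """Mark the annotated string as bold italic (HTML tags are an assumption)."""
    return line[:ann.start] + "<b><i>" + ann.text + "</i></b>" + line[ann.end:]
```

Calling `annotate` on a short passage returns one record per match, which an annotator could then confirm or correct instead of entering every entity by hand; `highlight` shows where a given record sits in its line.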