Heuristic-based Korean Coreference Resolution for Information Extraction

  • Euisok Chung (Human Information Processing Dept., Electronics and Telecommunications Research Institute, 161, Kajong-Dong, Yusong-Gu, Daejon, 305-350, KOREA)
  • Soojong Lim (Human Information Processing Dept., Electronics and Telecommunications Research Institute, 161, Kajong-Dong, Yusong-Gu, Daejon, 305-350, KOREA)
  • Yun, Bo-Hyun (Human Information Processing Dept., Electronics and Telecommunications Research Institute, 161, Kajong-Dong, Yusong-Gu, Daejon, 305-350, KOREA)
  • Published: 2002.02.01

Abstract

Information extraction delimits in advance, as part of the task specification, the semantic range of the output, and filters information from large volumes of text. The most representative words in a document are named entities and pronouns, so resolving coreference is important for extracting meaningful information. Coreference resolution finds the named entities in a document that refer to the same real-world entity. Its results are used for named entity detection and template generation. This paper presents a heuristic-based approach to coreference resolution in Korean. We built the heuristics by expanding them gradually against a corpus, and derived salience factors of antecedents as an importance measure for Korean. Our approach consists of antecedent selection and antecedent weighting. Three kinds of salience factors are used to weight each candidate antecedent of an anaphor. The experimental results show 80% precision.
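
The two stages named in the abstract, antecedent selection followed by antecedent weighting, can be illustrated with a minimal Python sketch. The abstract does not say which three salience factors the authors used, so the factors below (recency, subject role, named-entity status), their weights, the `Mention` fields, and the two-sentence window are all assumptions chosen for illustration, not the paper's actual heuristics.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    index: int             # order of the mention in the document
    sentence: int          # index of the sentence containing the mention
    is_subject: bool       # hypothetical grammatical-role flag
    is_named_entity: bool  # hypothetical named-entity flag


# Hypothetical weights for three illustrative salience factors.
WEIGHTS = {"recency": 1.0, "subject": 0.8, "named_entity": 0.5}


def select_candidates(anaphor: Mention, mentions: List[Mention],
                      window: int = 2) -> List[Mention]:
    """Antecedent selection: keep mentions that precede the anaphor
    and lie within `window` sentences of it."""
    return [
        m for m in mentions
        if m.index < anaphor.index and anaphor.sentence - m.sentence <= window
    ]


def salience(anaphor: Mention, candidate: Mention) -> float:
    """Antecedent weighting: sum the contributions of the assumed factors."""
    distance = anaphor.sentence - candidate.sentence
    score = WEIGHTS["recency"] / (1 + distance)  # closer sentences score higher
    if candidate.is_subject:
        score += WEIGHTS["subject"]
    if candidate.is_named_entity:
        score += WEIGHTS["named_entity"]
    return score


def resolve(anaphor: Mention, mentions: List[Mention],
            window: int = 2) -> Optional[Mention]:
    """Return the highest-weighted candidate, or None if there is none."""
    candidates = select_candidates(anaphor, mentions, window)
    if not candidates:
        return None
    return max(candidates, key=lambda m: salience(anaphor, m))
```

Keeping selection and weighting as separate functions mirrors the two-stage description: the selection filter can be tightened or relaxed without touching how the surviving candidates are scored.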

Keywords