Judgment about the Usefulness of Automatically Extracted Temporal Information from News Articles for Event Detection and Tracking

Judging the Usefulness of Temporal Information Automatically Extracted from Newspaper Articles for Event Detection and Tracking

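  • Pyung Kim (NTIS Division, Korea Institute of Science and Technology Information)
  • Sung Hyon Myaeng (School of Engineering, Information and Communications University)
  • Published: 2006.06.01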

Abstract

Temporal information plays an important role in natural language processing (NLP) applications such as information extraction, discourse analysis, automatic summarization, and question answering. In topic detection and tracking (TDT), the temporal information most often used is the publication date of a message, which is readily available but of limited usefulness. We developed a relatively simple NLP method for extracting temporal information from Korean news articles, with the goal of improving the performance of TDT tasks. To extract temporal expressions, we use finite state automata and a lexicon of time-revealing vocabulary. Each extracted expression is converted into a canonical representation of a time point or a time duration. We first evaluated the accuracy of the extraction and canonicalization methods and then investigated the extent to which the extracted temporal information helps TDT tasks. The experimental results show that time information extracted from text significantly improves both precision and recall.

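Temporal information plays an important role in natural language processing applications such as information extraction, question answering, and automatic summarization. In event detection and tracking, the publication date of an article is widely used when computing similarity between articles, but its usefulness is limited. To improve the performance of event detection and tracking systems, this study developed a method that extracts temporal information from Korean newspaper articles using relatively simple natural language processing techniques. Part-of-speech patterns and a lexicon were used to extract temporal expressions, and the extracted expressions were converted through normalization into a specific time point or period. Experiments measured the accuracy of the temporal expression extraction process, and using the time automatically extracted from articles improved the performance of the event detection and tracking system.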
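The extraction and canonicalization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: regular expressions stand in for the finite state automata, the five-entry `LEXICON` is a hypothetical stand-in for the time-revealing vocabulary, no part-of-speech information is used, and the names `extract_times`, `ABSOLUTE`, and `RELATIVE` are assumptions made for illustration. Each expression is resolved against the article's publication date and returned as a (start, end) pair, where a time point has identical start and end dates.

```python
import re
from datetime import date, timedelta


def _day(offset):
    """Build a lexicon entry for a single day relative to the publication date."""
    return lambda pub: (pub + timedelta(days=offset),) * 2


def _last_week(pub):
    """Monday through Sunday of the week before the publication date."""
    monday = pub - timedelta(days=pub.weekday())
    return monday - timedelta(days=7), monday - timedelta(days=1)


# Hypothetical lexicon: surface word -> function(publication date) -> (start, end).
LEXICON = {
    "그저께": _day(-2),      # the day before yesterday
    "어제": _day(-1),        # yesterday
    "오늘": _day(0),         # today
    "내일": _day(1),         # tomorrow
    "지난주": _last_week,    # last week (a duration)
}

# Regular expressions play the role of the finite state automata (assumption).
ABSOLUTE = re.compile(r"(?:(\d{4})년\s*)?(\d{1,2})월\s*(\d{1,2})일")
RELATIVE = re.compile(
    "|".join(map(re.escape, sorted(LEXICON, key=len, reverse=True)))
)


def extract_times(text, published):
    """Return (surface, start, end) triples in order of appearance."""
    found = []
    for m in ABSOLUTE.finditer(text):
        # A missing year is assumed to be the publication year.
        year = int(m.group(1)) if m.group(1) else published.year
        try:
            day = date(year, int(m.group(2)), int(m.group(3)))
        except ValueError:
            continue  # skip impossible dates such as "2월 30일"
        found.append((m.start(), m.group(0), day, day))
    for m in RELATIVE.finditer(text):
        start, end = LEXICON[m.group(0)](published)
        found.append((m.start(), m.group(0), start, end))
    return [item[1:] for item in sorted(found)]


if __name__ == "__main__":
    article = "정부는 어제 대책을 발표했고 6월 5일 회의를 연다."
    for surface, start, end in extract_times(article, date(2006, 6, 1)):
        print(surface, start.isoformat(), end.isoformat())
```

Representing both time points and durations as a date pair keeps later comparisons uniform: two articles can be tested for temporal overlap with the same interval check regardless of which kind of expression produced the dates.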
