An Efficient Damage Information Extraction from Government Disaster Reports

  • Shin, Sungho (Decision Support Technology Lab., Korea Institute of Science and Technology Information) ;
  • Hong, Seungkyun (Department of Big Data Analysis, Korea University of Science and Technology) ;
  • Song, Sa-Kwang (Decision Support Technology Lab., Korea Institute of Science and Technology Information)
  • Received : 2017.06.23
  • Reviewed : 2017.10.17
  • Published : 2017.12.31

Abstract

One purpose of information technology (IT) is to support human responses to natural and social problems, such as natural disasters and the spread of disease, and thereby to improve the quality of human life. As climate change progresses worldwide, natural disasters threaten quality of life, and human safety is no longer guaranteed. IT must support tasks related to disaster response and, more importantly, should be used to predict and minimize future damage. In South Korea, damage data is collected by each local government and then aggregated by the central government. This data is included in the disaster reports that the central government publishes for each disaster, but the raw damage data is difficult to obtain, even for research purposes. Information extraction can be applied to the disaster reports to obtain this data. Most information extraction research targets web documents, commercial reports, social network service (SNS) text, and similar sources; there is little research on information extraction from government disaster reports. These reports are mostly text, but their sentence structure differs considerably from that of news articles and commercial reports, so their characteristics must be considered carefully. This paper presents an information extraction method for South Korean government reports in the word format. The method is based on patterns and dictionaries and introduces additional ideas for tokenizing the damage expressions in the text. In experiments, the method achieves an F1 score of 80.2 on the test set, which is close to the best information extraction performance reported before recent deep learning algorithms were applied.
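The abstract does not show the patterns or dictionaries themselves, so the following is only a minimal sketch of what pattern-and-dictionary extraction of damage expressions might look like. Everything in it is a hypothetical illustration rather than the authors' method: the dictionary entries, the unit list, the regular expression, and the function name `extract_damage` are all assumed. It also uses an English sample sentence, whereas the actual reports are in Korean.

```python
import re

# Hypothetical dictionary mapping surface forms to damage categories.
# The paper's real dictionaries are not given in the abstract.
DAMAGE_TERMS = {
    "house": "house",
    "houses": "house",
    "road": "road",
    "roads": "road",
    "farmland": "farmland",
}

# Hypothetical set of accepted measurement units.
UNITS = {"unit", "units", "ha", "km", "m"}

# Assumed pattern: a word, a number (optional thousands separators and
# decimals), then a unit word, e.g. "houses 1,204 units".
DAMAGE_PATTERN = re.compile(
    r"\b(?P<item>[A-Za-z]+)\s+(?P<value>\d[\d,]*(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)"
)


def extract_damage(text):
    """Return (category, value, unit) tuples for dictionary-backed matches."""
    results = []
    for match in DAMAGE_PATTERN.finditer(text):
        item = match.group("item").lower()
        unit = match.group("unit").lower()
        # Keep a match only if both the item and the unit are in the dictionaries.
        if item not in DAMAGE_TERMS or unit not in UNITS:
            continue
        value = float(match.group("value").replace(",", ""))
        results.append((DAMAGE_TERMS[item], value, unit))
    return results


if __name__ == "__main__":
    sample = "Flood damage: houses 1,204 units, farmland 35.5 ha, roads 12 km"
    for record in extract_damage(sample):
        print(record)
```

In this sketch the pattern proposes candidate spans and the dictionaries filter them, so a number that follows a word outside the dictionary is ignored rather than recorded as damage.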
