http://dx.doi.org/10.7472/jksii.2015.16.4.39

A Proposal of Methods for Extracting Temporal Information of History-related Web Document based on Historical Objects Using Machine Learning Techniques  

Lee, Jun (Dept. of Telecomm. and Info. Engineering, Korea Aerospace University)
Kwon, YongJin (Dept. of Telecomm. and Info. Engineering, Korea Aerospace University)
Publication Information
Journal of Internet Computing and Services / v.16, no.4, 2015, pp. 39-50
Abstract
When retrieving information through a search engine, some users want documents that correspond to a specific time period. For example, a user looking for a document that describes the situation before the era of the 'Japanese invasions of Korea' may enter 'Japanese invasions of Korea' as the search query. The search engine then returns every document about the 'Japanese invasions of Korea' without regard to time period, which forces the user to do additional work. Moreover, in a large proportion of history-related documents, the date a document was created differs from the time period its contents record. If the time period described in a document's content can be extracted, it can provide useful information for retrieval and for various other applications. In this paper, we therefore extract the time periods of historical documents about the Joseon era, using historic literature of the Joseon era to deduce the time period that corresponds to each document's content. We define historical objects based on historic literature collected from the web and confirm the feasibility of extracting the time period of a web document with machine learning techniques. In addition to the machine learning techniques, we propose and apply similarity filtering based on comparisons between historical objects. Finally, we evaluate the accuracy of the resulting temporal indexing and the improvement it achieves.
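The abstract does not say how the similarity filtering is computed. As a minimal sketch, assuming each document and each candidate period is represented as a bag of historical objects (people, places, events) and that some classifier has already scored the candidate periods, the filter below keeps only candidates whose cosine similarity to the document reaches a threshold, then ranks the survivors by classifier score. The function names, the cosine measure, the 0.1 threshold, and the toy data are all hypothetical illustrations, not the authors' method.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags (multisets) of historical objects."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile shares nothing with anything
    return dot / (norm_a * norm_b)


def filter_periods(doc_objects, period_objects, classifier_scores, threshold=0.1):
    """Hypothetical similarity filter applied on top of classifier output.

    doc_objects:       historical objects found in the document
    period_objects:    period -> historical objects drawn from historic literature
    classifier_scores: period -> score from some (unspecified) ML classifier
    Returns the surviving (period, score) pairs, highest score first.
    """
    doc_bag = Counter(doc_objects)
    kept = {
        period: score
        for period, score in classifier_scores.items()
        if cosine_similarity(doc_bag, Counter(period_objects.get(period, []))) >= threshold
    }
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up inputs: the classifier slightly prefers the wrong reign,
# but the document shares no historical objects with it.
doc = ["Yi Sun-sin", "Hansan", "Yi Sun-sin"]
periods = {
    "Seonjo": ["Yi Sun-sin", "Hansan", "Ryu Seong-ryong"],
    "Sejong": ["Hangul", "Jiphyeonjeon"],
}
scores = {"Sejong": 0.55, "Seonjo": 0.45}
print(filter_periods(doc, periods, scores))
```

With these toy inputs the "Sejong" candidate is discarded because its object profile has zero overlap with the document, so only "Seonjo" remains even though the classifier scored it lower. This illustrates the general idea of using object comparison to veto implausible classifier outputs.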
Keywords
Temporal information; Temporal extraction; Machine learning; Similarity filtering; Historical information; Historical object