Conflict Resolution of Patterns for Generating Linked Data From Tables

  • Received : 2014.01.28
  • Accepted : 2014.03.28
  • Published : 2014.06.25

Abstract

Recently, the generation of new linked data from large numbers of tables by using linked open data (e.g., RDF, OWL) has attracted considerable research attention. This paper proposes a pattern-based method for generating such linked data. Pattern-based methods inherently suffer from conflicts among patterns: for instance, patterns that map a single table header to different linked data properties conflict with one another. Existing studies have either applied the statistically dominant pattern, sacrificing precision, or ignored conflicting patterns to increase precision. The proposed method finds appropriate patterns for all headers in a given table by connecting the patterns applied to those headers. Experiments using tables from DBpedia and Wikipedia showed that the proposed method effectively resolves pattern conflicts.
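The conflict described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the candidate patterns, their scores, and the rule that connected patterns must share one subject class are all assumptions made for illustration, and the function names are hypothetical. It contrasts choosing the statistically dominant pattern for each header on its own with choosing patterns jointly for the whole table.

```python
from itertools import product

# Hypothetical candidate patterns for a two-column table.
# Each pattern is (property, subject_class, score); all values are made up.
CANDIDATES = {
    "Year": [
        ("dbo:publicationDate", "Book", 0.6),
        ("dbo:releaseDate", "Film", 0.4),
    ],
    "Director": [
        ("dbo:director", "Film", 0.9),
    ],
}


def dominant_choice(candidates):
    """Pick the highest-scoring pattern for each header independently."""
    return {h: max(ps, key=lambda p: p[2])[0] for h, ps in candidates.items()}


def joint_choice(candidates):
    """Pick one pattern per header so that all chosen patterns agree on the
    subject class (an assumed notion of 'connected'), maximizing total score."""
    headers = list(candidates)
    best, best_score = None, float("-inf")
    for combo in product(*(candidates[h] for h in headers)):
        # Skip combinations whose patterns describe different subject classes.
        if len({p[1] for p in combo}) != 1:
            continue
        score = sum(p[2] for p in combo)
        if score > best_score:
            best, best_score = combo, score
    if best is None:
        return {}
    return {h: p[0] for h, p in zip(headers, best)}


print(dominant_choice(CANDIDATES))
print(joint_choice(CANDIDATES))
```

In this toy table, the independent choice maps "Year" to a book property while "Director" is a film property, so the two headers disagree about what each row describes. The joint choice keeps both headers on the film reading even though the film pattern for "Year" has the lower individual score. The exhaustive search over combinations is only practical for small tables and is used here for clarity.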
