Large Scale Incremental Reasoning using SWRL Rules in a Distributed Framework

A Large-Scale Incremental Reasoning Method Using SWRL Rules in a Distributed Processing Environment

  • Received : 2016.08.24
  • Accepted : 2017.01.18
  • Published : 2017.04.15

Abstract

With the arrival of the Big Data era, the amount of semantic data has increased rapidly. To derive meaningful information from this large-scale semantic data, studies that use SWRL (Semantic Web Rule Language) rules are being actively conducted. SWRL rules are written from the empirical knowledge of users. However, conventional reasoning systems built for a single machine cannot process large-scale data, and multi-node reasoning systems suffer performance degradation caused by network shuffling. This paper therefore proposes a more efficient distributed inference method that overcomes the limitations of existing systems. It introduces data partitioning strategies that minimize network shuffling, and it describes how to optimize the incremental reasoning process by selecting the newly added data and determining the order in which rules are applied. To evaluate the proposed methods, experiments were conducted on WiseKB, which consists of 200 million triples, with 83 user-defined rules; the overall reasoning task was completed in 32.7 minutes. In addition, experiments on the LUBM benchmark datasets showed that the approach performs reasoning twice as fast as MapReduce-based reasoning systems.

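With the arrival of the Big Data era, the amount of semantic data is growing rapidly. To infer meaningful implicit information from such large-scale semantic data, many studies make use of SWRL (Semantic Web Rule Language) rules written on the basis of the empirical knowledge of knowledge users. However, existing single-node reasoning systems are limited in their ability to process large-scale data, and multi-node distributed reasoning systems suffer from performance degradation caused by network shuffling. This paper therefore proposes a more efficient distributed reasoning method that overcomes the limitations of existing systems. It also introduces a data partitioning strategy that can minimize network shuffling, and explains how the reasoning process can be optimized by selecting the newly added data used in incremental reasoning and by determining the order of the reasoning rules. To measure the performance of the proposed method, an experiment using the WiseKB ontology, which consists of about 200 million triples, and 84 user-defined rules took 32.7 minutes. In addition, an experiment using the LUBM benchmark data showed up to twice the performance of the MapReduce approach.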
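
The idea of incremental reasoning over newly added data can be illustrated with a small single-machine sketch. It is not the paper's distributed algorithm, whose details the abstract does not give. It uses semi-naive evaluation as a stand-in technique: in each round, at least one body atom of a rule must match a triple from the new data (`delta`), so triples that were already processed are not re-joined against each other. The function names, the triple encoding, and the `hasUncle` rule are illustrative assumptions, and the existing triples are assumed to be already closed under the rules.

```python
# Illustrative sketch only: single-machine semi-naive forward chaining.
# Triples are (subject, predicate, object); terms starting with "?" are variables.

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Bind the variable, or check it against its earlier binding.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def apply_rule(rule, all_triples, delta):
    """Derive head triples where at least one body atom matches a triple in `delta`."""
    body, head = rule
    derived = set()
    for i in range(len(body)):
        # Body atom i is restricted to the new triples; the others see everything.
        bindings = [{}]
        for j, pattern in enumerate(body):
            source = delta if j == i else all_triples
            bindings = [b2 for b in bindings for t in source
                        if (b2 := match(pattern, t, b)) is not None]
        for b in bindings:
            derived.add(tuple(b.get(term, term) for term in head))
    return derived


def incremental_closure(base, added, rules):
    """Return the triples newly inferred after `added` is inserted into `base`.

    Assumption: `base` is already closed under `rules`.
    """
    all_triples = set(base) | set(added)
    delta = set(added) - set(base)
    inferred = set()
    while delta:
        new = set()
        for rule in rules:
            new |= apply_rule(rule, all_triples, delta)
        delta = new - all_triples      # keep only genuinely new triples
        all_triples |= delta
        inferred |= delta
    return inferred


# Hypothetical SWRL-style rule: hasParent(?x, ?y) ^ hasBrother(?y, ?z) -> hasUncle(?x, ?z)
UNCLE_RULE = (
    [("?x", "hasParent", "?y"), ("?y", "hasBrother", "?z")],
    ("?x", "hasUncle", "?z"),
)

base = {("alice", "hasParent", "bob")}
added = {("bob", "hasBrother", "carl")}
result = incremental_closure(base, added, [UNCLE_RULE])
```

In a distributed setting the joins inside `apply_rule` are the step that would trigger network shuffling, which is why the choice of partitioning key and rule order matters for the approach the abstract describes.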

Acknowledgement

Supported by: National Research Foundation of Korea
