Scalable Ontology Reasoning Using GPU Cluster Approach

Large-Scale Ontology Reasoning Based on a GPU Cluster

  • Received : 2015.07.06
  • Accepted : 2015.10.13
  • Published : 2016.01.15

Abstract

In recent years, a variety of semantic services have created a need for large-scale ontology inference techniques that can derive new knowledge from existing knowledge at high speed. With recent advances in distributed computing, ontology inference engines have mostly been developed on top of the Hadoop or Spark frameworks running on large clusters. Parallel programming techniques using GPGPU, which offers many more cores than a CPU, are also used for ontology inference. In this paper, we combine the advantages of both techniques and propose a new method for reasoning over large RDFS ontology data: the data is distributed using the Spark in-memory framework, and the distributed data is then inferred at high speed using GPGPU. With GPGPU, ontology reasoning over high-capacity data can be performed at a lower cost and with higher efficiency than with conventional inference methods. In addition, distributing the data across the nodes of the Spark cluster reduces the data workload on each node. To evaluate our approach, we used LUBM datasets ranging from LUBM10 to LUBM120. Our experimental results show that the proposed reasoning engine performs 7 times faster than a conventional approach that uses a Spark in-memory inference engine.

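In recent years, large-scale ontology inference techniques that can derive new knowledge from existing knowledge at high speed have been in demand for a variety of semantic services. In line with this trend, ontology inference engines based on the Hadoop and Spark frameworks, which make use of large clusters, are being studied. In addition, parallel programming with GPGPU, which consists of many more cores than a conventional CPU, is also being applied to ontology inference. Combining the advantages of these two approaches, this paper proposes a method that distributes large RDFS ontology data through Spark, an in-memory framework, and performs high-speed inference over the distributed data using GPGPU. Ontology inference with GPGPU can perform high-speed inference at a lower cost than conventional inference methods. Moreover, the load imposed by large ontology data can be reduced by spreading it over the nodes of the Spark cluster. To evaluate the proposed inference engine, we measured inference speed on LUBM10, 50, 100, and 120; on the largest dataset, LUBM120 (about 1.7 million triples, 2.1 GB), it was 7 times faster than an in-memory (Spark) inference engine.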
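As a rough illustration of the kind of RDFS inference the abstract refers to, the following minimal sketch applies two standard RDFS entailment rules, rdfs11 (transitivity of `rdfs:subClassOf`) and rdfs9 (type propagation along `rdfs:subClassOf`), to data that has been split into partitions. It is plain Python with no Spark or GPU dependency; the function names, the `ex:` identifiers, and the partitioning are assumptions made for illustration and are not taken from the paper's implementation. The loop over partitions only stands in for the work that a cluster node or GPU kernel would perform in parallel.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def subclass_closure(schema_triples):
    """rdfs11: compute, for every class, the set of all its superclasses."""
    supers = defaultdict(set)
    for s, p, o in schema_triples:
        if p == SUBCLASS:
            supers[s].add(o)
    changed = True
    while changed:  # repeat until no new superclass is discovered
        changed = False
        for cls in list(supers):
            inherited = set()
            for parent in supers[cls]:
                inherited |= supers.get(parent, set())
            if not inherited <= supers[cls]:
                supers[cls] |= inherited
                changed = True
    return dict(supers)


def infer_types(partition, supers):
    """rdfs9: (x type C) and (C subClassOf D) entail (x type D)."""
    inferred = set()
    for s, p, o in partition:
        if p == TYPE:
            for parent in supers.get(o, ()):
                inferred.add((s, TYPE, parent))
    return inferred - set(partition)  # keep only triples that are new


def reason(schema_triples, partitions):
    # The schema is small, so in a distributed setting it could be
    # shared with every worker (an assumption of this sketch).
    supers = subclass_closure(schema_triples)
    result = set()
    for part in partitions:  # sequential stand-in for parallel workers
        result |= infer_types(part, supers)
    return result


# Hypothetical example data (not from LUBM).
schema = [
    ("ex:GraduateStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
]
partitions = [
    [("ex:alice", TYPE, "ex:GraduateStudent")],
    [("ex:bob", TYPE, "ex:Student")],
]
inferred = reason(schema, partitions)
```

Because rdfs9 needs only the superclass table and one instance triple at a time, each partition can be processed independently, which is what makes this rule a natural fit for data-parallel execution.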

Acknowledgement

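Grant : Development of a Distributed Parallel Reasoning Platform for Large-Scale Knowledge Processing

Supported by : Institute for Information & communications Technology Promotion (IITP)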
