• Title/Summary/Keyword: large-scale RDF data (대규모 RDF 데이터)


A Dynamic Partitioning Scheme for Distributed Storage of Large-Scale RDF Data (대규모 RDF 데이터의 분산 저장을 위한 동적 분할 기법)

  • Kim, Cheon Jung;Kim, Ki Yeon;Yoo, Jong Hyeon;Lim, Jong Tae;Bok, Kyoung Soo;Yoo, Jae Soo
    • Journal of KIISE / v.41 no.12 / pp.1126-1135 / 2014
  • In recent years, RDF partitioning schemes have been studied for the effective distributed storage and management of large-scale RDF data. In this paper, we propose a dynamic RDF partitioning scheme that supports load balancing in dynamic environments where RDF data is continuously inserted and updated. To set the graph partitioning criteria, the proposed scheme creates clusters and sub-clusters according to how frequently the RDF data is used by queries. It then partitions the created clusters and sub-clusters by considering the workload and data size of each server. This resolves the concentration of data on a specific server that results from continuous insertion and update of RDF data, so that the load is distributed among servers in dynamic environments. Performance evaluation shows that the proposed scheme significantly improves query processing time over the existing scheme.
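
The placement step described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Cluster` and `Server` types, the `alpha` weight, and the greedy "heaviest cluster to cheapest server" rule are all assumptions chosen only to show how workload and data size might be combined into one balancing criterion.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    size: int        # number of triples in the cluster (hypothetical unit)
    query_freq: int  # how often queries touch the cluster (hypothetical unit)


@dataclass
class Server:
    name: str
    clusters: list = field(default_factory=list)

    def cost(self, alpha: float = 0.5) -> float:
        # Combined cost: alpha weights query workload, (1 - alpha) weights data size.
        load = sum(c.query_freq for c in self.clusters)
        size = sum(c.size for c in self.clusters)
        return alpha * load + (1 - alpha) * size


def assign(clusters, servers, alpha=0.5):
    """Greedy placement: heaviest clusters first, each on the cheapest server."""
    def weight(c):
        return alpha * c.query_freq + (1 - alpha) * c.size

    for c in sorted(clusters, key=weight, reverse=True):
        min(servers, key=lambda s: s.cost(alpha)).clusters.append(c)
    return servers


servers = assign(
    [Cluster("A", 100, 100), Cluster("B", 60, 60), Cluster("C", 50, 50)],
    [Server("s1"), Server("s2")],
)
for s in servers:
    print(s.name, [c.name for c in s.clusters], s.cost())
```

In a dynamic setting, the same cost function could be re-evaluated after inserts and updates to decide when a cluster should move; that migration policy is outside this sketch.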

Property-based Decomposition Storage Model for RDF Data Management (RDF 데이터 관리를 위한 프로퍼티 기반 분할 저장 모델)

  • Kim, Sung-Wan;Lim, Hae-Chull
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.223-225 / 2005
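  • RDF and other foundational technologies are used as a means of realizing the Semantic Web. Accordingly, research on the efficient management of massive RDF data has recently been active both in Korea and abroad. Many existing studies proposed methods that use a relational database system to store RDF data in triple form. Because these methods store the RDF data in a single large table, they have advantages for data management; from a query processing perspective, however, the entire table must always be accessed, which can degrade search performance. In this paper, we propose a technique that partitions RDF data into multiple tables based on properties in order to improve query processing performance.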
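
As a rough illustration of property-based decomposition, the sketch below splits a triple list into one (subject, object) table per property, so that a query on a known property scans only that table. It uses in-memory dictionaries rather than a relational database, and the function names and sample triples are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def decompose_by_property(triples):
    """Split (subject, property, object) triples into one table per property."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def lookup(tables, prop, subject=None):
    """Scan only the table for `prop`, optionally filtering by subject."""
    rows = tables.get(prop, [])
    return [(s, o) for s, o in rows if subject is None or s == subject]


triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
]
tables = decompose_by_property(triples)
print(lookup(tables, "foaf:name", "ex:bob"))
```

The trade-off is that queries with an unbound property must visit every table, which a single triple table would answer with one scan.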


An Efficient Indexing Scheme Considering the Characteristics of Large Scale RDF Data (대규모 RDF 데이터의 특성을 고려한 효율적인 색인 기법)

  • Kim, Kiyeon;Yoon, Jonghyeon;Kim, Cheonjung;Lim, Jongtae;Bok, Kyoungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.15 no.1 / pp.9-23 / 2015
  • In this paper, we propose a new RDF index scheme that considers the characteristics of large-scale RDF data to improve query processing performance. The proposed index scheme creates an S-O index for subjects and objects, since the subjects and objects of RDF triples are used redundantly. To reduce the total size of the index, it separately constructs a P index for the relatively small number of predicates in RDF triples. If a query contains a predicate, we first search the P index, since it is smaller than the S-O index. Otherwise, we first search the S-O index. Performance evaluation shows that the proposed scheme outperforms the existing scheme in terms of query processing time.
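
The lookup order described above can be sketched as follows. This is only an illustration under assumptions: the `RdfIndex` class, its dictionary-of-sets layout, and the sample triples are hypothetical, and the paper's actual index structures are not specified in the abstract.

```python
from collections import defaultdict


class RdfIndex:
    """Toy index: one shared S-O index for subjects/objects, one P index."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.so = defaultdict(set)  # term -> ids of triples using it as S or O
        self.p = defaultdict(set)   # predicate -> ids of triples using it
        for i, (s, p, o) in enumerate(self.triples):
            self.so[s].add(i)
            self.so[o].add(i)
            self.p[p].add(i)

    def match(self, s=None, p=None, o=None):
        # If the pattern has a predicate, start from the P index;
        # otherwise start from the S-O index.
        if p is not None:
            ids = set(self.p.get(p, ()))
        elif s is not None:
            ids = set(self.so.get(s, ()))
        elif o is not None:
            ids = set(self.so.get(o, ()))
        else:
            ids = set(range(len(self.triples)))
        # Verify the remaining pattern positions on the candidate triples.
        result = []
        for i in sorted(ids):
            ts, tp, to = self.triples[i]
            if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o):
                result.append(self.triples[i])
        return result


idx = RdfIndex([("a", "knows", "b"), ("b", "knows", "c"), ("a", "name", "A")])
print(idx.match(p="knows"))
print(idx.match(s="b"))
```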

Effective Keyword Search on Semantic RDF Data (시맨틱 RDF 데이터에 대한 효과적인 키워드 검색)

  • Park, Chang-Sup
    • The Journal of the Korea Contents Association / v.17 no.11 / pp.209-220 / 2017
  • As semantic data is widely used in various applications such as knowledge bases and the Semantic Web, the need for effective search over large amounts of RDF data has been increasing. Previous keyword search methods based on distinct root semantics retrieve only a set of answer trees having different root nodes. Thus, they often return answer trees with similar meanings or low query relevance together, while answer trees with the same root node cannot be retrieved together even if they have different meanings and high query relevance. We propose a new method that finds diverse and relevant answers to the query by permitting duplication of root nodes among them. We present an efficient query processing algorithm that uses path indexes to find top-k answers given a maximum amount of root duplication that a set of answer trees can have. Experiments on a real dataset show that the proposed approach produces effective answer trees that are less redundant in their content nodes and more relevant to the query than those of the previous method.
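
One way to picture top-k selection with bounded root duplication is the greedy sketch below. It is not the paper's path-index algorithm: it assumes answers are already scored, represents each answer tree only by a hypothetical `(root, score)` pair, and reads the duplication bound as a per-root cap (`max_per_root`), which is just one possible interpretation of the abstract.

```python
from collections import Counter


def top_k_with_root_limit(answers, k, max_per_root):
    """Pick up to k best-scoring answers, allowing each root at most max_per_root times.

    answers: iterable of (root, score) pairs; a higher score means more relevant.
    """
    chosen, per_root = [], Counter()
    for root, score in sorted(answers, key=lambda a: a[1], reverse=True):
        if per_root[root] < max_per_root:
            chosen.append((root, score))
            per_root[root] += 1
            if len(chosen) == k:
                break
    return chosen


answers = [("r1", 0.9), ("r1", 0.8), ("r2", 0.7), ("r1", 0.6), ("r3", 0.5)]
print(top_k_with_root_limit(answers, k=3, max_per_root=1))  # distinct roots only
print(top_k_with_root_limit(answers, k=3, max_per_root=2))  # one duplicate root allowed
```

With `max_per_root=1` the selection degenerates to distinct-root semantics; raising the cap lets a second high-scoring tree with the same root displace a weaker tree from another root.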

A Study on Visualization for Large Ontology Data (대용량 온톨로지 데이터의 가시화 연구)

  • Chung, Sung-Moon;Lee, Jeong-Hoon;Han, Wook-Shin
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.1322-1323 / 2011
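  • An ontology is a tool that represents the structure of and interactions among pieces of information and thereby provides users with useful knowledge; it is widely used in fields such as information science, e-commerce, and medicine. In this paper, we study how to efficiently process and visualize large-scale ontology data using XML/RDF, an ontology database management system.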

SSQUSAR : A Large-Scale Qualitative Spatial Reasoner Using Apache Spark SQL (SSQUSAR : Apache Spark SQL을 이용한 대용량 정성 공간 추론기)

  • Kim, Jonghoon;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.6 no.2 / pp.103-116 / 2017
  • In this paper, we present the design and implementation of a large-scale qualitative spatial reasoner that uses Apache Spark SQL to efficiently derive new qualitative spatial knowledge representing both topological and directional relationships between two arbitrary spatial objects. Apache Spark SQL is well known as a distributed parallel programming environment that provides both efficient join operations and query processing functions over a variety of data in Hadoop cluster computer systems. In our spatial reasoner, the overall reasoning process is divided into six jobs: knowledge encoding, inverse reasoning, equal reasoning, transitive reasoning, relation refining, and knowledge decoding. The execution order of the reasoning jobs is then determined in consideration of both logical causal relationships and computational efficiency. The knowledge encoding job reduces the size of the knowledge base to reason over by transforming the input knowledge in XML/RDF form into a more precise form. Repeating the transitive reasoning job and the relation refining job usually consumes most of the computation time and storage of the overall reasoning process. To improve these jobs, our reasoner finds the minimal disjunctive relations for qualitative spatial reasoning and, based on them, not only reduces the composition table used for the transitive reasoning job but also optimizes the relation refining job. In experiments using a large-scale benchmark spatial knowledge base, the proposed reasoner showed high performance and scalability.
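
The inverse and transitive reasoning jobs named in this abstract can be illustrated with a toy fixpoint loop. This is plain single-machine Python, not Spark SQL, and it is not the SSQUSAR implementation: the two relation names, the `INVERSE` map, and the two-entry `COMPOSE` table are assumptions, and disjunctive relations and relation refining are left out entirely.

```python
# Hypothetical two-relation calculus; the reasoner in the paper covers richer
# topological and directional relations.
INVERSE = {"north_of": "south_of", "south_of": "north_of"}
COMPOSE = {
    ("north_of", "north_of"): "north_of",
    ("south_of", "south_of"): "south_of",
}


def infer(facts):
    """Apply inverse and transitive (composition) reasoning until nothing new appears."""
    known = set(facts)
    while True:
        new = set()
        # Inverse reasoning: a R b implies b R^-1 a.
        for a, r, b in known:
            new.add((b, INVERSE[r], a))
        # Transitive reasoning: join on the shared middle object and
        # look the relation pair up in the composition table.
        for a, r1, b in known:
            for b2, r2, c in known:
                if b == b2 and a != c and (r1, r2) in COMPOSE:
                    new.add((a, COMPOSE[(r1, r2)], c))
        if new <= known:
            return known
        known |= new


result = infer({("A", "north_of", "B"), ("B", "north_of", "C")})
for fact in sorted(result):
    print(fact)
```

The nested loop over `known` is the self-join that a distributed engine would run as a join on the shared object, which is why the abstract singles out repeated transitive reasoning as the dominant cost.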