Search | Korea Science

Design and Performance Analysis of MapReduce-based kNN join Query Processing Algorithm (맵리듀스 기반 kNN join 질의처리 알고리즘의 설계 및 성능평가)

Kim, TaeHoon;Lee, HyunJo;Chang, JaeWoo
- Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.733-736 / 2014
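Recently, efficient analysis techniques for large-scale data have been actively studied. A representative technique is the k-nearest-neighbor join algorithm that uses Voronoi diagrams in a MapReduce environment (VkNN-join). Because VkNN-join processes a query by selecting only the subsets Sj associated with a subset Ri as the candidate search region, it reduces query processing time. However, VkNN-join has a high index construction cost and a large kNN computation overhead. To address this, this paper proposes a MapReduce-based kNN join query processing algorithm for large-scale data analysis. The proposed algorithm reduces the index construction cost through seed-based dynamic partitioning. It also reduces the computation overhead by selecting candidate regions based on the average distance between seeds. Finally, a performance evaluation shows that the proposed technique outperforms the existing technique in terms of query processing time.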
https://doi.org/10.3745/PKIPS.y2014m11a.733
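For readers unfamiliar with the operation, a kNN join pairs every point of a set R with its k nearest points in a set S. The sketch below is only the brute-force definition of that operation in plain Python; it is not the seed-based algorithm proposed in the paper, and the function name `knn_join` and the sample points are assumptions made for illustration.

```python
import heapq
import math

def knn_join(R, S, k):
    """Brute-force kNN join: map each point r in R to its k nearest points in S."""
    result = {}
    for r in R:
        # nsmallest keeps the k points of S with the smallest Euclidean distance to r
        result[r] = heapq.nsmallest(k, S, key=lambda s: math.dist(r, s))
    return result

# Hypothetical 2-D sample data
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (4.0, 5.0), (9.0, 9.0), (0.0, 2.0)]
joined = knn_join(R, S, k=2)
```

This version compares every r with every s, which is the cost that partitioning schemes such as the one described in the abstract aim to avoid by restricting each subset of R to a smaller candidate region of S.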

A Multi-level Inverted Index Technique for Structural Document Search (구조화 문서 검색을 위한 다단계 역색인 기법)

Kim, Jong-Ik
- The KIPS Transactions:PartB / v.15B no.4 / pp.355-364 / 2008
In general, an inverted index can be used to retrieve element lists from structured documents: it returns the list of elements that share a given tag name. In this approach, however, the cost of query processing is linear in the length of a path query, because every structural relationship (parent-child and ancestor-descendant) must be resolved by a structural join operation. In this paper, we propose an inverted index technique and a novel structural join technique for accelerating XML path query evaluation. Our inverted index can retrieve element lists for path segments in a parent-child relationship. Our structural join technique handles lists of element pairs, whereas existing techniques handle lists of elements. We show through experiments that integrating these two techniques accelerates the evaluation of XML path queries.
https://doi.org/10.3745/KIPSTB.2008.15-B.4.355

DRAZ: SPARQL Query Engine for heterogeneous metadata sources (DRAZ : 이기종 메타 데이터 소스를 위한 SPARQL 쿼리 엔진)

Qudus, UMAIR;Hossain, Md Ibrahim;Lee, ChangJu;Khan, Kifayat Ullah;Won, Heesun;Lee, Young-Koo
- Database Research / v.34 no.3 / pp.69-85 / 2018
Many studies have proposed federated query engines that query several homogeneous or heterogeneous datasets simultaneously, which significantly improves the quality of query results. Existing techniques allow querying over only a few heterogeneous datasets, relying on static binding and non-standard queries. However, we observe that a system that queries sources simultaneously while integrating heterogeneous metadata standards offers a better opportunity to generalize queries over any homogeneous and heterogeneous datasets. In this paper, we propose a transparent federated engine (DRAZ) to query multiple data sources using SPARQL. In our system, we first develop an ontology for a non-RDF metadata standard based on the metadata kernel dictionary elements, which are standardized by the metadata provider. For a given SPARQL query, we translate each triple pattern into an API call that accesses the dataset of the corresponding non-RDF metadata standard. We convert the results of every API call to N-Triples and summarize the final results considering all triple patterns. We evaluated DRAZ using modified FedBench benchmark queries over heterogeneous metadata standards such as DCAT and DOI. We observed that DRAZ achieves 70 to 100 percent correctness of the results despite the unavailability of JOIN operations.

SPARQL-SQL Conversion and Improvement in Response Time based on Expanded Class-Property Views (확장 클래스-속성 뷰기반의 SPARQL-SQL 질의 변환 및 속도 개선)

Lee, Seungwoo;Kim, Pyung;Kim, Jaehan;Sung, Won-Kyung
- Proceedings of the Korea Contents Association Conference / 2007.11a / pp.84-88 / 2007
As DBMSs are increasingly used to store large volumes of triple knowledge, it remains an open issue which DBMS schema should be designed to store, manage, infer over, and query that knowledge efficiently. In this paper, from the viewpoint of efficient query processing, we present a method that processes queries using Expanded Class-Property Views (ECPV) and thereby improves response time. The response time of DBMS-based inference systems is proportional to table size and the number of table join operations. The more complex a query is, the more join operations it requires and the longer its response time. An ECPV is a table obtained by performing possible join operations before queries are issued. To use ECPV in query processing, SPARQL queries must be converted into corresponding ECPV-based SQL queries. This paper describes the conversion process and demonstrates the improvement in response time through experiments.

A Flexible Query Processing System for XML Regular Path Expressions (XML 정규 경로식을 위한 유연한 질의 처리 시스템)

김대일;김기창;김유성
- Journal of KIISE:Databases / v.30 no.6 / pp.641-650 / 2003
The eXtensible Markup Language (XML) is emerging as a standard format for data representation and exchange on the Internet. There has been research on storing and retrieving XML documents using relational databases, which offer mature techniques for large-scale data processing, recovery, concurrency control, and so on. Because previous systems use the same structural information and fundamental operations to process all kinds of XML queries, they can process only certain specific queries efficiently, not all query types. In this paper, we propose a flexible query processing system. To process queries efficiently, the proposed system analyzes regular path expression queries, and uses a $\theta$-join on region numbering values to check ancestor-descendant relationships and an equi-join on the parent's region start value to check parent-child relationships. The proposed system thus processes XML regular path expressions efficiently. Our experimental results show that the proposed XML query processing system is more efficient than previous systems.
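The two structural checks named in the abstract can be illustrated with a small in-memory sketch. It assumes the common region numbering scheme in which each element stores a (start, end) interval and a descendant's interval lies strictly inside its ancestor's; the `Element` class, its field names, and the sample document are hypothetical and not taken from the paper, which performs these joins inside a relational database.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Element:
    tag: str
    start: int                   # region start number
    end: int                     # region end number
    parent_start: Optional[int]  # region start of the parent, None for the root

def ancestor_descendant(ancestors, descendants):
    """Theta-join: a is an ancestor of d when d's region lies inside a's region."""
    return [(a, d) for a in ancestors for d in descendants
            if a.start < d.start and d.end < a.end]

def parent_child(parents, children):
    """Equi-join: match each child's parent_start against a parent's start."""
    by_start = {p.start: p for p in parents}  # hash side of the equi-join
    return [(by_start[c.parent_start], c) for c in children
            if c.parent_start in by_start]

# Assumed numbering for <book><chapter><title/></chapter><title/></book>
book = Element("book", 1, 8, None)
chapter = Element("chapter", 2, 5, 1)
inner_title = Element("title", 3, 4, 2)
outer_title = Element("title", 6, 7, 1)

titles = [inner_title, outer_title]
book_desc_titles = ancestor_descendant([book], titles)  # book//title
book_child_titles = parent_child([book], titles)        # book/title
```

Here `book//title` matches both titles, while `book/title` matches only the title whose parent is `book`. The equality condition lets the parent-child check use a hash lookup, whereas the containment condition needs inequality comparisons.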

Processing Sliding Window Multi-Joins using a Graph-Based Method over Data Streams (데이터 스트림에서 그래프 기반 기법을 이용한 슬라이딩 윈도우 다중 조인 처리)

Zhang, Liang;Ge, Jun-Wei;Kim, Gyoung-Bae;Lee, Soon-Jo;Bae, Hae-Young;You, Byeong-Seob
- Journal of Korea Spatial Information System Society / v.9 no.2 / pp.25-34 / 2007
Existing approaches that select a join order for three or more data streams have always used simple heuristics. Because they consider only one factor, either join selectivity or arrival rate, these methods lead to poor performance and inefficiency in some applications. This paper proposes a graph-based sliding window multi-join algorithm with an optimal join sequence. In this method, a sliding window join graph is first set up, in which a vertex represents a join operator and an edge indicates the join relationship among sliding windows; the vertex weight and the edge weight represent the cost of the join and the reciprocity of join operators, respectively. The optimal join order is then found in the graph using an improved MVP algorithm. The final result is produced by executing the join plan with the nested loop join procedure. The advantages of our algorithm are demonstrated by a performance comparison with existing join algorithms.

SPARQL Query Processing in Distributed In-Memory System (분산 메모리 시스템에서의 SPARQL 질의 처리)

Jagvaral, Batselem;Lee, Wangon;Kim, Kang-Pil;Park, Young-Tack
- Journal of KIISE / v.42 no.9 / pp.1109-1116 / 2015
In this paper, we propose a query processing approach that uses Spark functional programming and a distributed memory system to reduce the computational overhead of SPARQL. In the semantic web, RDF ontology data is produced at large scale, and the main challenge for the semantic web is to query and manipulate such large ontologies with high throughput. Most existing studies on SPARQL have focused on deploying the Hadoop MapReduce framework; although approaches based on Hadoop MapReduce have shown promising results, they achieve low throughput due to the underlying distributed file processes. Therefore, to speed up query processing, we suggest query processing methods based on memory caching in a distributed memory system. Our approach is also integrated with a clause unification method for propagating between the clauses that exploits Spark join, map and filter methods along with caching. In our experiments, we achieved a high level of performance relative to other approaches. In particular, our performance was close to that of Sempala, which has been considered the fastest query processing system.
https://doi.org/10.5626/JOK.2015.42.9.1109
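The filter-then-join shape of evaluating a SPARQL basic graph pattern can be sketched in plain Python. This is not Spark code and not the paper's implementation; the helper names `match` and `join`, the convention that terms starting with `?` are variables, and the sample triples are all assumptions made for illustration.

```python
def match(pattern, triples):
    """Filter step: return variable bindings for every triple matching one pattern."""
    out = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound to another value
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # constant term does not match
        else:
            out.append(binding)
    return out

def join(left, right):
    """Join step: combine bindings that agree on all shared variables (nested loop)."""
    joined = []
    for l in left:
        for r in right:
            if all(l[v] == r[v] for v in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined

# Hypothetical triple store
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "age", "30"),
    ("carol", "age", "25"),
]

# SELECT ?x ?y ?a WHERE { ?x knows ?y . ?y age ?a }
rows = join(match(("?x", "knows", "?y"), triples),
            match(("?y", "age", "?a"), triples))
```

In a distributed setting each `match` would correspond to a filter over a partitioned triple collection and `join` to a keyed join on the shared variable, with intermediate results kept in memory between steps.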

Improving Join Performance for SPARQL Query Processing in the Clouds (클라우드에서 SPARQL 질의 처리를 위한 조인 성능 향상)

Choi, Gyu-Jin;Son, Yun-Hee;Lee, Kyu-Chul
- Journal of KIISE / v.43 no.6 / pp.700-709 / 2016
Recently, with the rapid growth of LOD (Linked Open Data), existing methods based on a single machine have shown performance limitations. Existing solutions use a distributed framework such as MapReduce to improve performance. However, processing SPARQL queries with the MapReduce framework involves multiple MapReduce jobs and incurs additional costs. In addition, the problem of unnecessary data processing arises. In this study, we propose a method that reduces the number of MapReduce jobs during SPARQL query processing, together with bitmap-based join indexes that minimize the cost of processing unnecessary data.
https://doi.org/10.5626/JOK.2016.43.6.700

Join Query Performance Optimization Based on Convergence Indexing Method (융합 인덱싱 방법에 의한 조인 쿼리 성능 최적화)

Zhao, Tianyi;Lee, Yong-Ju
- The Journal of the Korea institute of electronic communication sciences / v.16 no.1 / pp.109-116 / 2021
Since RDF (Resource Description Framework) triples are modeled as a graph, we cannot directly adopt existing solutions from relational databases and XML technology. To store, index, and query Linked Data more efficiently, we propose a convergence indexing method that combines R*-trees and K-dimensional trees. This method uses a hybrid storage system based on HDD (Hard Disk Drive) and SSD (Solid State Drive) devices, and a separate filter and refinement index structure to filter out unnecessary data and further refine the immediate result. We perform performance comparisons based on three standard join retrieval algorithms. The experimental results demonstrate that our method achieves remarkable performance compared to other existing methods such as Quad and Darq.
https://doi.org/10.13067/JKIECS.2021.16.1.109

A Join Query with Aggregation functions Using Mapreduce (집계 함수를 포함하는 조인 질의의 맵리듀스를 사용한 효율적인 처리 기법)

Oh, So Hyeon;Lee, Ki Yong
- Proceedings of the Korea Information Processing Society Conference / 2015.04a / pp.132-135 / 2015
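MapReduce is a programming model for processing big data, that is, large volumes of data, in a distributed environment. Aggregation functions can be used to analyze large volumes of data. This paper proposes two strategies for processing aggregation functions in SQL queries efficiently and at lower cost in a MapReduce environment. Of the two strategies, the one that shows higher performance is judged to be the more efficient processing method. The first strategy joins the two tables and then processes the aggregation function. The second strategy first processes the aggregation function to minimize the number of tuples participating in the join, then performs the join, and then processes the aggregation function again. In experiments comparing the two proposed methods, the second strategy incurred a lower cost and therefore appears to be the more efficient processing method.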
https://doi.org/10.3745/PKIPS.y2015m04a.132
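The difference between the two strategies can be sketched in plain Python for a SUM grouped by a column of the second table. This is an in-memory illustration rather than MapReduce code, and the table layouts (`orders` as `(customer_id, amount)` pairs, `customers` as `(customer_id, region)` pairs) and function names are assumptions, not taken from the paper.

```python
from collections import defaultdict

def join_then_aggregate(orders, customers):
    """Strategy 1: join every order tuple with its customer, then sum per region."""
    region_of = dict(customers)
    totals = defaultdict(int)
    for cust_id, amount in orders:
        if cust_id in region_of:          # one join probe per order tuple
            totals[region_of[cust_id]] += amount
    return dict(totals)

def aggregate_join_aggregate(orders, customers):
    """Strategy 2: pre-aggregate per join key, join the smaller result, aggregate again."""
    partial = defaultdict(int)
    for cust_id, amount in orders:        # first aggregation shrinks the join input
        partial[cust_id] += amount
    region_of = dict(customers)
    totals = defaultdict(int)
    for cust_id, subtotal in partial.items():  # one join probe per distinct key
        if cust_id in region_of:
            totals[region_of[cust_id]] += subtotal
    return dict(totals)

# Hypothetical sample tables
orders = [(1, 10), (1, 20), (2, 5), (3, 7), (3, 3)]
customers = [(1, "east"), (2, "east"), (3, "west")]

result_1 = join_then_aggregate(orders, customers)
result_2 = aggregate_join_aggregate(orders, customers)
```

Both functions return the same totals because SUM can be computed from partial sums; the second strategy sends three tuples into the join instead of five. The same rewrite does not apply directly to aggregates that cannot be decomposed this way.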

Search Result 116, Processing Time 0.026 seconds
