• Title/Summary/Keyword: join

Search Result 1,152, Processing Time 0.029 seconds

Transformation of Continuous Aggregation Join Queries over Data Streams

  • Tran, Tri Minh;Lee, Byung-Suk
    • Journal of Computing Science and Engineering
    • /
    • v.3 no.1
    • /
    • pp.27-58
    • /
    • 2009
  • Aggregation join queries are an important class of queries over data streams. These queries involve both join and aggregation operations, with window-based joins followed by an aggregation on the join output. All existing research address join query optimization and aggregation query optimization as separate problems. We observe that, by putting them within the same scope of query optimization, more efficient query execution plans are possible through more versatile query transformations. The enabling idea is to perform aggregation before join so that the join execution time may be reduced. There has been some research done on such query transformations in relational databases, but none has been done in data streams. Doing it in data streams brings new challenges due to the incremental and continuous arrival of tuples. These challenges are addressed in this paper. Specifically, we first present a query processing model geared to facilitate query transformations and propose a query transformation rule specialized to work with streams. The rule is simple and yet covers all possible cases of transformation. Then we present a generic query processing algorithm that works with all alternative query execution plans possible with the transformation, and develop the cost formulas of the query execution plans. Based on the processing algorithm, we validate the rule theoretically by proving the equivalence of query execution plans. Finally, through extensive experiments, we validate the cost formulas and study the performances of alternative query execution plans.

A Comparative Study of PRAM-based Join Algorithms (PRAM 기반의 조인 알고리즘 성능 비교 연구)

  • Choi, Yongsung;On, Byung-Won;Choi, Gyu Sang;Lee, Ingyu
    • Journal of KIISE
    • /
    • v.42 no.3
    • /
    • pp.379-389
    • /
    • 2015
  • With the advent of non-volatile memories such as Phase Change Memory (PCM or PRAM) and Magneto Resistive RAM (MRAM), active studies have been carried out on how to replace Dynamic Random-Access Memory (DRAM) with PRAM. In this paper, we study both endurance and performance issues of existing join algorithms that are based on PRAM-based computer systems and have been widely used until now: Block Nested Loop Join, Sort-Merge Join, Grace Hash Join, and Hybrid Hash Join. Our experimental results show that the existing join algorithms need to be redesigned to improve both the endurance and performance of PRAMs. To the best of our knowledge, this is the first research to scientifically study the results of the four join algorithms running on PRAM-based systems. In this work, our main contribution is the modeling and implementation of a PRAM-based simulator for a comparative study of the existing join algorithms.

Structural Semi-Join Operators for Efficient Path Processing in XML Databases (XML 데이터베이스에서 효율적인 경로처리를 위한 구조적 세미조인 기법)

  • Son, Seok-Hyun;Shin, Hyo-Seop
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.2
    • /
    • pp.252-256
    • /
    • 2010
  • The structural join is one of core operators for efficient processing of XML queries. It can be mainly used for path-represented XML queries as it efficiently retrieves the node pairs that form a hierarchical relationship (i.e., ancestor-descendant, Parent-child relationship) among large-scale XML nodes. However, the structural join algorithms still suffer potential overhead in the middle of processing of XML path queries. In addressing this problem, the structural semi-join is proposed as a novel operator that retrieves only the ancestor or descendant nodes as join results for efficient processing. In this paper, we describe the algorithms for the structural semi-join and present the methods of XML path processing based on the structural semi-join algorithms. The experimental results show that the structural semi-join algorithms are very efficient in processing XML path processing.

Segment Join Technique for Processing in Queries Fast (빠른 XML질의 처리를 위한 세그먼트 조인 기법)

  • ;Moon Bongki;Lee Sukho
    • Journal of KIISE:Databases
    • /
    • v.32 no.3
    • /
    • pp.334-343
    • /
    • 2005
  • Complex queries such as path alld twig patterns have been the focus of much research on processing XML data. Structural join algorithms use a form of encoded structural information for elements in an XML document to facilitate join processing. Recently, structural join algorithms such as Twigstack and TSGeneric- have been developed to process such complex queries, and they have been shown that the processing costs of the algorithms are linearly proportional to the sum of input data. However, the algorithms have a shortcoming that their processing costs increase with the length of a queery. To overcome the shortcoming, we propose the segment join technique to augment the structural join with structural indexes such as the 1-Index. The SegmentTwig algorithm based on the segment join technique performs joins between a pair of segments, which is a series of query nodes, rather than joins between a pair of query nodes. Consequently, the query can be processed by reading only a query node per segment. Our experimental study shorts that segment join algorithms outperform the structural join methods consistently and considerably for various data sets.

Estimating Join Selectivity of Global XQuery Queries in Distributed Environments (분산 환경에서 전역 XQuery 질의의 조인 선택치 추정 방법)

  • Park, Jong-Hyun;Kang, Ji-Hoon
    • Journal of KIISE:Databases
    • /
    • v.34 no.6
    • /
    • pp.564-571
    • /
    • 2007
  • One of the methods for integrating XML data in distributed environments is using XML view. User can query toward distributed local XML views by using global XQuery queries in XQuery which is a standard query language for searching XML data. The global XQuery queries naturally contain join operations because of integrating and searching distributed heterogeneous data. Since join operations are generally expensive for processing a query, its processing technique is very important for efficient processing of global XQuery queries. Therefore there are some studies on the efficient processing of join operations and one of these studies is that selects minimum join cost by estimating a join selectivity. In case of SQL, there are already some researches for estimating a join selectivity and join cost of global SQL queries. However we can not apply their methods for estimating the selectivity of join operations in SQL queries into XQuery queries because of the structural difference between relational data and XML data. Therefore this paper proposes a method for estimating a selectivity of join operations in XQuery queries using the information of XML views. Our contribution is three threefold. First, we define the difference point for estimating join selectivity between SQL and XQuery. Second, we estimate join selectivity in XQuery queries by referring XML views. Third, we evaluate our estimating method.

An Efficient Block Index Scheme with Segmentation for Spatio-Textual Similarity Join

  • Xiang, Yiming;Zhuang, Yi;Jiang, Nan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.7
    • /
    • pp.3578-3593
    • /
    • 2017
  • Given two collections of objects that carry both spatial and textual information in the form of tags, a $\text\underline{S}patio$-$\text\underline{T}extual$-based object $\text\underline{S}imilarity$ $\text\underline{JOIN}$ (ST-SJOIN) retrieves the pairs of objects that are textually similar and spatially close. In this paper, we have proposed a block index-based approach called BIST-JOIN to facilitate the efficient ST-SJOIN processing. In this approach, a dual-feature distance plane (DFDP) is first partitioned into some blocks based on four segmentation schemes, and the ST-SJOIN is then transformed into searching the object pairs falling in some affected blocks in the DFDP. Extensive experiments on real and synthetic datasets demonstrate that our proposed join method outperforms the state-of-the-art solutions.

Design and Performance Analysis of MapReduce-based kNN join Query Processing Algorithm (맵리듀스 기반 kNN join 질의처리 알고리즘의 설계 및 성능평가)

  • Kim, TaeHoon;Lee, HyunJo;Chang, JaeWoo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2014.11a
    • /
    • pp.733-736
    • /
    • 2014
  • 최근 대용량 데이터에 대한 효율적인 데이터 분석 기법이 활발히 연구되고 있다. 대표적인 기법으로는 맵리듀스 환경에서 보로노이 다이어그램을 이용한 k 최근접점 조인(VkNN-join) 알고리즘이 존재한다. VkNN-join 알고리즘은 부분집합 Ri에 연관된 부분집합 Sj만을 후보탐색 영역으로 선정하여 질의를 처리하기 때문에 질의처리 시간을 감소시킨다. 그러나 VkNN-join은 색인 구축 비용이 높으며, kNN 연산 오버헤드가 큰 문제점이 존재한다. 이를 해결하기 위해, 본 논문에서는 대용량 데이터 분석을 위한 맵리듀스 기반 kNN join 질의처리 알고리즘을 제안한다. 제안하는 알고리즘은 시드 기반의 동적 분할을 통해 색인구조 구축비용을 감소시킨다. 또한 시드 간 평균 거리를 기반으로 후보 영역을 선정함으로써, 연산 오버헤드를 감소시킨다. 아울러, 성능 평가를 통해 제안하는 기법이 질의처리 시간 측면에서 기존 기법에 비해 우수함을 나타낸다.

Skewed Data Handling Technique Using an Enhanced Spatial Hash Join Algorithm (개선된 공간 해쉬 조인 알고리즘을 이용한 편중 데이터 처리 기법)

  • Shim Young-Bok;Lee Jong-Yun
    • The KIPS Transactions:PartD
    • /
    • v.12D no.2 s.98
    • /
    • pp.179-188
    • /
    • 2005
  • Much research for spatial join has been extensively studied over the last decade. In this paper, we focus on the filtering step of candidate objects for spatial join operations on the input tables that none of the inputs is indexed. In this case, many algorithms has presented and showed excellent performance over most spatial data. However, if data sets of input table for the spatial join ale skewed, the join performance is dramatically degraded. Also, little research on solving the problem in the presence of skewed data has been attempted. Therefore, we propose a spatial hash strip join (SHSJ) algorithm that combines properties of the existing spatial hash join (SHJ) algorithm based on spatial partition for input data set's distribution and SSSJ algorithm. Finally, in order to show SHSJ the outperform in uniform/skew cases, we experiment SHSJ using the Tiger/line data sets and compare it with the SHJ algorithm.