• Title/Summary/Keyword: Multi-way stream join


Preprocessing Method for Handling Multi-Way Join Continuous Queries over Data Streams (데이터 스트림에서 다중 조인 연속질의의 효과적인 처리를 위한 전처리 기법)

  • Seo, Ki-Yeon;Lee, Joo-Il;Lee, Won-Suk
    • Journal of Internet Computing and Services / v.13 no.3 / pp.93-105 / 2012
  • A data stream is a series of tuples generated in real time in an incessant, immense, and volatile manner. As new information technologies emerge, stream processing methods are needed to handle data streams efficiently. In particular, because the join is one of the most resource-consuming operators in query evaluation, an efficient evaluation method for multi-way joins would substantially improve the performance of a data stream management system. In this paper, to evaluate a multi-way join continuous query efficiently, we propose a novel method that decreases the cost of the query by eliminating unsuccessful intermediate results. We propose a matrix-based structure for monitoring data streams, estimate the number of final result tuples of the query, and identify unsuccessful tuples by matrix multiplication. Using this information, we process the multi-way join continuous query efficiently by filtering out the unsuccessful tuples before the query is actually evaluated.
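The matrix idea in this abstract can be illustrated with a small sketch. It is not the authors' structure: it assumes a hypothetical three-stream chain join (S1.a = S2.a and S2.b = S3.b) whose attribute values are already mapped to small integer ids, and all names (`matmul`, `s1`, `m2`, and so on) are invented for the example. Count matrices are multiplied to estimate the number of result tuples, and a tuple is treated as unsuccessful when the product for its value is zero.

```python
from typing import List

Matrix = List[List[int]]

def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain integer matrix product (no external libraries)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

# Hypothetical window contents for the chain join S1.a = S2.a AND S2.b = S3.b.
s1 = [0, 0, 2]                   # values of a seen on S1
s2 = [(0, 1), (1, 0), (2, 2)]    # (a, b) pairs seen on S2
s3 = [1, 1, 0]                   # values of b seen on S3
NUM_A, NUM_B = 3, 3              # assumed sizes of the value domains

# Count matrices: one row vector, one 2-D matrix, one column vector.
m1 = [[s1.count(a) for a in range(NUM_A)]]
m2 = [[s2.count((a, b)) for b in range(NUM_B)] for a in range(NUM_A)]
m3 = [[s3.count(b)] for b in range(NUM_B)]

# Number of final result tuples implied by the counts.
estimated_results = matmul(matmul(m1, m2), m3)[0][0]

# A tuple is unsuccessful if no complete join path passes through its value.
right_of_s1 = matmul(m2, m3)     # partners available to each value of a
left_of_s3 = matmul(m1, m2)      # partners available to each value of b
s1_filtered = [a for a in s1 if right_of_s1[a][0] > 0]
s2_filtered = [(a, b) for (a, b) in s2 if m1[0][a] > 0 and m3[b][0] > 0]
s3_filtered = [b for b in s3 if left_of_s3[0][b] > 0]
```

With exact value ids the product gives the exact result count; if values were hashed into shared buckets, which is an assumption beyond what the abstract states, the product would be an upper-bound estimate instead.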

Optimizing Multi-way Join Query Over Data Streams (데이타 스트림에서의 다중 조인 질의 최적화 방법)

  • Park, Hong-Kyu;Lee, Won-Suk
    • Journal of KIISE:Databases / v.35 no.6 / pp.459-468 / 2008
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Many emerging applications need to deal with data streams, such as web click monitoring, sensor data processing, network traffic analysis, telephone records, and multimedia data. In these applications, processing is not performed on stored data; instead, pre-registered queries are evaluated over newly arriving data, and results are returned immediately or periodically. Recently, many studies have focused on data streams rather than stored data sets, and in particular on optimizing continuous queries so that they run efficiently. This paper proposes a query optimization algorithm for continuous queries that have multiple join operators (multi-way joins) over data streams. The algorithm, called Extended Greedy (EGA), is based on a greedy algorithm. It defines the join cost as the operations required to compute a join plus the operations required to process its result, and stores the join costs, together with all information needed to compute them, in a statistics catalog. To overcome the weak point of the greedy algorithm, its poor performance, the algorithm selects a set of operators with small cost instead of the single operator with the smallest cost. The set influences the accuracy and execution time of the algorithm and can be controlled adaptively by two user-defined values. Experimental results illustrate the performance of the EGA algorithm in various stream environments.
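The contrast between picking the single cheapest operator and keeping a small set of cheap candidates can be sketched as follows. This is an illustration only, not the EGA algorithm from the paper: the cost model is a toy one, and the two tuning knobs `slack` and `max_candidates` are hypothetical stand-ins for the two user-defined values the abstract mentions without naming.

```python
from typing import Dict, FrozenSet, List, Tuple

Sizes = Dict[str, int]                  # stream name -> window size
Sels = Dict[FrozenSet[str], float]      # stream pair -> join selectivity

def join_step(cur_size: float, joined: List[str], s: str,
              size: Sizes, sel: Sels) -> Tuple[float, float]:
    """Toy cost model: return (step cost, output size) for joining s next."""
    probes = cur_size * size[s]
    out = probes
    for j in joined:
        out *= sel.get(frozenset((j, s)), 1.0)
    return probes + out, out

def plan(size: Sizes, sel: Sels, slack: float = 1.0,
         max_candidates: int = 1) -> Tuple[float, List[str]]:
    """Search join orders; slack=1.0 and max_candidates=1 is plain greedy."""
    def expand(order: List[str], cur_size: float,
               cost: float) -> Tuple[float, List[str]]:
        remaining = [s for s in size if s not in order]
        if not remaining:
            return cost, order
        scored = sorted(join_step(cur_size, order, s, size, sel) + (s,)
                        for s in remaining)
        cheapest = scored[0][0]
        # Keep every candidate within `slack` of the cheapest, up to a cap.
        shortlist = [c for c in scored
                     if c[0] <= cheapest * slack][:max_candidates]
        return min(expand(order + [s], out, cost + step_cost)
                   for step_cost, out, s in shortlist)
    return min(expand([s], float(size[s]), 0.0) for s in size)

# Hypothetical statistics catalog.
size = {"A": 100, "B": 10, "C": 50, "D": 5}
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.5,
       frozenset(("C", "D")): 0.1, frozenset(("A", "D")): 0.2}

greedy_cost, greedy_order = plan(size, sel)
ega_cost, ega_order = plan(size, sel, slack=2.0, max_candidates=3)
```

Because the shortlist always contains the greedy choice, the wider search can never return a more expensive plan than plain greedy under the same cost model; larger values of the two knobs trade optimization time for plan quality.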

An Efficient M-way Stream Join Algorithm Exploiting a Bit-vector Hash Table (비트-벡터 해시 테이블을 이용한 효율적인 다중 스트림 조인 알고리즘)

  • Kwon, Tae-Hyung;Kim, Hyeon-Gyu;Lee, Yu-Won;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.35 no.4 / pp.297-306 / 2008
  • MJoin has been proposed as an algorithm for efficiently joining multiple data streams whose characteristics change unpredictably. It extends the symmetric hash join to handle multiple data streams. Whenever a tuple arrives from a remote stream source, MJoin checks whether all of the hash tables have matching tuples. However, when a join involves many data streams with low join selectivity, the performance of this check is significantly influenced by the order in which the hash tables are probed. In this paper, we propose the BiHT-Join algorithm, which extends MJoin to conduct this check in constant time regardless of the join order. BiHT-Join maintains a bit-vector that represents the existence of tuples in each stream and decides whether a join will succeed by comparing bit-vectors. Based on this decision, BiHT-Join conducts a hash join only for tuples that will join successfully. Our experimental results show that the proposed BiHT-Join performs better than MJoin when processing multiple streams.
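A minimal sketch of the bit-vector check described in this abstract, under stated assumptions: all streams join on one shared key, window expiration is ignored (a real system would clear a bit when the last tuple for a key expires), and the class `BitVectorJoin` and its method names are invented for illustration rather than taken from the paper.

```python
from collections import defaultdict
from itertools import product
from typing import Any, Dict, Hashable, List, Tuple

class BitVectorJoin:
    """Symmetric hash join over n streams with a per-key presence bitmask."""

    def __init__(self, num_streams: int) -> None:
        self.n = num_streams
        self.full = (1 << num_streams) - 1   # all streams present
        self.tables: List[Dict[Hashable, List[Any]]] = [
            defaultdict(list) for _ in range(num_streams)]
        self.bits: Dict[Hashable, int] = defaultdict(int)

    def insert(self, stream: int, key: Hashable, payload: Any) -> List[Tuple]:
        """Store the tuple and return the join results it produces."""
        self.tables[stream][key].append(payload)
        self.bits[key] |= 1 << stream
        # One integer comparison decides success, whatever the probe order.
        if self.bits[key] != self.full:
            return []
        # Probe the other hash tables only for keys known to join.
        parts = [[payload] if i == stream else self.tables[i][key]
                 for i in range(self.n)]
        return list(product(*parts))

join = BitVectorJoin(3)
r1 = join.insert(0, "k", "a0")   # only stream 0 has "k": no probing
r2 = join.insert(1, "k", "b0")   # stream 2 still missing: no probing
r3 = join.insert(2, "k", "c0")   # all three bits set: probe and emit
```

The point of the design is that a tuple whose key is missing from any stream is rejected by a single mask comparison instead of a sequence of hash-table probes whose cost depends on the probing order.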

Effective Load Shedding for Multi-Way windowed Joins Based on the Arrival Order of Tuples on Data Streams (다중 윈도우 조인을 위한 튜플의 도착 순서에 기반한 효과적인 부하 감소 기법)

  • Kwon, Tae-Hyung;Lee, Ki-Yong;Son, Jin-Hyun;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.37 no.1 / pp.1-11 / 2010
  • Recently, there has been growing interest in the processing of continuous queries over multiple data streams. When the arrival rates of tuples exceed the memory capacity of the system, a load shedding technique drops a subset of the input tuples to keep the system from becoming overloaded. In this paper, we propose an effective load shedding algorithm for multi-way windowed joins over multiple data streams. Most previous load shedding algorithms estimate the productivity of each tuple, i.e., the number of join output tuples produced by the tuple, based on its "join attribute value" and drop the tuples with the lowest productivity. However, the productivity of a tuple cannot be accurately estimated from its join attribute value when the join attribute values are unique and do not repeat, or when the distribution of the join attribute values changes over time. For these cases, we estimate the productivity of a tuple based on its "arrival order" on the data streams rather than its join attribute value, which allows an effective estimate even when the join attribute value is uninformative. Through extensive experiments and analysis, we show that the proposed method outperforms previous methods in terms of effectiveness and efficiency.

A Multi-way joins technique for multi join attributes in Stream Environments (스트림 환경에서 다중 조인 속성을 위한 멀티웨이 조인 처리기법)

  • Baek, Joohyun;Jung, Sungwon
    • Proceedings of the Korea Information Processing Society Conference / 2007.11a / pp.226-229 / 2007
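  • In a streaming environment, join operations require processing methods different from conventional techniques. Various techniques have been proposed to address this problem. However, the methods proposed so far consider only joins over two input streams or multi-stream joins on a single attribute. When there are multiple join attributes, the join cannot be performed in a single step. To solve this problem, this paper considers joins with multiple attributes, a generalization of the environments considered so far. In this case the join takes place in multiple stages, and the join at an earlier stage affects the join at the next stage. The speed at which the overall join result is produced therefore depends on which join among the input streams is performed first. We propose a method that produces the final join result quickly by deciding, during join processing, which of the input streams to process first.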