
An Efficient Join Algorithm for Data Streams with Overlapping Window  

Kim, Hyeon-Gyu (Department of Computer Science, KAIST)
Kang, Woo-Lam (Department of Computer Science, KAIST)
Kim, Myoung-Ho (Department of Computer Science, KAIST)
Abstract
Overlapping windows are commonly used in queries over continuous data streams. However, existing approaches have addressed join algorithms only for basic window types such as tumbling windows and tuple-driven windows. In this paper, we propose an efficient join algorithm for overlapping windows, which are a more general type of window. The proposed algorithm is based on an incremental window join and focuses on producing join results continuously even when memory overflow occurs frequently. It consists of (1) a method for selectively using incremental and full joins, (2) a victim selection algorithm that minimizes the latency of join processing, and (3) an idle-time processing algorithm. Our experiments show that the selective use of incremental and full joins performs better than using either one alone.
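To make the distinction between incremental and full window joins concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes two finite, in-memory streams of `(timestamp, key, id)` tuples, an equi-join on `key`, and time-based windows of length `range_` that advance by `slide` (overlapping whenever `slide < range_`). All names (`full_join`, `incremental_join`, `run`, and the `dropped` parameter) are hypothetical. The full join recomputes each window from scratch. The incremental join reuses the previous window's result: it expires pairs whose tuples slid out and probes only the newly arrived tuples. `dropped` is a crude, hypothetical stand-in for a lost intermediate result (for example after memory overflow) that forces a fallback to the full join. The paper's actual switching policy, victim selection and idle-time processing are not modeled.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterator, List, Optional, Set, Tuple

Tup = Tuple[int, str, str]   # (timestamp, join_key, tuple_id); ids unique per stream
Pair = Tuple[str, str]       # (id from stream A, id from stream B)
Window = Tuple[int, int]     # half-open interval [start, end)


def in_window(t: Tup, start: int, end: int) -> bool:
    return start <= t[0] < end


def hash_join(left: List[Tup], right: List[Tup]) -> Set[Pair]:
    """Equi-join on join_key via a hash table built over `right`."""
    index: Dict[str, List[str]] = defaultdict(list)
    for _, key, rid in right:
        index[key].append(rid)
    return {(lid, rid) for _, key, lid in left for rid in index.get(key, [])}


def full_join(a: List[Tup], b: List[Tup], start: int, end: int) -> Set[Pair]:
    """Recompute the join of one window from scratch."""
    wa = [t for t in a if in_window(t, start, end)]
    wb = [t for t in b if in_window(t, start, end)]
    return hash_join(wa, wb)


def incremental_join(prev: Set[Pair], a: List[Tup], b: List[Tup],
                     prev_w: Window, w: Window) -> Set[Pair]:
    """Derive the result of window w from the result of the previous window.

    Assumes the window only moves forward and overlaps (or abuts) its
    predecessor: prev_w[0] <= w[0] <= prev_w[1] <= w[1].
    """
    start, end = w
    prev_end = prev_w[1]
    ts_a = {tid: ts for ts, _, tid in a}
    ts_b = {tid: ts for ts, _, tid in b}
    # 1. Expire result pairs in which either tuple has slid out of the window.
    kept = {(x, y) for x, y in prev if ts_a[x] >= start and ts_b[y] >= start}
    # 2. Join the newly arrived tuples against the whole current window.
    new_a = [t for t in a if in_window(t, prev_end, end)]
    new_b = [t for t in b if in_window(t, prev_end, end)]
    cur_a = [t for t in a if in_window(t, start, end)]
    cur_b = [t for t in b if in_window(t, start, end)]
    return kept | hash_join(new_a, cur_b) | hash_join(cur_a, new_b)


def windows(range_: int, slide: int, horizon: int) -> Iterator[Window]:
    """Yield windows [start, start + range_); they overlap when slide < range_."""
    start = 0
    while start < horizon:
        yield (start, start + range_)
        start += slide


def run(a: List[Tup], b: List[Tup], range_: int, slide: int, horizon: int,
        incremental: bool, dropped: FrozenSet[int] = frozenset()):
    """Evaluate every window. `dropped` (hypothetical) lists window indices whose
    predecessor result is assumed lost, which forces a full join there."""
    results = []
    prev: Optional[Set[Pair]] = None
    prev_w: Optional[Window] = None
    for i, w in enumerate(windows(range_, slide, horizon)):
        if incremental and prev is not None and i not in dropped:
            cur = incremental_join(prev, a, b, prev_w, w)
        else:
            cur = full_join(a, b, *w)
        results.append((w, sorted(cur)))
        prev, prev_w = cur, w
    return results


A = [(1, "k1", "a1"), (3, "k2", "a2"), (6, "k1", "a3"), (9, "k2", "a4")]
B = [(2, "k1", "b1"), (4, "k2", "b2"), (7, "k1", "b3"), (8, "k2", "b4")]
for w, pairs in run(A, B, range_=6, slide=2, horizon=10, incremental=True):
    print(w, pairs)
```

Both modes yield the same pairs for every window. The trade-off is that the incremental variant must keep the previous window's result in memory, whereas the full join needs only the window contents. This suggests why falling back to a full join is a natural option when memory is scarce.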
Keywords
Data streams; Hybrid join; Overlapping windows