An Efficient M-way Stream Join Algorithm Exploiting a Bit-vector Hash Table  

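Kwon, Tae-Hyung (Department of Computer Science, KAIST)
Kim, Hyeon-Gyu (Department of Computer Science, KAIST)
Lee, Yu-Won (Department of Computer Science, KAIST)
Kim, Myoung-Ho (Department of Computer Science, KAIST)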
Abstract
MJoin was proposed as an algorithm for efficiently joining multiple data streams whose characteristics change unpredictably. It extends the symmetric hash join to handle multiple data streams. Whenever a tuple arrives from a remote stream source, MJoin checks whether all of the hash tables contain matching tuples. However, when a join involves many data streams and has low join selectivity, the cost of this check depends heavily on the order in which the hash tables are checked. In this paper, we propose BiHT-Join, an algorithm that extends MJoin to perform this check in constant time regardless of the join order. BiHT-Join maintains a bit-vector that represents the existence of tuples in the streams and decides whether a join succeeds or fails by comparing bit-vectors. Based on this decision, BiHT-Join performs the hash join only for tuples that join successfully. Our experimental results show that BiHT-Join outperforms MJoin when processing multiple streams.
Keywords
Multi-way stream join; Bit-vector hash table
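
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `BitVectorJoin`, the method `insert`, the `probes` counter, and the layout of one bit-vector per join key are all assumptions made for illustration, and window expiry is left out entirely. The sketch keeps one hash table per stream plus a bit-vector per join key, and probes the hash tables only when a single mask comparison shows that every stream already holds a tuple with that key.

```python
from collections import defaultdict
from itertools import product


class BitVectorJoin:
    """Illustrative m-way equi-join guarded by a per-key bit-vector (assumed design)."""

    def __init__(self, num_streams):
        self.m = num_streams
        # Mask with one bit set for every stream.
        self.full_mask = (1 << num_streams) - 1
        # One hash table per stream: join key -> list of stored payloads.
        self.tables = [defaultdict(list) for _ in range(num_streams)]
        # Join key -> bit-vector; bit i is set once stream i holds that key.
        self.bits = defaultdict(int)
        # Counts hash-table probes, to show how many the bit-vector avoids.
        self.probes = 0

    def insert(self, stream_id, key, payload):
        """Store a new tuple and return the join results it produces."""
        self.tables[stream_id][key].append(payload)
        self.bits[key] |= 1 << stream_id
        # Single comparison: if any stream lacks this key, the join must fail.
        if self.bits[key] != self.full_mask:
            return []
        # Every stream has a match, so probing is guaranteed to succeed.
        per_stream = []
        for i in range(self.m):
            if i == stream_id:
                per_stream.append([payload])
            else:
                self.probes += 1
                per_stream.append(self.tables[i][key])
        return [tuple(combo) for combo in product(*per_stream)]


join = BitVectorJoin(3)
join.insert(0, "k", "a")         # no result: streams 1 and 2 lack key "k"
join.insert(1, "k", "b")         # no result: stream 2 lacks key "k"
print(join.insert(2, "k", "c"))  # [('a', 'b', 'c')]
```

In this sketch the first two arrivals are rejected by the mask comparison alone, without probing any hash table, so the order in which tables would have been probed no longer matters for unsuccessful joins.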