Effective Load Shedding for Multi-Way Windowed Joins Based on the Arrival Order of Tuples on Data Streams

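Kwon, Tae-Hyung (Department of Computer Science, Korea Advanced Institute of Science and Technology)
Lee, Ki-Yong (Department of Computer Science, Korea Advanced Institute of Science and Technology)
Son, Jin-Hyun (Division of Electronics and Computer Engineering, Hanyang University)
Kim, Myoung-Ho (Department of Computer Science, Korea Advanced Institute of Science and Technology)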
Abstract
Recently, there has been growing interest in processing continuous queries over multiple data streams. When tuples arrive faster than the system's memory can accommodate, a load shedding technique drops a subset of the input tuples to keep the system from becoming overloaded. In this paper, we propose an effective load shedding algorithm for multi-way windowed joins over multiple data streams. Most previous load shedding algorithms estimate the productivity of each tuple, i.e., the number of join output tuples it produces, from its "join attribute value" and drop the tuples with the lowest productivity. However, this estimate is inaccurate when the join attribute values are unique and do not repeat, or when their distribution changes over time. For these cases, we estimate the productivity of a tuple from its "arrival order" on the data streams rather than from its join attribute value, which allows effective estimation even when value-based estimation fails. Through extensive experiments and analysis, we show that the proposed method outperforms previous methods in terms of effectiveness and efficiency.
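As a rough illustration of the value-based estimation that the abstract describes for previous work (not the proposed arrival-order method, whose details are not given here), the following Python sketch assumes an equi-join on a single attribute and represents each window as a list of join attribute values. The names `productivity` and `shed` are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import prod

def productivity(value, other_windows):
    # Number of join results a tuple with this join value yields against
    # the current contents of the other windows (assumed: equi-join on
    # one shared attribute, so match counts multiply across windows).
    return prod(Counter(w)[value] for w in other_windows)

def shed(window, other_windows, capacity):
    # Keep the `capacity` most productive values and drop the rest.
    # sorted() is stable, so ties stay in arrival order.
    ranked = sorted(window,
                    key=lambda v: productivity(v, other_windows),
                    reverse=True)
    return ranked[:capacity]

# Three windows with repeating join values: the ranking is informative.
a = [1, 1, 2, 3]
b = [1, 1, 2]
c = [1, 2, 2]
kept = shed(a, [b, c], capacity=3)   # the tuple with value 3 is dropped

# Unique, non-repeating join values: a frequency estimate built from
# past tuples is zero for every incoming value, so it cannot rank them.
history = Counter([10, 11, 12, 13])
incoming = [14, 15, 16]
estimates = [history[v] for v in incoming]
```

In the second case every estimate is identical, which is the situation the abstract identifies as the weakness of value-based estimation.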
Keywords
Load Shedding; Multi-way Stream Window Join; Hash Join