http://dx.doi.org/10.9708/jksci.2012.17.2.149

A Study on the Efficiency of Join Operation On Stream Data Using Sliding Windows  

Yang, Young-Hyoo (Dept. of Information Management, Hanyang Women's University)
Abstract
This study addresses the problem of computing approximate answers to continuous sliding-window joins over data streams when the available memory may be insufficient to keep the entire join state. One approximation scenario is to provide a maximum subset of the result, with the objective of losing as few result tuples as possible. An alternative scenario is to provide a random sample of the join result, for example when the output of the join is being aggregated. It is shown formally that neither approximation can be addressed effectively for a sliding-window join of arbitrary input streams. Previous work has addressed only the maximum-subset problem and has implicitly used a frequency-based model of stream arrival. The sampling problem for this model is considered as well. More importantly, it is shown that there is a broad class of applications for which an age-based model of stream arrival is more appropriate, and both approximation scenarios are addressed under this new model. Finally, for the case of multiple joins executed under an overall memory constraint, an algorithm is provided that allocates memory across the joins so as to optimize a combined measure of approximation in all scenarios considered.
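To make the maximum-subset scenario concrete, the following is a minimal sketch of a memory-limited sliding-window equi-join in Python. It is an illustration only and not the algorithm of this paper: the class name `WindowJoin`, the parameters `window` and `capacity`, and the eviction rule (drop the stored tuple whose key has arrived least often on the opposite stream, a simple frequency-based heuristic) are all assumptions introduced here. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque


class WindowJoin:
    """Symmetric sliding-window equi-join of streams R and S with a per-stream memory cap.

    Hypothetical sketch: names and the eviction rule are illustrative assumptions.
    """

    def __init__(self, window, capacity):
        self.window = window      # window length in time units
        self.capacity = capacity  # maximum number of tuples stored per stream
        # Stored tuples (timestamp, key), kept in arrival order.
        self.state = {"R": deque(), "S": deque()}
        # How often each key has arrived on each stream so far.
        self.seen = {"R": Counter(), "S": Counter()}

    def _expire(self, side, now):
        # Remove tuples that have fallen out of the window (now - window, now].
        buf = self.state[side]
        while buf and buf[0][0] <= now - self.window:
            buf.popleft()

    def _shed(self, side, other):
        # Frequency-based heuristic (assumption): evict the stored tuple whose
        # key has been seen least often on the opposite stream; ties go to the oldest.
        buf = self.state[side]
        victim = min(buf, key=lambda t: self.seen[other][t[1]])
        buf.remove(victim)

    def insert(self, side, ts, key):
        """Process one arriving tuple and return the join results it produces."""
        other = "S" if side == "R" else "R"
        self._expire("R", ts)
        self._expire("S", ts)
        self.seen[side][key] += 1
        # Probe the opposite window: one result per stored tuple with the same key.
        results = [(key, ts, old_ts) for old_ts, k in self.state[other] if k == key]
        self.state[side].append((ts, key))
        if len(self.state[side]) > self.capacity:
            self._shed(side, other)
        return results


if __name__ == "__main__":
    join = WindowJoin(window=10, capacity=2)
    for side, ts, key in [("S", 0, "a"), ("S", 1, "a"), ("R", 2, "a"),
                          ("R", 3, "b"), ("R", 4, "c"), ("S", 5, "b")]:
        print(side, ts, key, join.insert(side, ts, key))
```

In this run the tuple `(3, "b")` on R is evicted when the cap is exceeded, so the later arrival of `b` on S produces no result. That lost result is exactly what a maximum-subset policy tries to minimize. An age-based policy would instead choose the victim as a function of how long a tuple has been in the window.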
Keywords
sliding-window; join; maximum subset; arbitrary result; frequency-based model; age-based model; multiple join operation