A Review of Window Query Processing for Data Streams

  • Kim, Hyeon Gyu (Division of Computer, Sahmyook University)
  • Kim, Myoung Ho (Department of Computer Science, Korea Advanced Institute of Science and Technology)
  • Received: 2013.07.01
  • Reviewed: 2013.07.30
  • Published: 2013.12.30

Abstract

In recent years, progress in hardware technology has made it possible to monitor many events in real time. The volume of incoming data may be so large that monitoring every individual item becomes intractable, and revisiting any particular record can be impossible in this environment. Many database techniques, such as aggregation, join, frequent pattern mining, and indexing, therefore become more challenging in this context. This paper surveys previous efforts to resolve these issues in processing data streams. The emphasis is on specifying and processing sliding window queries, which are supported in many stream processing engines. We also review related work on stream query processing, including synopsis structures, plan sharing, operator scheduling, load shedding, and disorder control.
