MMJoin: An Optimization Technique for Multiple Continuous MJoins over Data Streams

An Optimization Technique for Processing Multiple Continuous Multi-Way Join Queries over Data Streams

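  • 변창우 (Department of Computer Systems, Inha Technical College) ;
  • 이헌주 (Department of Computer Engineering, Sogang University) ;
  • 박석 (Department of Computer Engineering, Sogang University)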
  • Published : 2008.02.15

Abstract

In sensor networks, where many small pieces of information are generated separately, a Data Stream Management System needs join queries despite their high cost. Because a data stream is potentially unbounded, it is reasonable for each join operator to carry a sliding-window constraint, which prevents disk I/O. In addition, a join operator should be able to take multiple inputs in order to produce comprehensive results. The MJoin operator with sliding windows makes this possible. In this paper, we consider a data stream environment in which multiple MJoin operators are registered, and we propose MMJoin, which addresses the problems of building and processing a globally shared query while taking into account the characteristics of the MJoin operator with sliding windows. First, we propose a method for building the globally shared query execution plan. Second, we solve the problems of updating the window size and routing join results. Our study can serve as a foundation for optimization techniques for multiple continuous joins in data stream environments.

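In data stream management systems used in sensor networks, limited pieces of information arrive individually, so join operators, which have a relatively high computational cost, are inevitably required to obtain comprehensive results. Because a data stream is potentially unbounded in size, a join operator must naturally have a sliding-window constraint. In addition, a join operator must be able to take multiple inputs to obtain comprehensive results. The MJoin operator with sliding windows makes this possible. This paper assumes an environment in which several such MJoin operators are registered in the system and proposes the MMJoin technique, a globally shared query processing technique that reflects the characteristics of MJoin with sliding windows. The MMJoin technique divides the work into three problems: establishing a globally shared query execution plan, updating windows for join results, and routing those results. This research can serve as basic research for efficient multiple-query optimization and processing techniques in data stream environments.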
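To make the notion of a sliding-window multi-input join concrete, the following is a minimal sketch of a single symmetric multi-way equi-join with time-based windows. It is not the MMJoin technique and does not show plan sharing, window updating, or result routing across queries. The class name `SlidingWindowMJoin`, its parameters, the single shared join key, and the assumption that timestamps arrive in non-decreasing order are all illustrative choices that do not come from the paper.

```python
from collections import deque
from itertools import product


class SlidingWindowMJoin:
    """Toy symmetric multi-way equi-join over time-based sliding windows.

    Hypothetical illustration only: one join key shared by all streams,
    and timestamps assumed to arrive in non-decreasing order.
    """

    def __init__(self, window_sizes):
        # window_sizes[i] is the time span retained for stream i.
        self.window_sizes = list(window_sizes)
        self.windows = [deque() for _ in self.window_sizes]

    def _expire(self, now):
        # Drop tuples that have fallen out of each stream's window,
        # keeping only timestamps in the interval (now - size, now].
        for size, window in zip(self.window_sizes, self.windows):
            while window and window[0][0] <= now - size:
                window.popleft()

    def insert(self, stream, ts, key, payload):
        """Add one tuple and return the join results it completes."""
        self._expire(ts)
        self.windows[stream].append((ts, key, payload))
        # Probe every other window for tuples carrying the same key.
        matches = []
        for i, window in enumerate(self.windows):
            if i == stream:
                matches.append([payload])
            else:
                matches.append([p for (_, k, p) in window if k == key])
        # If any window has no match, the product is empty.
        return [tuple(combo) for combo in product(*matches)]
```

With three streams and a window of 10 time units each, a result is emitted only when all three windows hold a tuple with the same key at the same time; once the older tuples expire, a new arrival finds no partners and yields no output. Memory stays bounded because each window holds only recent tuples.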
