Iceberg Query Evaluation Technique Using a Cuboid Prefix Tree

  • Published: 2009.06.15

Abstract

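Because storing an unbounded data stream is practically impossible, evaluating iceberg queries in a data stream environment requires new data structures and algorithms. This paper proposes a cuboid prefix tree, based on the prefix tree structure, for processing iceberg queries over data streams. A cuboid prefix tree keeps only those itemsets that are composed of the grouping items used in the iceberg query, so it uses less memory than an ordinary prefix tree. By managing 1-itemsets, it removes infrequent items from each transaction, which reduces the time spent unnecessarily on updates. The paper also proposes a method for processing multiple iceberg queries efficiently with little memory by sharing nodes according to the grouping attributes that the queries have in common. For iceberg queries over data that is generated continuously and without bound, the cuboid prefix tree effectively reduces memory usage and processing time, which is confirmed through several experiments.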

A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Because of these characteristics, it is impossible to store all the data elements of a data stream. Therefore, it is necessary to define a new synopsis structure that stores summary information about the stream. For this purpose, this paper proposes a cuboid prefix tree that can be effectively employed in evaluating an iceberg query over data streams. A cuboid prefix tree stores only those itemsets that consist of the grouping attributes used in the GROUP BY clause of the query. In addition, a cuboid prefix tree can compute multiple iceberg queries simultaneously by sharing their common sub-expressions. A cuboid prefix tree evaluates an iceberg query over an infinitely generated data stream while efficiently reducing memory usage and processing time, which is verified by a series of experiments.
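The idea of storing only grouping attributes and sharing nodes between queries can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class name `CuboidPrefixTreeSketch`, its methods, and the sample records are hypothetical, the aggregate is assumed to be COUNT, and a query's grouping attributes are assumed to form a prefix of one fixed attribute order. The 1-itemset pruning of infrequent items mentioned in the abstract is left out, so the counts here are exact.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One prefix-tree node: a count plus children keyed by (attribute, value)."""
    count: int = 0
    children: dict = field(default_factory=dict)


class CuboidPrefixTreeSketch:
    """Hypothetical, simplified structure for illustration only."""

    def __init__(self, attribute_order):
        # Fixed order of the grouping attributes; nothing else is stored.
        self.attribute_order = list(attribute_order)
        self.root = Node()

    def insert(self, record):
        """Project one stream record onto the grouping attributes and count it."""
        self.root.count += 1
        node = self.root
        for attr in self.attribute_order:
            key = (attr, record[attr])
            node = node.children.setdefault(key, Node())
            node.count += 1

    def iceberg(self, group_by, min_count):
        """Return {group values: count} for groups whose count >= min_count."""
        depth = len(group_by)
        if list(group_by) != self.attribute_order[:depth]:
            raise ValueError("group_by must be a prefix of attribute_order")
        results = {}

        def walk(node, values):
            if len(values) == depth:
                if node.count >= min_count:
                    results[values] = node.count
                return
            for (_, value), child in node.children.items():
                # A group cannot reach the threshold unless its prefix does.
                if child.count >= min_count:
                    walk(child, values + (value,))

        walk(self.root, ())
        return results


# Hypothetical stream records; "price" is not a grouping attribute and is ignored.
stream = [
    {"region": "EU", "product": "pen", "price": 3},
    {"region": "EU", "product": "pen", "price": 4},
    {"region": "EU", "product": "ink", "price": 9},
    {"region": "US", "product": "pen", "price": 3},
]
tree = CuboidPrefixTreeSketch(["region", "product"])
for rec in stream:
    tree.insert(rec)

# Two iceberg queries answered from the same nodes.
by_region = tree.iceberg(["region"], min_count=2)
by_region_product = tree.iceberg(["region", "product"], min_count=2)
print(by_region)          # {('EU',): 3}
print(by_region_product)  # {('EU', 'pen'): 2}
```

In this sketch the `region` nodes serve both queries, which is one simple way that queries with common grouping attributes could share memory. A stream-oriented version would also need to bound the tree's size, for example by pruning infrequent items as the abstract describes.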
