Dynamic Data Cubes Over Data Streams  

Seo, Dae-Hong (Telcoware, Network Business Division, Framework Solution Team)
Yang, Woo-Sock (Department of Computer Science, Yonsei University)
Lee, Won-Suk (Department of Computer Science, Yonsei University)
Abstract
The data cube, a multi-dimensional data model, has been applied successfully to many multi-dimensional data analysis tasks, and research on applying it to data stream analysis is ongoing. A data stream is generated in real time and is continuous, massive, and volatile. Because of these properties, the distribution of the data changes rapidly, so the basic rule of data stream processing is to examine each element once and then discard it. For the same reason, users are more interested in attribute values with high support than in the full set of attribute values observed over the stream. This paper proposes the dynamic data cube, which adapts the data cube to the data stream environment. The dynamic data cube identifies the user's area of interest by the support ratio of attribute values and manages attribute values dynamically by grouping them. This reduces memory usage and processing time. It can also present or emphasize the user's area of interest efficiently by increasing the granularity of attributes with higher support. Experiments are performed to verify how efficiently the dynamic data cube works under limited memory.
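The grouping idea described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, whose details are not given here. It is a minimal illustration under stated assumptions: a single dimension, a hypothetical `min_support` threshold, and a hypothetical `OTHER` label for the grouped cell. It also keeps exact counts for every value, so it shows only the grouping by support ratio and does not attempt the memory savings the paper targets.

```python
from collections import Counter


class SupportGroupedDimension:
    """One stream dimension; values below a support threshold share one cell.

    The class name, min_support and the "OTHER" label are illustrative
    assumptions, not names taken from the paper.
    """

    def __init__(self, min_support):
        self.min_support = min_support  # threshold as a fraction of tuples seen
        self.counts = Counter()         # exact per-value counts (unbounded here)
        self.total = 0                  # number of tuples observed so far

    def observe(self, value):
        """Process one stream element exactly once."""
        self.counts[value] += 1
        self.total += 1

    def cell_of(self, value):
        """Return the value itself if its support is high enough, else the group label."""
        if self.total and self.counts[value] / self.total >= self.min_support:
            return value
        return "OTHER"

    def summary(self):
        """Aggregate counts per cell under the current grouping."""
        cells = Counter()
        for value, count in self.counts.items():
            cells[self.cell_of(value)] += count
        return dict(cells)


dim = SupportGroupedDimension(min_support=0.15)
stream = ["seoul", "seoul", "busan", "seoul", "daegu",
          "busan", "seoul", "jeju", "seoul", "incheon"]
for v in stream:
    dim.observe(v)

print(dim.summary())
```

In this toy stream, values with support of at least 15% keep their own cell, while the rest are reported together. Because the grouping is recomputed from current counts, a value moves out of the grouped cell as soon as its support rises above the threshold, which is one simple way to read "increasing the granularity for attributes that have higher support".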
Keywords
Data Stream; OLAP; Data Cube