http://dx.doi.org/10.3745/KIPSTD.2004.11D.2.325

A Genetic Algorithm for Materialized View Selection in Data Warehouses  

Lee, Min-Soo (Department of Computer Science, Ewha Womans University)
Abstract
A data warehouse stores information that is collected from multiple, heterogeneous information sources for the purpose of complex querying and analysis. Information in the warehouse is typically stored in the form of materialized views, which represent pre-computed portions of frequently asked queries. One of the most important tasks in designing a warehouse is the selection of the materialized views to be maintained in the warehouse. The goal is to select a set of views that minimizes the total query response time over all queries, given only a limited amount of time for maintaining the views (the maintenance-cost view selection problem). In this paper, we propose an efficient solution to the maintenance-cost view selection problem that uses a genetic algorithm to compute a near-optimal set of views. Specifically, we explore the maintenance-cost view selection problem in the context of OR view graphs. We show that our approach represents a dramatic improvement in time complexity over existing search-based approaches that use heuristics. Our analysis shows that the algorithm consistently yields a solution whose query cost is only 10% above the optimal query cost, while its execution time increases only linearly. We have implemented a prototype of our algorithm and used it to evaluate our approach.
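As a rough illustration of the problem stated above, the following is a minimal sketch of a genetic algorithm over bit strings, where bit i says whether candidate view i is materialized. It is not the paper's implementation: the toy costs, the names `BASE_COST`, `VIEW_ANSWERS`, `MAINT_COST`, `BUDGET`, and `select_views`, the penalty-based fitness, and the choice of tournament selection, one-point crossover, and bit-flip mutation are all assumptions made for the example. The OR view graph is reduced here to the rule that each query is answered from its cheapest materialized source.

```python
import random

# Toy instance; every number below is invented for illustration.
BASE_COST = [100, 80, 60]      # cost of answering each query from base tables
VIEW_ANSWERS = [               # per view: {query index: cost when answered from that view}
    {0: 20, 1: 30},
    {1: 10},
    {2: 15},
    {0: 50, 2: 40},
]
MAINT_COST = [30, 25, 20, 15]  # maintenance cost of each candidate view
BUDGET = 50                    # total maintenance cost allowed (assumed)


def query_cost(bits):
    """Total query cost: each query uses its cheapest materialized source."""
    total = 0
    for q, base in enumerate(BASE_COST):
        options = [base]
        for v, chosen in enumerate(bits):
            if chosen and q in VIEW_ANSWERS[v]:
                options.append(VIEW_ANSWERS[v][q])
        total += min(options)
    return total


def maintenance_cost(bits):
    """Sum of maintenance costs of the materialized views."""
    return sum(cost for cost, chosen in zip(MAINT_COST, bits) if chosen)


def fitness(bits, penalty=10):
    """Lower is better; budget overruns are penalized rather than forbidden."""
    overrun = max(0, maintenance_cost(bits) - BUDGET)
    return query_cost(bits) + penalty * overrun


def select_views(generations=50, pop_size=20, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n = len(MAINT_COST)
    best = [0] * n  # materializing nothing is always within budget

    def remember_best(population):
        # Keep the cheapest individual that respects the budget.
        nonlocal best
        for ind in population:
            if maintenance_cost(ind) <= BUDGET and query_cost(ind) < query_cost(best):
                best = ind[:]

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    remember_best(population)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best feasible set forward
        while len(next_gen) < pop_size:
            # Binary tournament selection of two parents.
            p1 = min(rng.sample(population, 2), key=fitness)
            p2 = min(rng.sample(population, 2), key=fitness)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
        remember_best(population)
    return best


if __name__ == "__main__":
    chosen = select_views()
    print(chosen, query_cost(chosen), maintenance_cost(chosen))
```

Penalizing budget overruns instead of discarding infeasible individuals lets the search pass through slightly over-budget view sets, while the separately tracked `best` guarantees that the returned set always respects the budget.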
Keywords
Data Warehouse; Genetic Algorithm; View Maintenance; View Materialization; View Selection; Warehouse Configuration