A Genetic Algorithm for Materialized View Selection in Data Warehouses

데이터웨어하우스에서 유전자 알고리즘을 이용한 구체화된 뷰 선택 기법

  • 이민수 (Department of Computer Science, Ewha Womans University)
  • Published : 2004.04.01

Abstract

A data warehouse stores information collected from multiple, heterogeneous information sources for the purpose of complex querying and analysis. Information in the warehouse is typically stored in the form of materialized views, which represent pre-computed portions of frequently asked queries. One of the most important tasks in designing a warehouse is selecting the materialized views to be maintained in it. The goal is to select a set of views that minimizes the total query response time over all queries, subject to a limit on the time available for maintaining the views (the maintenance-cost view selection problem). In this paper, we propose an efficient solution to the maintenance-cost view selection problem that uses a genetic algorithm to compute a near-optimal set of views. Specifically, we explore the problem in the context of OR view graphs. We show that our approach represents a dramatic improvement in time complexity over existing search-based approaches that use heuristics. Our analysis shows that the algorithm consistently yields a solution whose query cost is only 10% above the optimal query cost, while its execution time increases only linearly. We have implemented a prototype of our algorithm and used it to evaluate our approach.
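To make the problem statement concrete, the following is a minimal sketch of how a genetic algorithm could be applied to maintenance-cost view selection. It is not the paper's algorithm: the abstract does not specify the encoding, the genetic operators, or the cost model, so everything here is an assumption chosen for illustration. Each candidate is a bit string (one bit per view), each query can be answered from any one of several views or from the base tables (a simplified stand-in for an OR view graph), and selections that exceed the maintenance budget are penalized rather than discarded. The names `select_views`, `total_query_cost`, and `maintenance_cost`, as well as the sample numbers, are hypothetical.

```python
import random

def total_query_cost(selection, queries):
    """Frequency-weighted cost of answering each query from its cheapest source."""
    total = 0.0
    for freq, base_cost, view_costs in queries:
        best = base_cost  # fall back to the base tables
        for view, cost in view_costs.items():
            if selection[view]:
                best = min(best, cost)
        total += freq * best
    return total

def maintenance_cost(selection, maint):
    """Total maintenance cost of the views marked 1 in the bit string."""
    return sum(m for bit, m in zip(selection, maint) if bit)

def fitness(selection, queries, maint, budget, penalty=1e6):
    """Query cost plus a penalty per unit of budget overrun (lower is better).
    The penalty weight is an assumption; it must dominate the query costs."""
    over = max(0.0, maintenance_cost(selection, maint) - budget)
    return total_query_cost(selection, queries) + penalty * over

def select_views(queries, maint, budget, pop_size=30, generations=60,
                 mutation_rate=0.05, seed=0):
    """Assumed GA: tournament selection, one-point crossover, bit-flip
    mutation, and elitism. Requires at least 2 views and pop_size >= 3."""
    rng = random.Random(seed)
    n = len(maint)

    def score(s):
        return fitness(s, queries, maint, budget)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    population[0] = [0] * n  # "materialize nothing" is always within budget
    for _ in range(generations):
        population.sort(key=score)
        next_gen = population[:2]  # elitism: carry over the two best
        while len(next_gen) < pop_size:
            p1 = min(rng.sample(population, 3), key=score)  # tournament
            p2 = min(rng.sample(population, 3), key=score)
            cut = rng.randint(1, n - 1)                      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    return min(population, key=score)

# Hypothetical instance: 4 candidate views, 3 queries.
# Each query is (frequency, cost from base tables, {view index: cost from that view}).
maint = [4, 3, 5, 2]
budget = 7
queries = [
    (10, 100, {0: 10, 1: 40}),
    (5, 80, {1: 20, 2: 8}),
    (2, 60, {3: 15}),
]
best = select_views(queries, maint, budget)
print("selected views:", [i for i, bit in enumerate(best) if bit])
print("query cost:", total_query_cost(best, queries))
print("maintenance cost:", maintenance_cost(best, maint))
```

Because the all-zero selection is seeded into the initial population and the two best candidates survive each generation, the returned selection is never worse than materializing nothing. Handling the budget through a penalty term is one common design choice; repairing or rejecting infeasible candidates would be an equally plausible alternative.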
