A Query Pruning Technique for Optimizing Regular Path Expressions in Semistructured Databases

준구조적 데이타베이스에서의 정규경로표현 최적화를 위한 질의전지 기법

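  • Chang-Won Park (박창원; Information Technology Laboratory, LG Electronics Institute of Technology)
  • Chin-Wan Chung (정진완; Department of Computer Science, Korea Advanced Institute of Science and Technology)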
  • Published : 2002.06.01

Abstract

Regular path expressions are the primary building blocks of queries over semistructured data, which does not assume a conventional schema. Query pruning is an important optimization technique that avoids useless traversals when regular path expressions are evaluated. However, existing query pruning often fails to fully optimize multiple regular path expressions, and previous methods that post-process its result must check an exponentially growing number of combinations of sub-results. In this paper, we present a new technique, called two-phase query pruning, that consists of a preprocessing phase and a pruning phase. Two-phase query pruning is effective in optimizing multiple regular path expressions, and it is more scalable than the previous methods because it never checks the exponential combinations of sub-results.
