An XML Query Optimization Technique by Signature based Block Traversing

시그니처 기반 블록 탐색을 통한 XML 질의 최적화 기법

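  • 박상원 (School of Computer Science and Engineering, Seoul National University)
  • 박동주 (Telecommunication R&D Center, Telecommunication Network Business, Samsung Electronics)
  • 정태선 (School of Computer Science and Engineering, Seoul National University)
  • 김형주 (School of Computer Science and Engineering, Seoul National University)
  • Published: 2002.02.01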

Abstract

Data on the Internet are usually represented and transferred as XML. XML data is represented as a tree, so object repositories are well suited to storing and querying it because of their modeling power. XML queries are expressed as regular path expressions and evaluated by traversing the objects of the tree in the object repository. Several indexes have been proposed to evaluate regular path expressions quickly. However, because such indexes require a large amount of disk space, they may not cover all possible paths. To evaluate queries efficiently in such cases, we propose an optimized traversal that combines the signature method and block traversing. The signature approach shrinks the search space by using the signature information attached to each object, which hints at whether a certain label exists in the sub-tree. Block traversing reduces disk I/O by evaluating the reachable objects in a page early. We conducted diverse experiments to show that the hybrid approach achieves better performance than the naive approaches.

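Much of the data used on the Internet is increasingly represented as XML. Because XML data takes the form of a tree, an object repository is well suited to storing and querying it, thanks to its modeling power. In an object repository, each node of the XML document is stored as an object. XML queries are characteristically expressed as regular path expressions, which are processed by traversing the objects of the XML tree. Several indexes have been proposed to support regular path expressions, but because of disk space constraints they cannot provide an index for every possible path. To support regular path expressions well under these conditions, we proposed an optimized object traversal technique that processes queries efficiently by using block traversing and the signature method. The signature method reduces the search range by attaching a signature to each node of the tree. Block traversing reduces disk I/O by processing in advance the reachable objects within a page. Through experiments we showed that using the two methods together yields far better performance than ordinary query processing.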
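
The two ideas in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the `Node` class, the one-bit hash in `label_signature`, the 64-bit signature width, the query form (find every node with a given label), and the simulated page-load counter are all assumptions made for illustration. Each node carries a signature that is the bitwise OR of the label signatures in its sub-tree, so a sub-tree whose signature lacks the target bit can be skipped; nodes are grouped by page so that everything reachable inside a loaded page is handled before another page is read.

```python
from collections import defaultdict
from dataclasses import dataclass, field

SIG_BITS = 64  # assumed signature width (not taken from the paper)


def label_signature(label: str) -> int:
    """Map a label to a one-bit signature (hypothetical hash choice)."""
    return 1 << (sum(label.encode()) % SIG_BITS)


@dataclass
class Node:
    label: str
    page: int                       # id of the disk page holding this object
    children: list = field(default_factory=list)
    signature: int = 0              # OR of the label signatures in the sub-tree


def build_signatures(node: Node) -> int:
    """Fill in sub-tree signatures bottom-up and return the node's signature."""
    sig = label_signature(node.label)
    for child in node.children:
        sig |= build_signatures(child)
    node.signature = sig
    return sig


def find_label(root: Node, target: str, use_signature: bool = True):
    """Return (matching nodes, simulated page loads) for 'all nodes labeled target'."""
    want = label_signature(target)
    pending = defaultdict(list)     # page id -> reachable, not yet visited nodes
    pending[root.page].append(root)
    matches, page_loads = [], 0
    while pending:
        page = next(iter(pending))  # pick any page that has pending work
        page_loads += 1             # simulate one disk read for that page
        work = pending.pop(page)
        while work:                 # block traversing: drain this page first
            node = work.pop()
            if use_signature and (node.signature & want) != want:
                continue            # target label cannot occur below: prune
            if node.label == target:
                matches.append(node)
            for child in node.children:
                if child.page == page:
                    work.append(child)                 # same page, no extra I/O
                else:
                    pending[child.page].append(child)  # defer to a later load
    return matches, page_loads
```

Because signatures are hashes, a set bit only hints that a label may be present (false positives are possible), while a missing bit rules it out. In this sketch the signature is checked when a node is visited, so the page holding a pruned node is still read once; only the pages below it are saved.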
