A Path Partitioning Technique for Indexing XML Data  

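Jongik Kim (School of Computer Science and Engineering, Seoul National University)
Hyoung-Joo Kim (School of Computer Science and Engineering, Seoul National University)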
Abstract
XML query languages express queries as paths in a data graph; paths are the basic building block of an XML query. Users can write more expressive queries by using patterns (e.g., regular expressions) over paths. Because XML data is semi-structured, a data graph contains many identical paths. Existing XML indexing techniques exploit these identical paths, but the resulting index can grow larger than the source data graph and does not guarantee an efficient access path. In this paper, we propose a technique that partitions all the paths in a data graph, and we develop an index graph that efficiently finds the partitions relevant to a path query. The size of the index graph can be adjusted independently of the source data, which significantly reduces the cost of index graph traversal. Our performance study shows that our index is much faster than other graph-based indexes.
Keywords
XML; semi-structured data; path index; path partition