
Branching Path Query Processing for XML Documents using the Prefix Match Join  

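Park Young-Ho (Department of Computer Science, KAIST)
Han Wook-Shin (Department of Computer Engineering, Kyungpook National University)
Whang Kyu-Young (Department of Computer Science, KAIST)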
Abstract
We propose XIR-Branching, a novel method for processing partial match queries on heterogeneous XML documents using information retrieval (IR) techniques and novel instance-level join techniques. A partial match query is a query whose path expression contains the descendant-or-self axis '//'. In its general form, a partial match query has branch predicates that form branching paths. The objective of XIR-Branching is to support this type of query efficiently over large-scale collections of documents with heterogeneous schemas. XIR-Branching builds on conventional schema-level methods that use relational tables (e.g., XRel, XParent, and XIR-Linear [21]) and significantly improves their efficiency and scalability with two techniques: an inverted index and a novel prefix match join. The former supports linear path expressions, as in XIR-Linear [21]. The latter supports branching path expressions and finds the result nodes more efficiently than the containment joins used in the conventional methods. XIR-Linear is efficient for linear path expressions but does not handle branching path expressions, which are needed to express more detailed and more general queries. This paper therefore presents a novel method for handling branching path expressions: XIR-Branching first narrows the candidate set for a query at the schema level and then efficiently finds the final result set at the instance level using the prefix match join. We compare the efficiency and scalability of XIR-Branching with those of XRel and XParent using XML documents crawled from the Internet. The results show that XIR-Branching is several orders of magnitude more efficient than both XRel and XParent for linear path expressions and several times more efficient for branching path expressions.
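The abstract describes a two-phase evaluation (schema-level candidate reduction, then an instance-level prefix match join) but does not specify the node encoding or the join algorithm. The Python sketch below offers one possible reading. It is not the authors' implementation and rests on three assumptions of ours. First, each element carries a Dewey-style label whose length equals its depth and which is a prefix of the labels of all its descendants. Second, the schema-level index is a simple inverted index from tag names to distinct root-to-node label paths. Third, the prefix match join is modeled as a hash semi-join on the label prefix that identifies the branching node. The toy data and the names NODES, build_indexes, match_linear, and prefix_match_join are hypothetical. The sketch handles only queries of the form //trunk[branch]/output that use child axes after the leading '//'.

```python
from collections import defaultdict

# Hypothetical sketch (not the authors' code). Toy collection of
# (Dewey label, root-to-node label path) pairs. A node's label is a prefix of
# its descendants' labels, and its length equals the node's depth.
NODES = [
    ((1,), "/lib"),
    ((1, 1), "/lib/book"),
    ((1, 1, 1), "/lib/book/title"),
    ((1, 1, 2), "/lib/book/author"),
    ((1, 1, 2, 1), "/lib/book/author/name"),
    ((1, 2), "/lib/book"),
    ((1, 2, 1), "/lib/book/title"),  # this book has no author
    ((1, 3), "/lib/shelf"),
    ((1, 3, 1), "/lib/shelf/book"),
    ((1, 3, 1, 1), "/lib/shelf/book/title"),
    ((1, 3, 1, 2), "/lib/shelf/book/author"),
]


def build_indexes(nodes):
    """Build a path -> node-labels table and an inverted index tag -> label paths."""
    path_nodes = defaultdict(list)  # instance level: label path -> Dewey labels
    tag_index = defaultdict(set)    # schema level: tag -> distinct label paths
    for label, path in nodes:
        path_nodes[path].append(label)
        for tag in path.strip("/").split("/"):
            tag_index[tag].add(path)
    return path_nodes, tag_index


def match_linear(query, tag_index):
    """Label paths matching a query of the form //t1/t2/.../tk (leading // only)."""
    tags = query.lstrip("/").split("/")
    # Intersect the posting lists to get candidate paths, then verify the suffix.
    candidates = set.intersection(*(tag_index.get(t, set()) for t in tags))
    suffix = "/" + "/".join(tags)
    return {p for p in candidates if p.endswith(suffix)}


def prefix_match_join(trunk, branch, output, path_nodes, tag_index):
    """Evaluate //trunk[branch]/output; branch and output are child-axis paths.

    A node matched by //trunk/rel_path lies exactly k = len(rel_path) levels
    below its trunk node, so label[:-k] is that trunk node's Dewey label.
    Two nodes share a trunk node iff these prefixes are equal, so the join
    can be done by hashing (semi-join: output nodes are not duplicated).
    """
    def by_trunk_prefix(rel_path):
        k = len(rel_path.split("/"))
        groups = defaultdict(list)
        for path in match_linear(f"//{trunk}/{rel_path}", tag_index):
            for label in path_nodes[path]:
                groups[label[:-k]].append(label)
        return groups

    branch_groups = by_trunk_prefix(branch)  # build side
    output_groups = by_trunk_prefix(output)  # probe side
    return sorted(
        label
        for prefix, labels in output_groups.items()
        if prefix in branch_groups
        for label in labels
    )


path_nodes, tag_index = build_indexes(NODES)
print(sorted(match_linear("//book/title", tag_index)))
# ['/lib/book/title', '/lib/shelf/book/title']
print(prefix_match_join("book", "author", "title", path_nodes, tag_index))
# [(1, 1, 1), (1, 3, 1, 1)]
print(prefix_match_join("book", "author/name", "title", path_nodes, tag_index))
# [(1, 1, 1)]
```

In this sketch the join condition is equality of label prefixes rather than interval containment, so it can be evaluated with a hash table. That is one plausible way an equality-based join could outperform a containment join. The abstract does not state the paper's actual cost argument.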
Keywords
XML; inverted index; prefix match join; partial match queries; branching path expressions; XIR-Branching; XIR-Linear
References
1 Kyu-Young Whang, Min-Jae Lee, Jae-Gil Lee, Min-Soo Kim, and Wook-Shin Han, 'Odysseus: a High-Performance ORDBMS Tightly-Coupled with IR Features,' Technical Report CS-TR-2004-204, Department of Computer Science, KAIST, Dec. 2004, http://cs.kaist.ac.kr/research/technical/Archive/CS-TR-2004-204.pdf
2 Teleport Pro Version 1.29, http://www.tenmax.com/teleport/pro/home.htm
3 ReGet Deluxe 3.3 Beta(build 173), http://deluxe.reget.com/en/
4 Kyu-Young Whang, Min-Jae Lee, Jae-Gil Lee, Min-Soo Kim, and Wook-Shin Han, 'Odysseus: a High-Performance ORDBMS Tightly-Coupled with IR Features,' In Proc. the 21st Int'l Conf. on Data Engineering (ICDE), National Center of Sciences, Tokyo, Japan, April 5-8, 2005
5 R. Kaushik, P. Bohannon, J. F. Naughton, and H. F. Korth, 'Covering Indexes for Branching Path Queries,' In Proc. 2002 ACM SIGMOD Int'l Conf. on Management of Data, pp. 133-144, 2002
6 M. Altinel and M. J. Franklin, 'Efficient Filtering of XML Documents for Selective Dissemination of Information,' In Proc. the 26th Int'l Conf. on Very Large Data Bases (VLDB), pp. 53-64, Cairo, Egypt, Sept. 10-14, 2000
7 Z. Ives, A. Levy, and D. Weld, 'Efficient Evaluation of Regular Path Expressions on Streaming XML Data,' Technical Report UW-CSE-2000-05-02, University of Washington, 2000
8 J. McHugh, J. Widom, 'Query Optimization for XML,' In Proc. the 25th Int'l Conf. on Very Large Data Bases(VLDB), pp. 315-326, Edinburgh, Scotland, UK, Sept. 7-10, 1999
9 I. Tatarinov et al., 'Storing and Querying Ordered XML Using a Relational Database System,' In Proc. 2002 ACM SIGMOD Int'l Conf. on Management of Data, pp. 204-215, 2002
10 R. Goldman and J. Widom, 'DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases,' In Proc. the 23rd Int'l Conf. on Very Large Data Bases (VLDB), pp. 436-445, Athens, Greece, Aug. 26-29, 1997
11 S. Al-Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu, 'Structural Joins: A Primitive for Efficient XML Query Pattern Matching,' In Proc. the 18th Int'l Conf. on Data Engineering (ICDE), pp. 141-152, San Jose, California, Feb. 2002
12 A. Aboulnaga, A. R. Alameldeen, and J. Naughton, 'Estimating the Selectivity of XML Path Expressions for Internet Scale Applications,' In Proc. the 27th Int'l Conf. on Very Large Data Bases (VLDB), pp. 591-600, Rome, Italy, Sept. 11-14, 2001
13 G. Salton and M. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, New York, 1983
14 N. Polyzotis and M. Garofalakis, 'Statistical Synopses for Graph-structured XML Databases,' In Proc. 2002 ACM SIGMOD Int'l Conf. on Management of Data, pp. 358-369, Madison, Wisconsin, June 3-6, 2002
15 B. F. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, and M. Shadmon, 'A Fast Index for Semistructured Data,' In Proc. the 27th Int'l Conf. on Very Large Data Bases(VLDB), pp. 341-350, Rome, Italy, Sept. 11-14, 2001
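16 Young-Ho Park, Wook-Shin Han, and Kyu-Young Whang, 'Efficient Processing of Linear Path Queries on Large-Scale Heterogeneous XML Documents Using Information Retrieval Techniques' (in Korean), Journal of KIISE: Databases, Vol. 31, No. 5, Oct. 2004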
17 Jan-Marco Bremer and Michael Gertz, 'XQuery/IR: Integrating XML Document and Data Retrieval,' In Proc. the Fifth Int'l Workshop on the Web and Databases(WebDB 2002), pp. 1-6, Madison, Wisconsin, 2002
18 N. Bruno, N. Koudas, and D. Srivastava, 'Holistic Twig Joins: Optimal XML Pattern Matching,' In Proc. 2002 ACM SIGMOD Int'l Conf. on Management of Data, pp. 310-321, Madison, Wisconsin, June 3-6, 2002
19 H. Jiang, H. Lu, W. Wang, and B. C. Ooi, 'XR-Tree: Indexing XML Data for Efficient Structural Joins,' In Proc. the 19th Int'l Conf. on Data Engineering (ICDE), 2003
20 H. Jiang, W. Wang, H. Lu, and J. X. Yu, 'Holistic Twig Joins on Indexed XML Documents,' In Proc. the 29th Int'l Conf. on Very Large Data Bases(VLDB), pp. 273-284, Berlin, Germany, Sept. 9-12, 2003
21 Daniela Florescu, Donald Kossmann, and Ioana Manolescu, 'Integrating Keyword Search into XML Query Processing,' In Proc. the 9th WWW Conference/Computer Networks, pp. 119-135, Amsterdam, NL, May 2000
22 Lin Guo, Feng Shao, Chavdar Botev, and Jayavel Shanmugasundaram, 'XRANK: Ranked Keyword Search over XML Documents,' In Proc. 2003 ACM SIGMOD Int'l Conf. on Management of Data, pp. 16-27, San Diego, California, June 9-12, 2003
23 A. Halverson, J. Burger, L. Galanis, A. Kini, R. Krishnamurthy, A. N. Rao, F. Tian, S. Viglas, Y. Wang, J. F. Naughton, and D. J. DeWitt, 'Mixed Mode XML Query Processing,' In Proc. the 29th Int'l Conf. on Very Large Data Bases(VLDB), pp. 225-236, Berlin, Germany, Sept. 9-12, 2003
24 H. Jiang, H. Lu, W. Wang, and J. Yu, 'XParent: An Efficient RDBMS-Based XML Database System,' In Proc. the 18th Int'l Conf. on Data Engineering (ICDE), 2002
25 F. Mandreoli, R. Martoglia, P. Tiberio, 'Searching Similar(Sub)Sentences for Example-Based Machine Translation,' In Proc. SEBD'02, Isola d'Elba, Italy, June 2002
26 Q. Li and B. Moon, 'Indexing and Querying XML Data for Regular Path Expressions,' In Proc. the 27th Int'l Conf. on Very Large Data Bases(VLDB), pp. 361-370, Rome, Italy, Sept. 11-14, 2001
27 C. Zhang, J. F. Naughton, D. J. DeWitt, Q. Luo, and G. M. Lohmann, 'On Supporting Containment Queries in Relational Database Management Systems,' In Proc. 2001 ACM SIGMOD Int'l Conf. on Management of Data, pp. 425-436, Santa Barbara, California, May 21-24, 2001
28 S. Al-Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu, 'Structural Joins: A Primitive for Efficient XML Query Pattern Matching,' In Proc. the 18th Int'l Conf. on Data Engineering (ICDE), pp. 141-152, San Jose, California, Feb. 2002
29 M. Yoshikawa, T. Amagasa, T. Shimura, and S. Uemura, 'XRel: A Path-Based Approach to Storage and Retrieval of XML Documents Using Relational Databases,' ACM Transactions on Internet Technology, Vol. 1, No. 1, Aug. 2001
30 H. Jiang, H. Lu, W. Wang, and J. Xu Yu, 'Path Materialization Revisited: An Efficient Storage Model for XML Data,' In Proc. the 13th Australasian Database Conference (ADC), pp. 85-94, Melbourne, Australia, Jan. 28 - Feb. 1, 2002
31 J. Clark and S. DeRose, XML Path Language (XPath), W3C Recommendation, http://www.w3.org/TR/xpath, Nov. 1999
32 eXtensible Markup Language(XML), http://www.w3.org/XML/
33 J. Naughton et al., 'The Niagara Internet Query System,' IEEE Data Engineering Bulletin, Vol. 24, No. 2, pp. 27-33, June, 2001
34 Xyleme, http://www.xyleme.com