An Efficient Path Expression Join Algorithm Using XML Structure Context

Kim, Hak-Soo; Shin, Young-Jae; Hwang, Jin-Ho; Lee, Seung-Mi; Son, Jin-Hyun

doi:10.3745/KIPSTD.2007.14-D.6.605

The KIPS Transactions: Part D (정보처리학회논문지D)

Volume 14D, Issue 6 / Pages 605-614 / 2007 / pISSN 1598-2866

Korea Information Processing Society (한국정보처리학회)



XML 구조 문맥을 사용한 효율적인 경로 표현식 조인 알고리즘

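Kim, Hak-Soo (Department of Computer Engineering, Hanyang University);
Shin, Young-Jae (Department of Computer Engineering, Hanyang University);
Hwang, Jin-Ho (Department of Computer Engineering, Hanyang University);
Lee, Seung-Mi (BK21 AIS Project Team, Department of Computer Engineering, Hanyang University);
Son, Jin-Hyun (Department of Computer Engineering, Hanyang University)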

Published: 2007.10.31

https://doi.org/10.3745/KIPSTD.2007.14-D.6.605


Abstract

The W3C proposed XQuery and XPath as standard query languages for searching XML data. As these languages have come into wide use, recent research has focused on data structures and algorithms for efficiently processing XML queries over very large XML databases. In processing XML path expressions, the structural join, which determines the structural relationship between XML elements (ancestor-descendant or parent-child), has become one of the dominant XPath processing mechanisms. However, structural joins occur frequently in XPath query processing and are costly. In this paper, we propose a new structural join algorithm, called SISJ, based on our structural index, called SI, in order to process XPath queries efficiently. Experimental results show that our algorithm performs marginally better than previous ones. For highly recursive documents, however, the pruning feature of the proposed method improved performance by more than 30%.
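To make the notion of a structural join concrete, the following is a minimal sketch of a generic stack-based structural join over region-encoded elements. It is not the SISJ algorithm or the SI index proposed in the paper, which this page does not describe in detail. The `Node` class, the `(start, end, level)` encoding, and the function name `structural_join` are illustrative assumptions: each element is assumed to carry a start and end position from a document-order traversal plus its depth, and both input lists are assumed to be sorted by `start`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical region-encoded XML element (assumed encoding)."""
    tag: str
    start: int   # position of the opening tag in document order
    end: int     # position of the closing tag in document order
    level: int   # depth in the document tree (root = 1)


def structural_join(ancestors: List[Node], descendants: List[Node],
                    parent_child: bool = False) -> List[Tuple[Node, Node]]:
    """Return (ancestor, descendant) pairs; both inputs sorted by start.

    With parent_child=True, only pairs one level apart are returned.
    """
    result: List[Tuple[Node, Node]] = []
    stack: List[Node] = []   # chain of nested, still-open candidate ancestors
    i = 0
    for d in descendants:
        # Push every candidate ancestor that opens before d.
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            # Drop candidates that closed before a opens.
            while stack and stack[-1].end < a.start:
                stack.pop()
            stack.append(a)
            i += 1
        # Drop candidates that closed before d opens.
        while stack and stack[-1].end < d.start:
            stack.pop()
        # Everything left on the stack encloses d.
        for a in stack:
            if not parent_child or a.level + 1 == d.level:
                result.append((a, d))
    return result


# Example document: <a><a><b/></a><b/></a>
a1 = Node("a", 1, 8, 1)
a2 = Node("a", 2, 5, 2)
b1 = Node("b", 3, 4, 3)
b2 = Node("b", 6, 7, 2)

anc_desc = structural_join([a1, a2], [b1, b2])                     # a//b
par_child = structural_join([a1, a2], [b1, b2], parent_child=True)  # a/b
```

In this example, `a//b` yields three pairs and `a/b` yields two. The nested `a` elements illustrate why recursive documents are demanding for structural joins: a single descendant can match several ancestors on the stack at once.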

