A Clustering Method Based on Path Similarities of XML Data

Choi Il-Hwan; Moon Bong-Ki; Kim Hyoung-Joo

Journal of KIISE:Databases (한국정보과학회논문지:데이타베이스)

Volume 33 Issue 3 / Pages 342-352 / 2006 / 1229-7739 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

XML 데이타의 경로 유사성에 기반한 클러스터링 기법

Choi Il-Hwan (Seoul National University, School of Computer Science and Engineering) ;
Moon Bong-Ki (University of Arizona, Department of Computer Science) ;
Kim Hyoung-Joo (Seoul National University, School of Computer Science and Engineering)

Published : 2006.06.01

Abstract

Current research on storing XML data focuses either on mapping XML data efficiently onto an existing RDBMS or on developing native XML storage. Some native XML stores parse a document and store each XML node as a separate object. With this storage method, clustering, that is, the physical arrangement of those objects, can be an important factor in performance. In this paper, we propose a re-clustering technique that stores an XML document efficiently. The proposed technique clusters data nodes by the similarity of their paths, which reduces the page I/Os incurred when returning query results. It also processes a path query using only the clusters the query needs rather than all clusters, which shrinks the search space by skipping unnecessary data and so makes path query processing more efficient. Finally, we apply existing clustering techniques to XML storage and compare their performance with that of the proposed technique. Our results show that a suitable clustering technique can improve the performance of XML storage.
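The idea of grouping nodes by path similarity and then answering a path query from only the relevant clusters can be illustrated with a small sketch. The abstract does not state the similarity measure or the clustering procedure, so everything below is an assumption made for illustration: the distance is a tag-level edit distance between root-to-node label paths, the grouping is a simple greedy pass with a hypothetical `max_distance` threshold, and the function names (`label_paths`, `path_distance`, `cluster_by_path`, `query`) are invented here. It is not the authors' algorithm.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def label_paths(root):
    """Yield (path, element) pairs; path is the tuple of tags from the root."""
    stack = [((root.tag,), root)]
    while stack:
        path, node = stack.pop()
        yield path, node
        for child in node:
            stack.append((path + (child.tag,), child))


def path_distance(p, q):
    """Edit distance between two label paths, counted in whole tags.

    Assumed measure: the abstract does not say how similarity is computed.
    """
    prev = list(range(len(q) + 1))
    for i, a in enumerate(p, 1):
        cur = [i]
        for j, b in enumerate(q, 1):
            cur.append(min(prev[j] + 1,               # delete a
                           cur[j - 1] + 1,            # insert b
                           prev[j - 1] + (a != b)))   # substitute a -> b
        prev = cur
    return prev[-1]


def cluster_by_path(root, max_distance=1):
    """Greedy grouping: each distinct path joins the first cluster whose
    representative path is within max_distance, else it opens a new cluster."""
    nodes_by_path = defaultdict(list)
    for path, node in label_paths(root):
        nodes_by_path[path].append(node)
    clusters = []  # list of (representative_path, {path: [nodes]})
    for path in sorted(nodes_by_path):
        for rep, members in clusters:
            if path_distance(rep, path) <= max_distance:
                members[path] = nodes_by_path[path]
                break
        else:
            clusters.append((path, {path: nodes_by_path[path]}))
    return clusters


def query(clusters, path):
    """Answer an exact label-path query, reading only clusters holding that path.

    Returns the matching nodes and the number of clusters that were read.
    """
    touched, result = 0, []
    for _rep, members in clusters:
        if path in members:
            touched += 1
            result.extend(members[path])
    return result, touched


# Hypothetical sample document, made up for this example.
DOC = """<site>
  <people><person><name>A</name></person><person><name>B</name></person></people>
  <items><item><price>1</price></item></items>
</site>"""

clusters = cluster_by_path(ET.fromstring(DOC), max_distance=1)
names, touched = query(clusters, ("site", "people", "person", "name"))
print(len(clusters), sorted(n.text for n in names), touched)
```

In a real store each cluster would map to a run of disk pages, so the count of clusters read stands in for the page I/O that the abstract aims to reduce.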
