Storage and Retrieval of XML Documents Without Redundant Path Information

Lee Hiye-Ja; Jeong Byeong-Soo; Kim Dae-Ho; Lee Young-Koo

doi:10.3745/KIPSTD.2005.12D.5.663

The KIPS Transactions:PartD (정보처리학회논문지D)

Volume 12D, Issue 5 (Serial No. 101) / Pages 663-672 / 2005 / 1598-2866 (pISSN)

Korea Information Processing Society (한국정보처리학회)

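Original Korean title, translated: A Storage and Query Processing Technique for XML Documents That Eliminates Redundant Path Information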

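Lee Hiye-Ja (Department of Medical Information Systems, Yong-In Songdam College);
Jeong Byeong-Soo (Department of Computer Engineering, School of Electronics and Information, Kyung Hee University);
Kim Dae-Ho (Department of Computer Engineering, School of Electronics and Information, Kyung Hee University);
Lee Young-Koo (Department of Computer Engineering, School of Electronics and Information, Kyung Hee University)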

Published: 2005.10.01

https://doi.org/10.3745/KIPSTD.2005.12D.5.663

Abstract

This paper proposes an approach that removes redundant path information and uses an inverted index as an efficient way to store a large volume of XML documents and to retrieve the desired information from them. An XML document is decomposed into nodes based on its tree structure, and the nodes are stored in relational tables according to node type, together with the path from the root to each node. Existing path-based methods store path information for every element node, so the amount of stored information grows and retrieval performance degrades as the data volume increases. Our approach stores only leaf element paths, that is, the longest paths, which run from the root to leaf element nodes, and does not store internal element paths, which end at internal element nodes. Because the inverted index is built over leaf element paths only, the number of posting lists per keyword is smaller than in existing inverted-index methods. To store and retrieve XML data, our approach requires neither schema information for the XML documents nor any extension of the relational database. Within the scope of our experiments, the proposed approach performs better than the existing approaches.
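
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table names (`leaf_path`, `leaf_value`), the function names, and the choice of element names as index keywords are all assumptions made for illustration, and SQLite stands in for an unspecified relational database. The sketch stores only root-to-leaf paths, builds an inverted index from element name to leaf path ids, and answers a query on an internal element by matching it inside the stored leaf paths.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import defaultdict


def leaf_paths(xml_text):
    """Yield (path, text) for each leaf element, in document order."""
    def walk(node, path):
        children = list(node)
        if not children:
            # Leaf element: its root-to-leaf path is the only path kept.
            yield path, (node.text or "").strip()
        for child in children:
            # Internal element: no path row of its own is produced.
            yield from walk(child, path + "/" + child.tag)

    root = ET.fromstring(xml_text)
    yield from walk(root, "/" + root.tag)


def store(conn, doc_id, xml_text):
    """Store distinct leaf paths once, plus one value row per leaf node."""
    conn.execute("CREATE TABLE IF NOT EXISTS leaf_path ("
                 "path_id INTEGER PRIMARY KEY, path TEXT UNIQUE)")
    conn.execute("CREATE TABLE IF NOT EXISTS leaf_value ("
                 "doc_id INTEGER, path_id INTEGER, ord INTEGER, value TEXT)")
    for ord_, (path, text) in enumerate(leaf_paths(xml_text)):
        conn.execute("INSERT OR IGNORE INTO leaf_path(path) VALUES (?)", (path,))
        (path_id,) = conn.execute(
            "SELECT path_id FROM leaf_path WHERE path = ?", (path,)).fetchone()
        conn.execute("INSERT INTO leaf_value VALUES (?, ?, ?, ?)",
                     (doc_id, path_id, ord_, text))


def build_inverted_index(conn):
    """Map each element name (assumed keyword) to ids of leaf paths using it."""
    index = defaultdict(set)
    for path_id, path in conn.execute("SELECT path_id, path FROM leaf_path"):
        for tag in path.strip("/").split("/"):
            index[tag].add(path_id)
    return index


def find_values(conn, index, tags):
    """Return leaf values whose path contains `tags` as consecutive steps."""
    if not tags:
        return []
    # Intersect posting lists first, then verify step order on the path text.
    candidates = set.intersection(*(index.get(t, set()) for t in tags))
    needle = "/" + "/".join(tags) + "/"
    matching = set()
    for pid in candidates:
        (path,) = conn.execute(
            "SELECT path FROM leaf_path WHERE path_id = ?", (pid,)).fetchone()
        if needle in path + "/":
            matching.add(pid)
    rows = conn.execute(
        "SELECT path_id, value FROM leaf_value ORDER BY doc_id, ord")
    return [value for pid, value in rows if pid in matching]


if __name__ == "__main__":
    sample = ("<lib><book><title>XML</title><author><name>Kim</name></author>"
              "</book><book><title>SQL</title><author><name>Lee</name></author>"
              "</book></lib>")
    conn = sqlite3.connect(":memory:")
    store(conn, 1, sample)
    index = build_inverted_index(conn)
    print(find_values(conn, index, ["author", "name"]))
    print(find_values(conn, index, ["book"]))
```

In this sketch the sample document has five distinct element paths but only two of them are stored, because `/lib`, `/lib/book`, and `/lib/book/author` are prefixes of the leaf paths. A query on the internal element `book` still succeeds, since every internal path can be recovered from a stored leaf path.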
