A Transformation Technique of XML DTD to Relational Database Schema Based On Extracting Common Structure in XML Documents

Ahn, Sung-Eun; Choi, Hwang-Kyu

doi:10.3745/KIPSTD.2002.9D.6.999

The KIPS Transactions: Part D (정보처리학회논문지D)

Volume 9D, Issue 6, pp. 999-1008, 2002. pISSN 1598-2866

Korea Information Processing Society (한국정보처리학회)

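Korean title: 공통 문서 구조 추출을 통한 XML DTD의 관계형 데이터 베이스 스키마 변환 기법 (A Schema Transformation Technique from XML DTD to Relational Database through Common Document Structure Extraction)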

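Ahn, Sung-Eun (Department of Computer, Information and Communication Engineering, Graduate School, Kangwon National University)
Choi, Hwang-Kyu (Department of Electrical, Electronic, Information and Communication Engineering, Kangwon National University)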

Published: 2002.12.01

Abstract

XML is emerging as a standard data format for exchanging and presenting data on the Web, and there is a growing need to store and query XML data efficiently. In this paper, we propose a new schema transformation algorithm based on extracting the common structure of XML documents. The common structure is the part shared by all XML documents that reference a given DTD; the uncommon structure is the part that appears in only some of those documents. Based on the extracted common and uncommon structures, we transform the XML DTD into a relational database schema. We evaluate performance in terms of the number of generated tables, record size, query processing time, and the number of joins per query. Compared with existing algorithms, our algorithm performs better in most cases with respect to the number of generated tables and the occurrence of NULL values in the tables.
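The common/uncommon split described above can be illustrated with a minimal Python sketch. It assumes, as a simplification not taken from the paper, that a document's structure is the set of root-to-element tag paths it contains. Under that assumption the common structure is the intersection of those sets across all documents, and the uncommon structure is the remainder of their union. Attributes, element order, and repetition counts are ignored, and the names `element_paths` and `split_structure` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Set, Tuple

Path = Tuple[str, ...]  # hypothetical representation: tag names from root to element


def element_paths(xml_text: str) -> Set[Path]:
    """Return every root-to-element tag path occurring in one document."""
    root = ET.fromstring(xml_text)
    paths: Set[Path] = set()

    def walk(node: ET.Element, prefix: Path) -> None:
        path = prefix + (node.tag,)
        paths.add(path)
        for child in node:  # iterating an Element yields its direct children
            walk(child, path)

    walk(root, ())
    return paths


def split_structure(docs: Iterable[str]) -> Tuple[Set[Path], Set[Path]]:
    """Split paths into common (present in every document) and uncommon (present in some)."""
    path_sets = [element_paths(d) for d in docs]
    if not path_sets:
        return set(), set()
    union = set().union(*path_sets)
    common = set.intersection(*path_sets)
    return common, union - common


docs = [
    "<book><title>A</title><author>X</author><isbn>1</isbn></book>",
    "<book><title>B</title><author>Y</author></book>",
]
common, uncommon = split_structure(docs)
# common: book, book/title, book/author; uncommon: book/isbn
```

Because `isbn` occurs in only one of the two documents, it lands in the uncommon structure while `title` and `author` stay common.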

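XML is a markup language proposed by the W3C. By overcoming the simplicity of HTML and the complexity of SGML, it is emerging as the standard for representing and exchanging data on the Web. Query languages dedicated to XML documents are being developed, but as the volume of data grows, a database system capable of handling very large amounts of data will ultimately be required. In this paper, we propose a technique for transforming an XML DTD into a relational database schema. The proposed technique builds a tree of the DTD, which serves as the schema for the XML data, extracts the common and uncommon structures of the XML data, and then derives a relational database schema from them. Compared with existing methods, the derived relational schema generates fewer tables and reduces the occurrence of NULL values. In addition, because the proposed technique maps XML data to fewer tables, it reduces the number of tables referenced during data retrieval and also performs better in query processing.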
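A second sketch shows one hypothetical way the extracted structures could be turned into relational DDL so that NULLs are avoided. Common leaf elements become NOT NULL columns of a single table for the root element, and each uncommon leaf element gets its own table keyed back to that row, so a missing element produces no row rather than a NULL. The naming scheme (`<root>_id`, `<root>_<path>`, a `value` column), the TEXT typing, and the function `to_schema` are assumptions for illustration. The paper's actual mapping rules, including how repeated elements (`*`, `+`) are handled, are not reproduced here.

```python
from typing import List, Set, Tuple

Path = Tuple[str, ...]  # tag names from root to element


def is_leaf(p: Path, all_paths: Set[Path]) -> bool:
    """A path is a leaf if no other known path extends it."""
    return not any(len(q) > len(p) and q[: len(p)] == p for q in all_paths)


def to_schema(common: Set[Path], uncommon: Set[Path]) -> List[str]:
    """Emit CREATE TABLE statements: one inlined table for common leaves,
    one side table per uncommon leaf (hypothetical mapping)."""
    all_paths = common | uncommon
    (root,) = {p for p in common if len(p) == 1}  # assumes exactly one shared root element
    root_name = root[0]
    cols = sorted(
        "_".join(p[1:]) for p in common if len(p) > 1 and is_leaf(p, all_paths)
    )
    col_defs = [f"{root_name}_id INTEGER PRIMARY KEY"] + [f"{c} TEXT NOT NULL" for c in cols]
    stmts = [f"CREATE TABLE {root_name} ({', '.join(col_defs)});"]
    for p in sorted(uncommon):
        if is_leaf(p, all_paths):
            name = f"{root_name}_" + "_".join(p[1:])
            stmts.append(
                f"CREATE TABLE {name} ({root_name}_id INTEGER NOT NULL "
                f"REFERENCES {root_name}({root_name}_id), value TEXT NOT NULL);"
            )
    return stmts


common = {("book",), ("book", "title"), ("book", "author")}
uncommon = {("book", "isbn")}
schema = to_schema(common, uncommon)
for stmt in schema:
    print(stmt)
```

In this sketch, a document without `isbn` simply has no row in `book_isbn`, so the main `book` table never stores a NULL for it.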
