DTD가 없는 XML 데이터의 효율적인 저장 기법

An Efficient Technique for Storing XML Data Without DTD

  • Park, Gyeong-Hyeon (Electronics and Telecommunications Research Institute) ;
  • Lee, Gyeong-Hyu (Electronics and Telecommunications Research Institute) ;
  • Ryu, Geun-Ho (Dept. of Electrical Electronic Computer Engineering, Chungbuk National University)
  • Published: 2001.10.01


As XML has emerged as the standard for data exchange on the Internet, data can be transferred regardless of the data model in which it is represented or the platform on which it is stored. Data-centric XML documents in particular are commonly sent without a DTD in order to reduce transmission overhead. Consequently, extracting a relational schema from XML documents without a DTD is essential for storing the received XML data efficiently, optimizing queries, and publishing data already stored in a relational database in XML format. In this paper, we propose a method that generates a relational schema from XML documents without a DTD, using the upper/lower bound schema extraction technique for semistructured data, and stores the XML data based on that schema. In particular, for extracting the lower bound schema of semistructured data, we propose simulation, which is more efficient than the existing datalog method, and thereby show an improved way of creating the relational schema.
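The abstract names simulation as the basis for lower bound schema extraction but does not give the algorithm. As a rough illustration only, the sketch below computes the maximal simulation relation between two edge-labeled graphs with a naive fixpoint: a data node u is simulated by a schema node v if every labeled edge leaving u can be matched by an equally labeled edge leaving v whose target simulates the target of u's edge. The function name `maximal_simulation`, the dictionary-based graph encoding, and the sample book data are assumptions made for this example; this is not the authors' algorithm, and it is not optimized.

```python
from itertools import product


def all_nodes(edges):
    """Collect every node that appears as a source or a target."""
    nodes = set(edges)
    for outgoing in edges.values():
        nodes.update(target for _, target in outgoing)
    return nodes


def maximal_simulation(data_edges, schema_edges):
    """Return the largest set of pairs (u, v) such that v simulates u.

    Each graph is a dict mapping a node to a list of (label, target) edges.
    Start from all pairs and repeatedly drop pairs that violate the
    simulation condition until nothing changes (greatest fixpoint).
    """
    sim = set(product(all_nodes(data_edges), all_nodes(schema_edges)))
    changed = True
    while changed:
        changed = False
        for u, v in list(sim):
            for label, u_next in data_edges.get(u, []):
                matched = any(
                    v_label == label and (u_next, v_next) in sim
                    for v_label, v_next in schema_edges.get(v, [])
                )
                if not matched:
                    sim.discard((u, v))
                    changed = True
                    break
    return sim


# Hypothetical data graph for a small DTD-less XML document.
data = {
    "root": [("book", "b1"), ("book", "b2")],
    "b1": [("title", "t1"), ("author", "a1")],
    "b2": [("title", "t2")],
}

# Hypothetical candidate schema graph.
schema = {
    "Root": [("book", "Book")],
    "Book": [("title", "Leaf"), ("author", "Leaf")],
}

sim = maximal_simulation(data, schema)
print(("root", "Root") in sim)  # the data graph conforms if this is True
```

In this encoding the data conforms to the candidate schema when the pair of root nodes survives in the relation. Nodes without outgoing edges are trivially simulated by every schema node, which is why leaf values pair with all of them.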


