Shredding XML Documents into Relations using Structural Redundancy

  • Kim Jae-Hoon (Department of Computer Science, Sogang University) ;
  • Park Seog (Department of Computer Science, Sogang University)
  • Published : 2005.04.01

Abstract

In this paper, we introduce a structural redundancy method for shredding XML documents into relations. It reduces the query processing cost incurred when a query-result XML document is reconstructed from the shredded XML data. The fundamental idea is that query performance can be enhanced by analyzing the given query patterns and replicating the data that matter most to those queries. To make structural redundancy practical and effective, we analyze three types of replication: ID, VALUE, and SUBTREE. In addition, when the given XML data and queries are very large and complex, finding the optimal redundancy set can be very difficult, so we also introduce a heuristic search method. Finally, we analyze experimentally the XML query processing cost that results from employing structural redundancy and the efficiency of the proposed search method. As expected, XML read queries run more quickly, while XML update queries run more slowly because of the additional cost of keeping replicas consistent. However, the experimental results show that in-place ID replication is useful even when the update cost is excessive, and that multiple-place SUBTREE replication can improve read query performance remarkably provided the update cost is not excessive.
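The trade-off described above, read savings against replica update cost, can be sketched with a generic greedy selection. This is only an illustration under stated assumptions, not the heuristic search method of the paper: the `Candidate` fields, the storage `budget`, the linear cost model, and all numbers in `CANDIDATES` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical replication candidate (all fields are assumptions)."""
    name: str
    kind: str              # "ID", "VALUE", or "SUBTREE"
    read_saving: float     # estimated cost saved per read query
    update_penalty: float  # estimated extra cost per update query
    size: float            # extra storage used by the replica (> 0)


def net_benefit(c, reads, updates):
    """Assumed linear cost model: total read savings minus total update cost."""
    return reads * c.read_saving - updates * c.update_penalty


def greedy_redundancy_set(candidates, reads, updates, budget):
    """Pick candidates by net benefit per unit of storage until the budget is used."""
    ranked = sorted(
        candidates,
        key=lambda c: net_benefit(c, reads, updates) / c.size,
        reverse=True,
    )
    chosen, used = [], 0.0
    for c in ranked:
        if net_benefit(c, reads, updates) <= 0:
            break  # every remaining candidate costs more than it saves
        if used + c.size <= budget:
            chosen.append(c)
            used += c.size
    return chosen


# Made-up numbers, for illustration only.
CANDIDATES = [
    Candidate("id_copy", "ID", read_saving=2.0, update_penalty=0.1, size=1.0),
    Candidate("value_copy", "VALUE", read_saving=3.0, update_penalty=4.0, size=2.0),
    Candidate("subtree_copy", "SUBTREE", read_saving=10.0, update_penalty=12.0, size=8.0),
]

if __name__ == "__main__":
    for reads, updates in [(100, 10), (10, 100)]:
        picked = greedy_redundancy_set(CANDIDATES, reads, updates, budget=12.0)
        print(reads, updates, [c.name for c in picked])
```

With these made-up inputs, a read-heavy workload selects all three candidates, while an update-heavy workload keeps only the cheap ID replica. A greedy ranking like this is fast but not guaranteed to be optimal, which is why it is offered here only as a sketch.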
