XML Schema Matching based on Ontology Update for the Transformation of XML Documents

XML 문서의 변환을 위한 온톨로지 갱신 기반 XML 스키마 매칭

  • 이경호 (Department of Computer Science, Yonsei University)
  • 이준승 (Department of Computer Science, Yonsei University)
  • Published : 2006.12.15

Abstract

Schema matching is a prerequisite to the transformation of XML documents. This paper presents a schema matching method for that purpose. The proposed method consists of two steps: first, preliminary matching relationships between leaf nodes in the two XML schemas are computed based on a proposed ontology and leaf node similarity; second, the final matchings are extracted based on a proposed path similarity. In particular, to support more accurate schema matching, the proposed ontology is incrementally updated through user feedback. Furthermore, since the ontology can describe various relationships between concepts, the proposed method can compute complex matchings as well as simple matchings. Experimental results with schemas from various domains show that the proposed method outperforms previous work, achieving a precision of 97% and a recall of 83% on average. Updating the ontology improved performance by about 9% overall.

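Transforming between XML documents written against different XML schemas requires a schema matching step that computes the semantic relationships between the two schemas. This paper proposes an efficient schema matching algorithm for the transformation of XML documents. The proposed algorithm consists of two steps. First, candidate matchings between leaf nodes are computed based on the proposed ontology and lexical similarity. Second, the final matching results are selected from the candidate matchings by comparing a proposed path similarity that reflects context information. In particular, unlike previous work, the proposed method relies on an ontology that is incrementally updated through user feedback. Because the proposed ontology can express various relationships such as IsA and PartOf, the method can compute not only one-to-one matchings but also complex matchings with many-to-one and one-to-many relationships. To evaluate the algorithm, experiments were run on XML schemas from various domains; the method achieved an average precision of 97% and an average recall of 83%, outperforming previous work. Updating the proposed ontology yielded a performance improvement of about 9%.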
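
The two-step structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity formulas, so the toy ontology entries (including the "Synonym" relation), the use of difflib's SequenceMatcher as the lexical measure, the averaging rule in path_sim, the thresholds, and all function names are assumptions made for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical toy ontology: (term_a, term_b) -> relation name.
# The paper's ontology format is not given in the abstract.
ONTOLOGY = {
    ("author", "writer"): "Synonym",
    ("firstname", "name"): "PartOf",
    ("lastname", "name"): "PartOf",
}


def lexical_sim(a, b):
    """Assumed lexical measure: difflib's character-based ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def leaf_sim(a, b):
    """Score 1.0 if the ontology relates the two names, else fall back to lexical similarity."""
    a, b = a.lower(), b.lower()
    if (a, b) in ONTOLOGY or (b, a) in ONTOLOGY:
        return 1.0
    return lexical_sim(a, b)


def path_sim(p, q):
    """Assumed context measure: average best match of each source ancestor among target ancestors."""
    anc_p, anc_q = p[:-1], q[:-1]
    if not anc_p or not anc_q:
        return 1.0 if not anc_p and not anc_q else 0.0
    return sum(max(leaf_sim(x, y) for y in anc_q) for x in anc_p) / len(anc_p)


def match(source_paths, target_paths, leaf_threshold=0.8, path_threshold=0.5):
    """Each path is a tuple of element names from the root down to a leaf."""
    # Step 1: candidate matchings between leaf nodes.
    candidates = [
        (s, t)
        for s in source_paths
        for t in target_paths
        if leaf_sim(s[-1], t[-1]) >= leaf_threshold
    ]
    # Step 2: keep only candidates whose ancestor context also agrees.
    return [(s, t) for s, t in candidates if path_sim(s, t) >= path_threshold]


def update_ontology(a, b, relation):
    """Record a relationship confirmed by user feedback (incremental update)."""
    ONTOLOGY[(a.lower(), b.lower())] = relation


source = [("book", "author", "firstname"), ("book", "author", "lastname"), ("book", "title")]
target = [("book", "writer", "name"), ("book", "title"), ("publisher", "name")]
for s, t in match(source, target):
    print("/".join(s), "->", "/".join(t))
```

In this sketch both firstname and lastname map to the single target leaf name through the PartOf entries, which is one way a many-to-one matching could arise, while the candidate under publisher is filtered out in the second step because its ancestors do not resemble book/author. Calling update_ontology with a pair the user has confirmed makes that pair a candidate on the next run.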
