
A Unification Algorithm for DTDs of XML Documents having a Similar Structure  

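Chun-Sik Yoo (Department of Computer Science and Statistics, Chonbuk National University)
Seon-Mi Woo (BK21 Electronics and Information Project Group, Chonbuk National University)
Yong-Sung Kim (Division of Electronics and Information Engineering, Chonbuk National University)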
Abstract
Many XML documents have different DTDs even though they share a similar structure and are logically the same kind of document. As a result, these XML documents end up with different database schemas and are stored in different databases. In this paper, we propose an algorithm that unifies the DTDs of such XML documents using finite automata and a tree structure. Finite automata are well suited to representing the repetition operators and connectors of a DTD, and they give a simple representation of a DTD. Using finite automata reduces the complexity of the algorithm. We apply the proposed algorithm to unify the DTDs of science journals.
Keywords
XML DTD Unification; Tree; Finite Automata; Schema Integration;
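
The idea in the abstract, that repetition operators and connectors of a DTD map naturally onto a finite automaton, can be illustrated with a small sketch. It is not the algorithm of the paper, whose details are not given here. It assumes a flat content model that uses only the sequence connector `,` and the repetition operators `?`, `*` and `+`, and it assumes the two models being unified list the same elements in the same order. All function names (`parse_sequence`, `build_nfa`, `accepts`, `unify_aligned`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# One content-model item: (element name, repetition operator).
# The operator is "" (exactly one), "?", "*" or "+".
Item = Tuple[str, str]


def parse_sequence(model: str) -> List[Item]:
    """Parse a flat sequence model such as '(title, author+, abstract?)'."""
    items: List[Item] = []
    for token in model.strip().strip("()").split(","):
        token = token.strip()
        if not token:
            continue
        if token[-1] in ("?", "*", "+"):
            items.append((token[:-1], token[-1]))
        else:
            items.append((token, ""))
    return items


def build_nfa(items: List[Item]):
    """State i means 'the first i items have been matched'."""
    delta: Dict[Tuple[int, str], Set[int]] = {}
    eps: Dict[int, Set[int]] = {}
    for i, (name, op) in enumerate(items):
        delta.setdefault((i, name), set()).add(i + 1)   # consume one occurrence
        if op in ("*", "+"):                            # repetition: loop on the target state
            delta.setdefault((i + 1, name), set()).add(i + 1)
        if op in ("?", "*"):                            # optional: skip without consuming
            eps.setdefault(i, set()).add(i + 1)
    return delta, eps, len(items)


def closure(states: Set[int], eps: Dict[int, Set[int]]) -> Set[int]:
    """Add every state reachable through skip (epsilon) transitions."""
    seen, stack = set(states), list(states)
    while stack:
        for nxt in eps.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts(items: List[Item], children: List[str]) -> bool:
    """Check whether a list of child element names fits the content model."""
    delta, eps, final = build_nfa(items)
    current = closure({0}, eps)
    for child in children:
        moved: Set[int] = set()
        for state in current:
            moved |= delta.get((state, child), set())
        current = closure(moved, eps)
    return final in current


# Each operator as (minimum occurrences, may repeat).
BOUNDS = {"": (1, False), "?": (0, False), "+": (1, True), "*": (0, True)}
OPS = {bounds: op for op, bounds in BOUNDS.items()}


def unify_aligned(a: List[Item], b: List[Item]) -> List[Item]:
    """Assumed simplification: same elements in the same order; relax each operator."""
    if len(a) != len(b):
        raise ValueError("models must have the same number of items")
    unified: List[Item] = []
    for (name_a, op_a), (name_b, op_b) in zip(a, b):
        if name_a != name_b:
            raise ValueError(f"element mismatch: {name_a} vs {name_b}")
        (min_a, many_a), (min_b, many_b) = BOUNDS[op_a], BOUNDS[op_b]
        unified.append((name_a, OPS[(min(min_a, min_b), many_a or many_b)]))
    return unified


dtd_a = parse_sequence("(title, author, abstract?)")
dtd_b = parse_sequence("(title, author+, abstract)")
unified = unify_aligned(dtd_a, dtd_b)
```

In this sketch the unified model accepts any child sequence that either input model accepts, because each operator is relaxed to the least restrictive one covering both inputs. Handling DTDs whose element order or nesting differs would need more than this, which is presumably where the tree structure named in the abstract comes in.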