http://dx.doi.org/10.3745/KIPSTD.2007.14-D.3.285

Efficient Structural Information Extraction for XML Data  

Min, Jun-Ki (School of Internet-Media Engineering, Korea University of Technology and Education)
Abstract
Interest in XML has been growing since it emerged as the standard for data representation and exchange on the Web. Structural information about XML documents serves several important purposes. Despite this importance, a schema is not mandatory for XML documents, so much research has been conducted on extracting structural information from XML documents. In this paper, we present a technique for efficiently extracting a concise and accurate DTD from XML documents. We achieve efficiency and conciseness by restricting the DTD content model using the mixed content model of DTD and XML Schema and by applying the heuristic rules proposed in this paper. Experiments with real-life DTDs show that our approach is superior to existing approaches.
Keywords
XML; DTD; XML Schema