
Design and Implementation of a Query Processing Algorithm for Distributed Semistructured Documents Retrieval with Metadata Interface

Choe Cuija (Department of Computer Science, Yonsei University)
Nam Young-Kwang (Department of Computer Science, Yonsei University)
Abstract
In distributed semistructured documents, it is very difficult to formalize and implement a query processing system because the data lack structure and rules. To retrieve and process heterogeneous semistructured documents precisely, a system must handle multiple mapping types on an element simultaneously, such as 1:1, 1:N, and N:1, and must generate the schema from the distributed documents. In this paper, we propose a query processing algorithm for querying and answering over heterogeneous semistructured data or documents in distributed systems, and we implement it with a metadata interface. The algorithm for generating local queries from a global query consists of mapping between global and local nodes, data transformation according to the mapping types, path substitution, and resolution of the heterogeneity among nodes in a global input query using metadata information. The mapping, transformation, and path substitution algorithms between the global schema and the local schemas have been implemented in a metadata interface called DDXMI (for Distributed Documents XML Metadata Interface). Nodes that share the same name but have different mappings or meanings are resolved by automatically extracting node identification information from the local schema. The system uses Quilt as its XML query language. An experimental test is reported over three different semistructured restaurant documents in the OEM model. The prototype system was developed under Windows with Java and the JavaCC compiler.
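The mapping and path substitution steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the prototype is described as Java with JavaCC, and its metadata format is not given here). The source names, paths, and the `METADATA` table layout below are hypothetical, and the data transformation step is left out.

```python
# Hypothetical metadata: for each local source, a global path maps to the
# local path(s) that carry the same information. All names are illustrative.
METADATA = {
    "source_a": {
        "/restaurant/name": ["/rest/title"],                    # 1:1
        "/restaurant/address": ["/rest/street", "/rest/city"],  # 1:N
    },
    "source_b": {
        "/restaurant/name": ["/eatery/name"],                   # 1:1
        "/restaurant/street": ["/eatery/location"],             # N:1
        "/restaurant/city": ["/eatery/location"],               # N:1
    },
}


def mapping_type(source, global_path):
    """Classify the mapping of one global path in one local source."""
    table = METADATA[source]
    local_paths = table[global_path]
    if len(local_paths) > 1:
        return "1:N"
    # N:1 when several global paths point at the same single local path.
    sharing = [g for g, paths in table.items() if paths == local_paths]
    return "N:1" if len(sharing) > 1 else "1:1"


def substitute_paths(source, global_paths):
    """Replace global paths with local ones, keeping order, dropping repeats."""
    result = []
    for global_path in global_paths:
        # A global path with no entry is simply not available in this source.
        for local_path in METADATA[source].get(global_path, []):
            if local_path not in result:
                result.append(local_path)
    return result


def generate_local_queries(global_paths):
    """Produce the list of local paths to query in every local source."""
    return {src: substitute_paths(src, global_paths) for src in METADATA}
```

In this sketch a 1:N mapping expands one global path into several local paths, while an N:1 mapping collapses several global paths into a single local path. A full system would also have to combine or split the returned values accordingly, which is the data transformation step the abstract mentions.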
Keywords
Semistructured; XML; Information Retrieval; Metadata;