Integrated Information Retrieval with Metadata Interface for Heterogeneous Distributed XML Documents

  • Published: 2004.11.01

Abstract

We propose a lightweight DDXMI approach for the semi-automated integration and retrieval of distributed XML documents that are both structurally and semantically heterogeneous. Queries against each local document are generated through a semi-automatically generated interface. In the proposed prototype, we define DDXMI (Distributed Documents XML Metadata Interface), a metadata interface for data integration, and develop a user interface generator. The prototype takes the sources' DTDs as input and generates a graphical user interface through which the user resolves the semantic differences between the global DTD (the DTD of the integrated virtual database) and each local DTD. The user describes the semantic mapping by assigning index numbers to corresponding elements and specifying the names of any associated functions, and the DDXMI is then generated automatically from these mappings. Quilt is used as the XML query language: queries suited to each local document are generated according to the DDXMI and then executed. We assume that users know what they want from the different sources, that is, they have an integrated schema in mind and understand the schemas and semantics of the XML documents being searched. To verify query generation and retrieval results, we built a small global DTD and a mid-size global DTD over three XML document collections: master's and Ph.D. theses, research reports, and journals. The system was developed with JavaCC and Java Servlets.
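The index-based mapping described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's DDXMI format or its Quilt query generator: the element paths, the source names (`thesis`, `journal`), the function name `normalize_name`, and the helpers `build_mapping` and `local_queries` are all hypothetical. The idea shown is that a global element and a local element correspond when they carry the same index number, and a global path is then rewritten into one local path per source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalEntry:
    path: str                       # element path in one source's local DTD
    function: Optional[str] = None  # name of a conversion function, if any


# Hypothetical global DTD paths, each tagged with an index number.
GLOBAL_INDEX = {
    "paper/title": 1,
    "paper/author": 2,
}

# Hypothetical local DTD paths per source, tagged with the same index numbers.
LOCAL_INDEX = {
    "thesis": {
        1: LocalEntry("thesis/head/title"),
        2: LocalEntry("thesis/head/writer/name", "normalize_name"),
    },
    "journal": {
        1: LocalEntry("article/title"),
    },
}


def build_mapping(global_index, local_index):
    """Join global and local paths on their shared index numbers."""
    mapping = {}
    for global_path, idx in global_index.items():
        mapping[global_path] = {
            source: entries[idx]
            for source, entries in local_index.items()
            if idx in entries
        }
    return mapping


def local_queries(mapping, global_path):
    """Return one local path per source that has a counterpart element."""
    return {
        source: "/" + entry.path
        for source, entry in mapping[global_path].items()
    }


mapping = build_mapping(GLOBAL_INDEX, LOCAL_INDEX)
```

In this sketch, a source with no element for a given index (here, `journal` has no author entry) is simply skipped for that query; how the actual system handles missing counterparts is not stated in the abstract.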
