• Title/Summary/Keyword: large XML document


Document Filtering Algorithm for Efficient Preprocessing of XML Information Retrieval (XML 정보검색의 효율적 전처리를 위한 문서여과 알고리즘)

  • Kong Yong-Hae;Kim Myung-Sook
    • Journal of the Korea Academia-Industrial cooperation Society / v.6 no.1 / pp.1-11 / 2005
  • The paper proposes a preprocessing method for efficiently processing XML queries in information retrieval over a large number of XML documents. Conventional preprocessing methods filter out XML documents either by parsing each document for the query's keywords or by comparing query signatures with signatures generated from the XML documents. However, these methods depend on the query and are very inefficient for a large number of XML documents. To address this, we generate a universal DTD based on a domain ontology. The universal DTD is applicable to XML documents that contain information from the same domain, even when they have different structures and attributes. Using the universal DTD, we then filter out the XML documents that do not belong to the domain. We evaluate the performance of this method through experiments.


An RDBMS-based Inverted Index Technique for Path Queries Processing on XML Documents with Different Structures (상이한 구조의 XML문서들에서 경로 질의 처리를 위한 RDBMS기반 역 인덱스 기법)

  • 민경섭;김형주
    • Journal of KIISE:Databases / v.30 no.4 / pp.420-428 / 2003
  • XML is a data-oriented language for representing all types of documents, including web documents. With the advent of XML-based document generation tools, the growth of proprietary XML documents produced with those tools, and the accelerating translation of legacy data into XML, a large number of differently structured XML documents has accumulated. It is therefore increasingly important to retrieve the right documents from such a document set. However, previous work on XML has mainly focused on storage and retrieval methods for a single large XML document or for XML documents that share the same DTD, and research that did support structural differences did not process path queries on the document set efficiently. To resolve this problem, we propose a new inverted index mechanism using an RDBMS and show that it outperforms previous work. In particular, because it is more efficient for indirect containment relationships, we argue that the index structure is well suited to differently structured XML document sets.
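
The abstract above does not give the index schema, so the following is only a minimal sketch of the general idea: label paths from differently structured documents are stored as rows in a relational table, and an indirect containment (ancestor/descendant) query is answered with a pattern match over those paths. The table name `path_index` and the helpers `label_paths`, `build_index`, and `docs_with_descendant` are hypothetical, and SQLite stands in for the RDBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET


def label_paths(xml_text):
    """Return the set of label paths (e.g. '/lib/book/title') in a document."""
    root = ET.fromstring(xml_text)
    stack = [(root, "/" + root.tag)]
    paths = set()
    while stack:
        element, path = stack.pop()
        paths.add(path)
        for child in element:
            stack.append((child, path + "/" + child.tag))
    return paths


def build_index(docs):
    """Build an in-memory inverted index: one row per (label path, document)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE path_index (path TEXT, doc_id TEXT)")
    conn.execute("CREATE INDEX idx_path ON path_index (path)")
    for doc_id, text in docs.items():
        rows = [(path, doc_id) for path in label_paths(text)]
        conn.executemany("INSERT INTO path_index VALUES (?, ?)", rows)
    return conn


def docs_with_descendant(conn, ancestor, descendant):
    """Find documents where `descendant` occurs at any depth below `ancestor`."""
    # Two patterns: direct child, and any number of intermediate elements.
    # Caveat: '_' and '%' in tag names act as LIKE wildcards in this sketch.
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM path_index "
        "WHERE path LIKE ? OR path LIKE ? ORDER BY doc_id",
        (f"%/{ancestor}/{descendant}", f"%/{ancestor}/%/{descendant}"),
    )
    return [row[0] for row in rows]
```

Because every document contributes rows to the same table regardless of its DTD, one query covers documents with different structures.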

Clustering XML Documents Considering The Weight of Large Items in Clusters (클러스터의 주요항목 가중치 기반 XML 문서 클러스터링)

  • Hwang, Jeong-Hee
    • The KIPS Transactions:PartD / v.14D no.1 s.111 / pp.1-8 / 2007
  • As XML, a data exchange language for the advanced Internet, spreads across web documents, those documents are becoming a target of information retrieval. Consequently, there is ongoing research on the structure, integration, and retrieval of XML documents. This paper proposes a clustering method for XML documents based on frequent structures, as basic research toward efficient query processing and retrieval. First, the trees representing XML documents are decomposed and frequent structures are extracted from them. Second, treating the frequent structures as items of transactions, we perform clustering that considers the weight of large items in order to adjust cluster creation and cluster cohesion. Third, we demonstrate the strengths of our method through experiments that compare it with previous methods.

Design and Implementation of BADA-IV/XML Query Processor Supporting Efficient Structure Querying (효율적 구조 질의를 지원하는 바다-IV/XML 질의처리기의 설계 및 구현)

  • 이명철;김상균;손덕주;김명준;이규철
    • The Journal of Information Technology and Database / v.7 no.2 / pp.17-32 / 2000
  • As XML emerges as the next-generation standard language for electronic documents on the Internet, the number of XML documents containing vast amounts of information is increasing substantially, both through the transformation of existing documents into XML and through the appearance of new XML documents. Consequently, an XML document retrieval system becomes essential for searching the large quantity of XML documents that are stored in and managed by a DBMS. In this paper we describe the design and implementation of the BADA-IV/XML query processor, which supports content-based, structure-based, and attribute-based retrieval. We design an XML query language based on XQL (XML Query Language) of W3C and tightly coupled with OQL (a query language for object-oriented databases). XML documents are stored and maintained in BADA-IV, an object-oriented database management system developed by ETRI (Electronics and Telecommunications Research Institute). The storage data model is based on DOM (Document Object Model), so retrieval of XML documents is basically executed by DOM tree traversal. We improve search performance using a Node ID that represents a node's hierarchy information within an XML document. Assuming that the DOM tree is a complete k-ary tree, we show that the Node ID technique is superior to DOM tree traversal in terms of node fetch counts.
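
The abstract does not spell out how the Node ID is computed, so the following is only a sketch under a common assumption: nodes of a complete k-ary tree are numbered in level order starting from 1 at the root. Under that numbering, parent and ancestor relationships follow from arithmetic on the IDs alone, with no node fetched from storage. The function names `parent_id` and `is_ancestor` are hypothetical.

```python
def parent_id(node_id: int, k: int) -> int:
    """Parent of a node in a complete k-ary tree numbered level by level from 1."""
    if node_id <= 1:
        raise ValueError("the root has no parent")
    # The children of node p occupy IDs k*(p-1)+2 through k*p+1.
    return (node_id - 2) // k + 1


def is_ancestor(ancestor: int, descendant: int, k: int) -> bool:
    """Check a proper ancestor relationship using ID arithmetic only."""
    if descendant <= ancestor:
        return False
    # Climb from the descendant until reaching or passing the candidate ancestor.
    while descendant > ancestor:
        descendant = parent_id(descendant, k)
    return descendant == ancestor
```

For example, with k = 2 the root's children are 2 and 3, and node 7 is a child of 3, so `is_ancestor(3, 7, 2)` holds while `is_ancestor(2, 7, 2)` does not.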


Implementation of Form-based XML Document Editor (Form 기반의 XML 문서 편집기 구현)

  • Go, Tak-Hyeon;Hwang, In-Jun
    • The KIPS Transactions:PartD / v.9D no.2 / pp.267-276 / 2002
  • Existing XML editors, which are usually tree-based, require users to have knowledge of XML. This requirement should be removed so that any user can create XML documents easily. In this paper, we develop a new XML editor that provides both the usual tree-based interface and a form-based interface derived from the original document. Editing XML documents through forms is especially effective in places such as enterprises or municipal offices, where a large number of documents of the same format need to be generated. Forms, which are HTML documents, are generated automatically through XSLT from a template XML document and an XSL document, and are displayed in the built-in HTML browser. When a user fills out a form, it is transformed into its corresponding XML document and stored in the XML repository.

A Clustering Technique using Common Structures of XML Documents (XML 문서의 공통 구조를 이용한 클러스터링 기법)

  • Hwang, Jeong-Hee;Ryu, Keun-Ho
    • Journal of KIISE:Databases / v.32 no.6 / pp.650-661 / 2005
  • As the Internet grows, the use of XML, a standard for semi-structured documents, is increasing. Accordingly, there is ongoing work on the integration and retrieval of XML documents. The basis of efficient integration and retrieval is clustering XML documents with similar structures. Conventional XML clustering approaches use a hierarchical clustering algorithm that produces the required number of clusters through repeated merging, but this has two problems: it is difficult to compute the similarity between XML documents, and comparing similarity repeatedly takes a long time. To address these problems, we use a clustering algorithm for transactional data that scales to large data sets. In this paper we use common structures from XML documents that have no DTD or schema. To obtain these common structures, we extract representative structures by decomposing the tree model that expresses each XML document, and we perform clustering with the extracted structures. We also show the efficiency of the proposed method by comparing and analyzing it against the previous method.
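
The abstract does not define its decomposition precisely. As a minimal sketch, assuming that a "structure" is a root-to-leaf tag path, each document can be turned into a set of path items (a transaction), and paths shared by enough documents can be kept as common structures. The names `root_to_leaf_paths`, `frequent_paths`, and `min_support` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def root_to_leaf_paths(xml_text):
    """Decompose a document tree into its set of root-to-leaf tag paths."""
    paths = set()

    def walk(element, prefix):
        path = prefix + "/" + element.tag
        children = list(element)
        if not children:
            paths.add(path)  # a leaf closes one path
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def frequent_paths(docs, min_support):
    """Keep the paths that occur in at least `min_support` documents."""
    counts = Counter()
    for text in docs:
        counts.update(root_to_leaf_paths(text))  # each doc counts a path once
    return {path for path, count in counts.items() if count >= min_support}
```

Because the paths are derived from the documents themselves, this works without a DTD or schema; the resulting item sets can then be fed to a transactional clustering algorithm.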

Similarity Measure based on XML Document's Structure and Contents (XML 문서의 구조와 내용을 고려한 유사도 측정)

  • Kim, Woo-Saeng
    • Journal of Korea Multimedia Society / v.11 no.8 / pp.1043-1050 / 2008
  • XML has become a standard for data representation and exchange on the Internet. With a large number of XML documents on the Web, there is an increasing need to automatically process these structurally rich documents for information retrieval, document management, and data mining applications. In this paper, we propose a new method to measure the similarity between XML documents by considering their structures and contents. The structural similarity of two documents is found by a simple string matching technique, and their content similarity is found by weights that take into account the names and positions of elements. The overall algorithm runs in time linear in the combined size of the two documents being compared.


A Hierarchical Clustering Technique of XML Documents based on Representative Path (대표 경로에 기반한 XML 문서의 계층 군집화 기법)

  • Kim, Woo-Saeng
    • Journal of Internet Computing and Services / v.10 no.3 / pp.141-150 / 2009
  • XML is increasingly important in data exchange and information management. A great deal of effort has been spent on developing efficient techniques for accessing, querying, and storing XML documents. In this paper, we propose a new method to cluster XML documents efficiently. We propose a new representative path, called a virtual path, which can represent both the structure and the contents of an XML document and serves as the feature of the document. We also propose a method that applies well-known hierarchical clustering techniques to the representative paths to cluster XML documents. The experiment shows that the true clusters are formed in a compact shape when a virtual path is used as the feature of an XML document.


Clustering Techniques for XML Data Using Data Mining

  • Kim, Chun-Sik
    • Proceedings of the CALSEC Conference / 2005.03a / pp.189-194 / 2005
  • Many studies have been conducted on classifying documents and extracting useful information from them. However, most search engines use a keyword-based method, which does not search and classify documents effectively. Exploiting the fact that XML documents are structured, this paper identifies the structure of XML documents using the set-theoretic approach suggested by Broder and tests the clustering of XML documents with a k-nearest neighbor algorithm. In addition, this study investigates the effectiveness of the clustering technique for large-scale data, compared with the existing bitmap method, through a test that measures similarity from the differences between clause-based documents rather than from vectors.
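
The abstract does not say how Broder's approach is applied. Assuming it refers to Broder's resemblance measure (the size of the intersection of two shingle sets divided by the size of their union), a minimal sketch over XML tag sequences could look like this. The helper names `tag_shingles` and `resemblance` and the shingle width `w` are hypothetical, and the k-nearest neighbor step is not shown.

```python
import xml.etree.ElementTree as ET


def tag_shingles(xml_text, w=2):
    """Return the set of w-long runs of tags, taken in document order."""
    tags = [element.tag for element in ET.fromstring(xml_text).iter()]
    return {tuple(tags[i:i + w]) for i in range(len(tags) - w + 1)}


def resemblance(a, b):
    """Set resemblance: |A intersect B| / |A union B| (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Two documents such as `<book><title/><author/></book>` and `<book><title/><price/></book>` share one of three distinct shingles, giving a resemblance of 1/3; a nearest-neighbor classifier could use `1 - resemblance` as its distance.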


Design and implementation of an XML Repository System supporting Document Version (버전을 지원하는 XML 저장관리 시스템 설계 및 구현)

  • Son, Chung-Beom;Oh, Kyoung-Keun;Yoo, Jae-Soo
    • The KIPS Transactions:PartD / v.10D no.1 / pp.13-22 / 2003
  • As the management of Internet documents has grown in importance, XML repository systems that store, retrieve, and manage large XML documents have been actively researched. Version management for XML documents is required in XML applications such as patent documents, software design, and system manuals, where modified documents have to be managed. In this paper, we propose a data model, based on a fragmentation model, that supports document versioning. We also design and implement an XML repository system that supports document versioning. A performance evaluation shows that our system outperforms the existing repository system.