• Title/Summary/Keyword: XML Document Management


Classification Techniques for XML Document Using Text Mining (텍스트 마이닝을 이용한 XML 문서 분류 기술)

  • Kim Cheon-Shik;Hong You-Sik
    • Journal of the Korea Society of Computer and Information / v.11 no.2 s.40 / pp.15-23 / 2006
  • Millions of documents are already on the Internet, and new ones are created all the time. Classifying them by the most suitable means is therefore an important problem for document management and querying. Most users, however, rely on keyword-based classification, which does not classify documents efficiently and is weak for categories that depend on a document's meaning. Classification by a person can sometimes be very accurate and is often required. In this paper, we therefore classify documents using a neural network algorithm and the C4.5 algorithm. We used resume data formatted in XML for the classification experiment. The results showed excellent potential for document categorization, and we expect the approach to be applicable to a variety of document classification problems.

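
The abstract above names C4.5 but gives no details of the features or the tree. As a minimal sketch, the following computes the gain ratio, the split criterion C4.5 is known for, over a few made-up resume features. The feature names (`degree`, `has_java`) and the labels are assumptions for illustration and do not come from the paper.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def gain_ratio(rows, labels, feature):
    """Information gain of splitting on `feature`, divided by the split information."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy([row[feature] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical features that might be extracted from XML resumes
rows = [
    {"degree": "BS", "has_java": True},
    {"degree": "BS", "has_java": False},
    {"degree": "MS", "has_java": True},
    {"degree": "MS", "has_java": False},
]
labels = ["developer", "other", "developer", "other"]

# Pick the feature with the highest gain ratio as the first split
best_feature = max(["degree", "has_java"], key=lambda f: gain_ratio(rows, labels, f))
```

In this toy data `has_java` separates the two classes perfectly, so it is chosen as the first split; a full C4.5 implementation would also handle continuous attributes, missing values, and pruning.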

A Study on the Performance of Structured Document Retrieval Using Node Information (노드정보를 이용한 문서검색의 성능에 관한 연구)

  • Yoon, So-Young
    • Journal of the Korean Society for Information Management / v.24 no.1 s.63 / pp.103-120 / 2007
  • A node is a semantic unit and a component of a structured document. Information retrieval from structured documents makes it possible to search below the document level for relevant information, so that any element of a structured document becomes a retrievable unit. The node-based document retrieval examined here comprises several similarity calculation methods and an extended node retrieval method that uses structure information. Retrieval performance was hardly influenced by the method used to determine document similarity. Overall, the extended node method outperformed the others.

Design and Implementation of DTD Authoring Tools for XML Documents (XML 문서를 위한 DTD 저작 도구의 설계 및 구현)

  • 김현주
    • Journal of the Korea Computer Industry Society / v.3 no.8 / pp.1093-1104 / 2002
  • XML is a markup language that has been adopted in various fields such as digital libraries, electronic commerce, and web applications. Research on the creation, storage, management, and retrieval of XML documents is essential for developing XML application systems. This paper presents the design and implementation details of powerful and convenient DTD authoring tools for XML documents. The design principles are authoring convenience; semi-automatic creation of valid and reliable document DTDs through systematic guidance that reduces the possibility of syntax errors; and visualization of document structures.


Rule Based Document Conversion and Information Extraction on the Word Document (워드문서 콘텐츠의 사용자 XML 콘텐츠로의 변환 및 저장 시스템 개발)

  • Joo, Won-Kyun;Yang, Myung-Seok;Kim, Tae-Hyun;Lee, Min-Ho;Choi, Ki-Seok
    • Proceedings of the Korea Contents Association Conference / 2006.11a / pp.555-559 / 2006
  • This paper aims to support extracting and storing various forms of information of interest to users by means of user-defined structural rules and techniques for converting word-processor documents to XML. The system, named PPE, consists of three essential components. The converting component converts word-processor documents such as HWP and DOC files into XML documents. The extracting component lets the user prepare structural rules and extracts the relevant information from the XML document according to those rules. The storing component produces the final XML document or stores it in a database system. For document conversion, we developed an OCX-based conversion daemon. To help users extract information, we developed a script language, extended from XSLT, with a native function and variable processing engine. The system can be used to build content databases from word-processor documents or to provide various information services based on raw word-processor documents. We applied it in practice to a project management system and a project result management system.

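
The extracting and storing steps described above can be illustrated with a small sketch. The paper's rules are written in its own XSLT-derived script language, which the abstract does not show; the example below swaps in path expressions from Python's `xml.etree.ElementTree` instead. The sample document, the `rules` table, and the output element names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML as it might come out of the converting component
converted = """
<document>
  <section title="Overview">
    <para>Project: Alpha</para>
    <para>Manager: Lee</para>
  </section>
  <section title="Budget">
    <para>Total: 1200</para>
  </section>
</document>
"""

# Hypothetical structural rules: output field -> (element path, text prefix to match)
rules = {
    "project": (".//section[@title='Overview']/para", "Project:"),
    "budget": (".//section[@title='Budget']/para", "Total:"),
}

def extract(xml_text, rules):
    """Apply each rule to the converted document and build the final XML record."""
    root = ET.fromstring(xml_text)
    record = ET.Element("record")
    for field, (path, prefix) in rules.items():
        for para in root.findall(path):
            text = (para.text or "").strip()
            if text.startswith(prefix):
                ET.SubElement(record, field).text = text[len(prefix):].strip()
    return record

result = extract(converted, rules)
final_xml = ET.tostring(result, encoding="unicode")
```

Here `final_xml` holds a flat `<record>` element that could be written to a file or inserted into a database, which corresponds loosely to the storing component.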

Incremental Clustering of XML Documents based on Similar Structures (유사 구조 기반 XML 문서의 점진적 클러스터링)

  • Hwang Jeong Hee;Ryu Keun Ho
    • Journal of KIISE:Databases / v.31 no.6 / pp.699-709 / 2004
  • XML is increasingly important in data exchange and information management. Clustering documents that have similar structures is the starting point for retrieving structure and integrating documents efficiently, because documents can then be retrieved more flexibly and faster than when the whole collection of differently structured documents is processed. In this paper, we therefore propose an incremental clustering method based on similar structures, which is useful for retrieving the structure of XML documents and integrating them. The novelty of the method is that it uses a clustering algorithm for transactional data, which accommodates large volumes of data; this differs markedly from existing methods that measure the similarity between documents using vectors. We first extract the representative structures of XML documents using a sequential pattern algorithm. We then cluster the documents by similar structure, treating each document as a transaction and its representative structures as the items of that transaction. In addition, we define cluster cohesion and inter-cluster similarity, and we analyze the efficiency of the proposed method through experiments comparing it with an existing method.
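
The idea of treating each document as a transaction of structural items can be sketched as follows. This is a simplification, not the paper's algorithm: root-to-element tag paths stand in for the representative structures found by sequential pattern mining, and Jaccard similarity with a fixed `threshold` stands in for the paper's cohesion and inter-cluster similarity measures. All names are illustrative.

```python
import xml.etree.ElementTree as ET

def paths(xml_text):
    """Return the set of root-to-element tag paths, used here as transaction items."""
    found = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        found.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return found

def jaccard(a, b):
    """Share of items the two sets have in common."""
    return len(a & b) / len(a | b)

def add_document(clusters, xml_text, threshold=0.5):
    """Incrementally assign one document to the most similar cluster, or open a new one."""
    items = paths(xml_text)
    best, best_sim = None, 0.0
    for cluster in clusters:
        sim = jaccard(items, cluster["items"])
        if sim > best_sim:
            best, best_sim = cluster, sim
    if best is not None and best_sim >= threshold:
        best["docs"].append(xml_text)
        best["items"] |= items          # the cluster's item set grows with each member
    else:
        clusters.append({"items": set(items), "docs": [xml_text]})
    return clusters

clusters = []
for doc in [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<order><item/><price/></order>",
]:
    add_document(clusters, doc)
```

Because each document is compared only with the existing cluster summaries, new documents can be added one at a time without reclustering the collection, which is the incremental property the abstract emphasizes.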

An Experimental Study on the Usefulness of Structure Hints in the Leaf Node Language Model-Based XML Document Retrieval (단말노드 언어모델 기반의 XML문서검색에서 구조 제한의 유용성에 관한 실험적 연구)

  • Jung, Young-Mi
    • Journal of the Korean Society for Information Management / v.24 no.1 s.63 / pp.209-226 / 2007
  • The XML document format on the Web provides a mechanism for describing both content and logical structure information, so an XML processor can access a document's content and structure. The purpose of this study is to investigate the usefulness of structural hints in leaf node language model-based XML document retrieval. To this end, the experiment compared the performance of a leaf node language model-based XML retrieval system on queries for the same topic that contained content constraints only and queries that contained both content and structure constraints. A newly designed and implemented leaf node language model-based XML retrieval system was used. We participated in the ad hoc track of INEX 2005 and conducted the experiment using the large-scale XML test collection provided by INEX 2005.
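
To make the comparison between content-only queries and queries with structure constraints concrete, here is a minimal sketch of query-likelihood scoring over leaf nodes. The abstract does not state the model's smoothing or how structural hints are applied, so the Jelinek-Mercer smoothing weight `lam` and the `path_hint` suffix filter are assumptions, as is the sample document.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

def leaf_nodes(xml_text):
    """Yield (path, tokens) for every element that has no child elements."""
    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children:
            yield path, (node.text or "").lower().split()
        for child in children:
            yield from walk(child, path)

    yield from walk(ET.fromstring(xml_text), "")

def rank(xml_text, query, lam=0.5, path_hint=None):
    """Rank leaf nodes by smoothed log-likelihood of the query, best first."""
    leaves = [(p, t) for p, t in leaf_nodes(xml_text) if t]
    collection = Counter(tok for _, toks in leaves for tok in toks)
    total = sum(collection.values())
    scored = []
    for path, toks in leaves:
        if path_hint and not path.endswith(path_hint):
            continue                      # structure constraint: skip non-matching leaves
        counts = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            p = lam * counts[term] / len(toks) + (1 - lam) * collection[term] / total
            if p == 0:
                score = float("-inf")     # term never occurs in the collection
                break
            score += math.log(p)
        scored.append((score, path))
    return sorted(scored, reverse=True)

doc = (
    "<article><title>XML retrieval</title><body>"
    "<sec>language model for XML retrieval</sec>"
    "<sec>ship maintenance data</sec></body></article>"
)
content_only = rank(doc, "xml retrieval")
with_structure = rank(doc, "xml retrieval", path_hint="/sec")
```

The content-only query ranks the title leaf first, while the structure-constrained query only considers `sec` leaves, which shows how a structural hint can change the result list for the same topic.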

Schema of Maintenance Data Exchange and Implementation Applied To Ship & Offshore Platform

  • Son, Gum Jun;Lee, Jang Hyun
    • Journal of Advanced Research in Ocean Engineering / v.4 no.3 / pp.96-104 / 2018
  • Data management is becoming increasingly important for the efficient maintenance and operation of offshore structures. This paper discusses the data schema and business rules that standardize data exchange among ship design, operation, and maintenance. Technical documentation that meets the international standards ShipDex and S1000D for exchanging operation and management data in neutral or standard formats has been introduced into the lifecycle management of ships. The data exchange schema is represented in XML (eXtensible Markup Language), and the lifecycle data is implemented as structured documents, represented as data modules defined by an XML schema. To show that generating such data is feasible, an example technical document is presented using a general XML authoring tool.

DEDMS : Distributed Environment Document Management System Model based on the XML-RPC (XML-RPC 기반의 분산환경 문서관리 시스템 모델)

  • 고혁준;김정희;곽호영
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.2 / pp.394-406 / 2004
  • Even though document resources offered by a web server can be represented in the form of a URL/URI, there is no guarantee that the corresponding resources exist, because the server environment changes dynamically. In this paper, an integrated document management system is therefore proposed and modeled using XML-RPC technology; it guarantees the reliability of resources and handles both dynamic server resource management and client requests. The proposed system is composed of a middleware system and server systems. The former manages dynamic server resources, and the latter report updated information about documents stored on the server by clients from the server to the middleware system. As a result, effective storage management of dynamic resources on distributed servers could be achieved, and the cost of building a new web server could be reduced because the system is applicable to existing web servers. In addition, platform-independent and efficient data management was obtained by using the XML-RPC protocol.
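
The division of labor between servers and middleware can be sketched with Python's standard `xmlrpc.client` module, used here only to encode and decode XML-RPC messages in memory (no HTTP transport). The `Middleware` class, its method names `report_update` and `resolve`, and the example URL are hypothetical and are not taken from the paper.

```python
import xmlrpc.client

class Middleware:
    """Hypothetical registry that tracks where each document currently lives."""

    def __init__(self):
        self.locations = {}

    def report_update(self, doc_id, url):
        """Called by a document server when a document is stored or moved."""
        self.locations[doc_id] = url
        return True

    def resolve(self, doc_id):
        """Called by a client; returns the current URL or an empty string."""
        return self.locations.get(doc_id, "")

    def handle(self, request_xml):
        """Decode an XML-RPC request, call the method, and encode the response."""
        params, method = xmlrpc.client.loads(request_xml)
        if method not in ("report_update", "resolve"):
            fault = xmlrpc.client.Fault(1, "unknown method: %s" % method)
            return xmlrpc.client.dumps(fault)
        result = getattr(self, method)(*params)
        return xmlrpc.client.dumps((result,), methodresponse=True)

middleware = Middleware()

# A document server reports an update (the payload a real client would POST over HTTP)
request = xmlrpc.client.dumps(
    ("doc-42", "http://server-b.example/docs/42.xml"), methodname="report_update"
)
middleware.handle(request)

# A client asks the middleware where the document is now
reply = middleware.handle(xmlrpc.client.dumps(("doc-42",), methodname="resolve"))
(location,), _ = xmlrpc.client.loads(reply)
```

Because the messages are plain XML, the same requests could be produced by a client on any platform, which is the platform independence the abstract attributes to XML-RPC. A deployed system would expose `handle` through an HTTP endpoint, for example with `xmlrpc.server.SimpleXMLRPCServer`.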

The Path Inverted Index Technique for XML Document Retrieval (XML 문서 검색을 위한 경로 역 색인 기법)

  • Moon, Kyung-Won;Hwang, Byung-Yeon
    • The KIPS Transactions:PartD / v.17D no.2 / pp.103-110 / 2010
  • Recently, many XML document management systems that exploit the advantages of an RDBMS have been actively developed for the storage, processing, and retrieval of XML documents. However, partial pattern-matching queries such as LIKE operations cannot take advantage of RDBMS indexes, and their inefficient comparison processing degrades retrieval performance. This paper proposes a hierarchical XML storage technique, which stores XML documents in an RDBMS efficiently, and a path inverted index technique. The index treats each element of an XML document as a keyword and organizes a posting file with path identifiers and sequences to reduce the retrieval time of path-based queries. In simulations, our methods showed about 60% better search performance than the conventional method using an RDBMS.
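
A minimal in-memory sketch of the path inverted index idea follows. It keeps a path table that assigns an identifier to each distinct path and a posting list per element name holding (document id, path identifier, sequence) entries. The paper stores these structures in an RDBMS; the dictionaries, the class name, and the exact posting layout here are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

class PathInvertedIndex:
    def __init__(self):
        self.path_ids = {}                  # path string -> path identifier
        self.postings = defaultdict(list)   # element name -> [(doc_id, path_id, sequence)]

    def add(self, doc_id, xml_text):
        """Index every element of one document in document order."""
        sequence = 0

        def walk(node, prefix):
            nonlocal sequence
            path = prefix + "/" + node.tag
            path_id = self.path_ids.setdefault(path, len(self.path_ids))
            self.postings[node.tag].append((doc_id, path_id, sequence))
            sequence += 1
            for child in node:
                walk(child, path)

        walk(ET.fromstring(xml_text), "")

    def search(self, path):
        """Return sorted ids of documents that contain an element at exactly this path."""
        path_id = self.path_ids.get(path)
        if path_id is None:
            return []
        keyword = path.rsplit("/", 1)[-1]   # the last element name is the lookup keyword
        return sorted({d for d, p, _ in self.postings[keyword] if p == path_id})

index = PathInvertedIndex()
index.add(1, "<lib><book><title/></book></lib>")
index.add(2, "<lib><journal><title/></journal></lib>")
index.add(3, "<lib><book><title/><author/></book></lib>")

book_titles = index.search("/lib/book/title")
```

A path query becomes one exact lookup of the path identifier plus one scan of a single posting list, rather than a LIKE comparison against every stored path string, which is the kind of saving the abstract describes.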

Digitalizing Technical Documents of Construction Projects Based on Database and XML (데이터베이스와 XML에 기반한 건설프로젝트 기술문서 전자화)

  • Jung Jong-Hyun
    • Korean Journal of Construction Engineering and Management / v.6 no.4 s.26 / pp.190-198 / 2005
  • This study describes the digitalization of technical documents of construction projects, using a database for storage and XML as the exchange format on the web. First, the requirements for effective digitalization are identified. Second, strategies for using the database and XML are presented. These strategies cover how to store and search the technical documents, how to draw up XML documents for some parts of the technical documents, how to arrange the components in their proper hierarchy, and how to manage graphics and mathematical expressions in the database and XML documents. Finally, we discuss the validity of the results through a partial implementation for structural design sheets, which have all the characteristics of technical documents.