• Title/Summary/Keyword: Document Processing System

A Study on Cluster Hierarchy Depth in Hierarchical Clustering (계층적 클러스터링에서 분류 계층 깊이에 관한 연구)

  • Jin, Hai-Nan;Lee, Shin-won;An, Dong-Un;Chung, Sung-Jong
    • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.673-676 / 2004
  • Fast and high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering provides a view of the data at different levels, adapting large document collections to people's intuitive needs and interests. Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, K-means has a time complexity that is linear in the number of documents, but is thought to produce inferior clusters. Considering simplicity, quality, and efficiency, we combine the two approaches in a new system named CONDOR [10], which builds a hierarchical structure on top of document clustering with the K-means algorithm to "get the best of both worlds". The performance of the CONDOR system is compared with that of the VIVISIMO hierarchical clustering system [9], and is analyzed with respect to feature word selection for specific topics and the optimum hierarchy depth.
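
The idea of building a hierarchy from repeated K-means runs can be sketched as follows. This is a minimal illustration, not the CONDOR implementation: the function names, the choice of the first k points as seeds, the toy 2-D "document vectors", and the `max_depth` parameter are all assumptions made for the example.

```python
from math import dist

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means. Seeding with the first k points is a simplification."""
    centers = [list(p) for p in points[:k]]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old center if a cluster went empty
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return [g for g in groups if g]

def build_hierarchy(points, k=2, max_depth=2, depth=0):
    """Recursively split each cluster with K-means until max_depth is reached."""
    node = {"size": len(points), "children": []}
    if depth < max_depth and len(points) > k:
        groups = kmeans(points, k)
        if len(groups) > 1:
            node["children"] = [
                build_hierarchy(g, k, max_depth, depth + 1) for g in groups
            ]
    return node

def tree_depth(node):
    """Number of levels in the hierarchy, counting the root as level 1."""
    return 1 + max((tree_depth(c) for c in node["children"]), default=0)

# Hypothetical 2-D document vectors (real systems would use term vectors).
docs = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (5.0, 5.1), (5.1, 5.0), (9.0, 0.1), (9.1, 0.0)]
tree = build_hierarchy(docs, k=2, max_depth=2)
```

Each K-means pass is linear in the number of documents it sees, so limiting `max_depth` bounds the total work; that depth is the kind of parameter the abstract refers to when it analyzes the optimum hierarchy depth.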

Effectiveness of Fuzzy Graph Based Document Model

  • Aswathy M R;P.C. Reghu Raj;Ajeesh Ramanujan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.8 / pp.2178-2198 / 2024
  • Graph-based document models have good capabilities to reveal inter-dependencies among unstructured text data. Natural language processing (NLP) systems that use such models as an intermediate representation have shown good performance. This paper proposes a novel fuzzy graph-based document model and demonstrates its effectiveness by applying fuzzy logic tools to text summarization. The proposed system accepts a text document as input and identifies some of its sentence-level features, namely sentence position, sentence length, numerical data, thematic word, proper noun, title feature, upper case feature, and sentence similarity. The fuzzy membership value of each feature is computed from the sentences. We also propose a novel algorithm to construct the fuzzy graph as an intermediate representation of the input document. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric is used to evaluate the model. An evaluation based on other quality metrics was also performed to verify the effectiveness of the model. The ANOVA test confirms the hypothesis that the proposed model improves summarizer performance by 10% compared with state-of-the-art summarizers that employ alternative intermediate representations of the input text.
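
A rough sketch of what a fuzzy graph over sentences could look like is given below. It is not the paper's algorithm, which the abstract does not spell out. The assumptions here are: only two of the listed features (position and length) are used, their memberships are averaged into a node membership, word-overlap (Jaccard) similarity stands in for sentence similarity, and the triangular breakpoints (0, 8, 30 words) are arbitrary. The one standard ingredient is the fuzzy graph constraint that an edge membership never exceeds the memberships of its endpoints.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def word_overlap(s1, s2):
    """Jaccard similarity of the two sentences' word sets (an assumed stand-in)."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / len(w1 | w2) if w1 | w2 else 0.0

def build_fuzzy_graph(sentences):
    """Return (sigma, mu): node memberships and edge memberships."""
    n = len(sentences)
    sigma = {}
    for i, s in enumerate(sentences):
        position = 1.0 - i / n                         # earlier sentences score higher
        length = triangular(len(s.split()), 0, 8, 30)  # assumed ideal length: 8 words
        sigma[i] = (position + length) / 2             # assumed aggregation: mean
    mu = {}
    for i in range(n):
        for j in range(i + 1, n):
            sim = word_overlap(sentences[i], sentences[j])
            # Fuzzy graph constraint: mu(u, v) <= min(sigma(u), sigma(v)).
            strength = min(sim, sigma[i], sigma[j])
            if strength > 0:
                mu[(i, j)] = strength
    return sigma, mu

sentences = [
    "Graph models reveal dependencies in text",
    "Fuzzy graph models represent text documents",
    "The summary keeps the highest ranked sentences",
]
sigma, mu = build_fuzzy_graph(sentences)
```

A summarizer built on such a graph might rank sentences by node membership combined with the strength of their edges, but that ranking step is left out here because the abstract gives no details about it.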

An Automatic Classification System of Korean Documents Using Weight for Keywords of Document and Word Cluster (문서의 주제어별 가중치 부여와 단어 군집을 이용한 한국어 문서 자동 분류 시스템)

  • Hur, Jun-Hui;Choi, Jun-Hyeog;Lee, Jung-Hyun;Kim, Joong-Bae;Rim, Kee-Wook
    • The KIPS Transactions:PartB / v.8B no.5 / pp.447-454 / 2001
  • Automatic document classification assigns unlabeled documents to existing classes. It can be applied to classifying newsgroup articles and web documents, and to producing more precise information retrieval results by learning from users. In this paper, we use a weighted Bayesian classifier that assigns weights to the keywords of a document to improve classification accuracy. If the system cannot classify a document properly because the document has too few words to serve as features, it uses relevant word clusters to supplement the document's features. The clusters are built by automatic word clustering from the corpus. As a result, the proposed system outperformed existing classification systems in classification accuracy on Korean documents.
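
The keyword-weighting idea can be illustrated with a small multinomial naive Bayes classifier in which the log-likelihood of a keyword counts more than that of an ordinary word. This is a sketch under assumptions, not the paper's classifier: the class name, the weight value of 3.0, Laplace smoothing, and the toy training data are all hypothetical, and the word-cluster supplement step is omitted.

```python
import math
from collections import Counter, defaultdict

class WeightedNaiveBayes:
    """Multinomial naive Bayes where keyword terms get a larger weight (assumed scheme)."""

    def __init__(self, keyword_weight=3.0):
        self.keyword_weight = keyword_weight
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs):
        for words, label in docs:
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, words, keywords=()):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                weight = self.keyword_weight if w in keywords else 1.0
                # Laplace-smoothed likelihood, scaled by the term's weight.
                score += weight * math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy training set.
clf = WeightedNaiveBayes()
clf.fit([
    (["stock", "market", "price"], "economy"),
    (["market", "team", "trade"], "economy"),
    (["game", "team", "score"], "sports"),
])
label_a = clf.predict(["market", "price"])
label_b = clf.predict(["team", "score"], keywords={"score"})
```

Weighting in log space is equivalent to treating a keyword as if it occurred `keyword_weight` times in the document, which is one simple way to let topic words dominate the decision.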

A Diagram System based on XML (XML 기반 다이어그램 시스템)

  • Kim Sung-Keun;Kim Young-Chul;Youn Tae Hee;Yoo Chae-Woo
    • The KIPS Transactions:PartD / v.12D no.3 s.99 / pp.447-454 / 2005
  • Diagram systems related to XML documents are generally designed for a specific purpose. It is also difficult to create DML documents, because there is no definition of diagram components. In this paper, we design and implement a diagram system that executes XML documents. The system defines diagram components following the WYSIWYG concept, and it is designed to generate DML documents automatically. It is therefore possible to develop diagrams efficiently and to maintain consistency by defining the diagram syntax with a DTD. The system also uses the concept of a VPL (Visual Programming Language) to check the syntax and semantics of diagram sentences. Through this system, DML documents can be generated easily, and their syntax and semantics can be checked.

The Design and Implementation of the System for Processing Well-Formed XML Document on the Client-side (클라이언트 상의 Well-Formed XML 문서 처리 시스템의 설계 및 구현)

  • Song, Jong-Chul;Moon, Byung-Joo;Hong, Gi-Chai;Cheong, Hyun-Soo;Kim, Gyu-Tae;Lee, Soo-Youn
    • The Transactions of the Korea Information Processing Society / v.7 no.10 / pp.3236-3246 / 2000
  • XML is a meta-language like SGML and can be regarded as a simplified Internet version of SGML, used in conjunction with XLL, XPointer, and XSL. The W3C also established DTD-less well-formed XML documents so that XML can be used on the Web. However, no system has been offered that combines browsing, linking, and DTD generation facilities and efficiently processes DTD-less well-formed XML documents. This paper studies the design and implementation of a system that processes DTD-less well-formed XML documents on the client side. The system consists of a Well-Formed XML viewer that displays well-formed XML documents, an XLL processor that processes XLL, and an Auto DTD generator that automatically constructs DTDs from multiple documents of the same class. The study focuses on automatic DTD generation during hyperlink navigation and on an implementation of extended links based on XLL and XPointer. IDs and XPointer location addresses are used as the addressing modes in links. The implemented system confirms the validity of the extended link facilities, extracts DTDs from well-formed XML documents of the same class that share the same root element, and constructs a generalized DTD.
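
To make the automatic DTD generation step concrete, here is a minimal sketch that infers a very loose DTD from several well-formed documents with the same root element. It uses Python's standard `xml.etree.ElementTree` parser. The function name and the generalization rule (every observed child becomes an optional, repeatable alternative) are assumptions for illustration; the paper's generator may infer order and cardinality more precisely, and attributes are ignored here.

```python
import xml.etree.ElementTree as ET

def infer_dtd(xml_documents):
    """Infer loose <!ELEMENT> declarations from well-formed XML strings."""
    children = {}  # element name -> child names in first-seen order
    has_text = {}  # element name -> True if character data was ever observed
    for doc in xml_documents:
        root = ET.fromstring(doc)
        for elem in root.iter():
            kids = children.setdefault(elem.tag, [])
            for child in elem:
                if child.tag not in kids:
                    kids.append(child.tag)
            # Character data may sit before the first child or after any child.
            texts = [elem.text] + [child.tail for child in elem]
            seen = any((t or "").strip() for t in texts)
            has_text[elem.tag] = has_text.get(elem.tag, False) or seen
    lines = []
    for name, kids in children.items():
        if kids and has_text[name]:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"  # mixed content
        elif kids:
            model = "(" + " | ".join(kids) + ")*"            # element content
        elif has_text[name]:
            model = "(#PCDATA)"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {name} {model}>")
    return "\n".join(lines)

# Two hypothetical documents of the same class (same root element).
docs = [
    "<memo><to>A</to><body>hi</body></memo>",
    "<memo><to>B</to><cc>C</cc><body/></memo>",
]
dtd = infer_dtd(docs)
```

Merging observations across documents is what makes the result a generalized DTD: the second document contributes the `cc` child that the first one lacks.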

Design and Implementation of Conversion System from UML Class Diagram to XML DTD (UML 클래스 다이어그램을 XML DTD로의 변환 시스템 설계 및 구현)

  • Hong, Do-Seok;Ha, Yan;Kim, Yong-Sung
    • The Transactions of the Korea Information Processing Society / v.7 no.12 / pp.3829-3839 / 2000
  • The UML (Unified Modeling Language) class diagram, which is part of the structural side of UML, is well suited to object modeling. More recently, with the appearance of UXF (UML eXchange Format), UML class diagrams themselves can be exchanged as documents among many different systems. This paper therefore proposes a conversion system from UML class diagrams to XML DTDs. With it, a UML class diagram, expressed in the standard modeling language, can easily be transformed into and saved as a highly reusable XML document. Converting to an XML DTD also provides a flexible way to represent the logical structure of documents in various forms.
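
One possible shape of such a conversion is sketched below. The mapping rules are assumptions, since the abstract does not state them: each class becomes an element, each attribute becomes a `#PCDATA` child element, and each association becomes a child element whose DTD occurrence indicator is derived from the UML multiplicity. The `UmlClass` type and `class_to_dtd` function are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class UmlClass:
    name: str
    attributes: list = field(default_factory=list)
    # (target class name, multiplicity) pairs; multiplicity is one of the keys below.
    associations: list = field(default_factory=list)

# Assumed mapping from UML multiplicity to DTD occurrence indicator.
OCCURRENCE = {"1": "", "0..1": "?", "*": "*", "1..*": "+"}

def class_to_dtd(cls):
    """Emit <!ELEMENT> declarations for one class under the assumed mapping."""
    parts = list(cls.attributes)
    parts += [target + OCCURRENCE[mult] for target, mult in cls.associations]
    model = "(" + ", ".join(parts) + ")" if parts else "EMPTY"
    lines = [f"<!ELEMENT {cls.name} {model}>"]
    lines += [f"<!ELEMENT {attr} (#PCDATA)>" for attr in cls.attributes]
    return "\n".join(lines)

order = UmlClass("Order", attributes=["date"], associations=[("Item", "1..*")])
order_dtd = class_to_dtd(order)
```

Other mappings are equally plausible, for example turning UML attributes into XML attributes with `<!ATTLIST>` declarations; the sketch only shows that the class structure carries enough information to generate a content model mechanically.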

Document Structure Understanding on Subjects Registration Table

  • Ito, Yuichi;Ohno, Masanaga;Tsuruoka, Shinji;Yoshikawa, Tomohiro;Tsuyoshi, Shinogi
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.571-574 / 2003
  • This research aims to automate the generation of databases from paper-based table forms. Subject registration tables have complicated table structures, so we use them as an example of general table structure understanding. We propose a table structure understanding system for several table types, which works in several steps. In the first step, the document images on paper are read with an image scanner. In the second step, each document image is segmented into tables. In the third step, character strings are extracted using image processing technology and the properties of the character strings are determined. Finally, the structured database is generated automatically. The proposed system consists of two subsystems. The "Master document generation system" is used for the table form definition and does not handle handwritten characters. The "Structure analysis system for completed table" is used for the filled-in form and analyzes the table form containing handwritten characters. We implemented the system using MS Visual C++ on Windows, and it achieved a correct extraction rate of 98% on 51 registration tables filled in by different students.

CALS System Development Methodology Using Document Trace Diagram and IDEF Model (Document Trace Diagram 과 IDEF 모델을 이용한 CALS 시스템 개발 방법론)

  • Kim, Soung-Hie;Cho, Sung-Sik;Lee, Jae-Kwang;Han, Chang-Hee;Yoon, Young-Suk
    • Asia pacific journal of information systems / v.8 no.3 / pp.37-49 / 1998
  • The basic goal of CALS is to improve transactions and relationships among organizations through information sharing and integration. CALS is an information strategy that needs strong cooperation between organizations, or between users and developers, in the design step. However, current design methodologies using IDEF models, which are considered the standard for CALS system development, have some limitations. For example, it is difficult for system developers to communicate with their counterparts through IDEF models, since the models are difficult for the counterparts to understand. In this paper, we suggest a development methodology for CALS systems that complements the IDEF model with the Document Trace Diagram, which we developed as a communication tool. The concept of the Document Trace Diagram stems from the fact that most information exchanged within or between organizations is in the form of documents, and most standard operating procedures of organizations are about processing those documents. It helps system developers identify functions and their ICOMs (Input, Control, Output, Mechanism) with ease and at little communication cost. With this methodology, we have constructed a CALS prototype system for the construction industry.

MissCW:Multiuser Interactive System for Synchronous Collaborative Writing (MissCW:다중 사용자 동기적 공동 저작 시스템)

  • Seong, Mi-Yeong
    • The Transactions of the Korea Information Processing Society / v.3 no.7 / pp.1697-1706 / 1996
  • This paper presents the design and implementation of MissCW (Multiuser Interactive System for Synchronous Collaborative Writing). The document model of MissCW, DMDA (Distributed Multimedia Document Architecture), consists of the logical structure, presentation style object, and mark object. The windows. The collaborative editor of this system proposes a structure-oriented editing mechanism to combine distributed objects into one document. The middleware SOM (Shared Object Manager) maintains shared objects consistently and helps application programs use objects efficiently. The infrastructure of this system is a hybrid of replicated and centralized architectures, designed to maintain shared objects consistently inside SOM and to reduce the overhead of network traffic. The central part is a virtual node that corresponds to the Object Controller of SOM with the SOT (Shared Object Table).

Design of XQL Query Processing System for Structural information retrieval (구조적 정보 검색을 위한 XQL 질의 처리 시스템 설계)

  • 김상영;김철원;김광현;박종훈;정현철
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.10a / pp.892-896 / 2003
  • XML is used in various fields, from simple markup for display in web browsers to an interface format for data exchange between applications and between different systems. Accordingly, many studies are under way on systems that can effectively manage and search XML documents, which offer structured information, reusability, ease of processing, durability, and portability. In this paper, we describe XQL, the document structure processor, and the query language processor. The contents of an XML document are built into a tree structure during parsing, and we present a method that uses XQL to find the tree structure information in XML documents that matches a query. On this basis, we describe the design and implementation of an efficient XML document search system that collects XML documents scattered across the web, parses them, organizes their structure information as a tree, and searches it with the proposed query language, XQL.
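
The parse-to-tree and structural search steps can be sketched with Python's standard library. This is not an XQL processor: the example uses the limited XPath subset supported by `xml.etree.ElementTree.findall` as a stand-in for XQL path queries, and the function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def structure_paths(root):
    """List every root-to-element tag path in a parsed document tree."""
    paths = []

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        paths.append(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths

def search(documents, path_query):
    """Run a path query against each document and collect the matching elements."""
    hits = []
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)  # parsing yields the tree structure
        for elem in root.findall(path_query):
            hits.append((doc_id, elem.tag, (elem.text or "").strip()))
    return hits

# Hypothetical documents, standing in for XML collected from the web.
docs = {
    "d1": "<book><title>XML</title><chapter><title>Intro</title></chapter></book>",
    "d2": "<memo><title>Note</title></memo>",
}
chapter_titles = search(docs, ".//chapter/title")
top_titles = search(docs, "title")
paths_d1 = structure_paths(ET.fromstring(docs["d1"]))
```

The two queries show why structure matters for retrieval: both `title` elements in `d1` contain text, but only a structural query can tell a book title from a chapter title.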
