• Title/Summary/Keyword: 문서지

A Study on Ambiguity Resolving for Pen-based Proofreading of Web Documents (펜 기반 웹 문서 교정을 위한 모호성 문제 해결에 관한 연구)

  • Sohn, Won-Sung
    • Journal of The Korean Association of Information Education / v.11 no.1 / pp.107-116 / 2007
  • To produce accurate editing results, the ambiguity in the editing scopes of marked correction signs must be resolved. Proofreading a web document modifies its structure, and the modified structure must remain valid against the defined DTD. This paper presents a pen-based proofreading interface for XML documents. In the proposed interface, correction signs are drawn freehand, and the editing scopes are recognized and revised based on the document context to minimize their ambiguity. The interface provides both implicit and explicit methods for modifying document structures. As a result, the editing scopes it processes are more accurate, and the document structures remain valid against the DTD after editing.

Extracting Logical Structure from Web Documents (웹 문서로부터 논리적 구조 추출)

  • Lee Min-Hyung;Lee Kyong-Ho
    • Journal of Korea Multimedia Society / v.7 no.10 / pp.1354-1369 / 2004
  • This paper presents a logical structure analysis method that transforms Web documents into XML documents. The proposed method consists of three phases: visual grouping, element identification, and logical grouping. To produce a more accurate logical structure, the method defines a document model that describes the logical structure information of a topic-specific document class. Because the method relies on both the visual structure obtained in the visual grouping phase and a document model that describes the logical structure of a document type, it supports sophisticated structure analysis. Experimental results with HTML documents from the Web show that the method performs logical structure analysis successfully compared with previous work. In particular, the method generates XML documents as the result of structure analysis, which enhances document reusability.

Document Clustering Methods using Hierarchy of Document Contents (문서 내용의 계층화를 이용한 문서 비교 방법)

  • Hwang, Myung-Gwon;Bae, Yong-Geun;Kim, Pan-Koo
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.12 / pp.2335-2342 / 2006
  • The web is accumulating abundant information, and text documents in particular are a form that people use easily and frequently. Numerous studies have therefore tried to retrieve text documents using methods based on probability, statistics, vector similarity, Bayesian models, and so on. These studies, however, could not consider both the subject and the semantics of documents. To overcome this problem, we propose a document similarity method for semantic retrieval of the documents users want. This is the core method of document clustering. The method first expresses the document content as a semantic hierarchy and then assigns weights to the important hierarchy domains of the document. With this, we can measure the similarity between documents using both the domain weights and the coincidence of concepts in the domain hierarchies.

Document Clustering Method using Coherence of Cluster and Non-negative Matrix Factorization (비음수 행렬 분해와 군집의 응집도를 이용한 문서군집)

  • Kim, Chul-Won;Park, Sun
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.12 / pp.2603-2608 / 2009
  • Document clustering is an important method for document analysis and is used in many information retrieval applications. This paper proposes a new document clustering model that combines a clustering method based on NMF (non-negative matrix factorization) with refinement of the documents in each cluster using cluster coherence. The proposed method can improve the quality of document clustering for two reasons: documents are re-assigned to clusters using cluster coherence based on the similarity between documents, and the semantic feature matrix and the semantic variable matrix used in clustering represent the inherent structure of the document set better. The experimental results demonstrate that applying the proposed method to document clustering methods achieves better performance than those methods alone.

Structure Recognition Method of Invoice Document Image for Document Processing Automation (문서 처리 자동화를 위한 인보이스 이미지의 구조 인식 방법)

  • Dong-seok Lee;Soon-kak Kwon
    • Journal of Korea Society of Industrial Information Systems / v.28 no.2 / pp.11-19 / 2023
  • In this paper, we propose methods for recognizing the structure of an invoice document and for producing a spreadsheet electronic document from it. The text and location of each word block are recognized by a deep-learning-based optical character recognition engine. Word blocks in the same row and the same column are found from their coordinates. The document area is divided using the arrangement information of the word blocks. The character recognition results are entered into the spreadsheet according to the document structure. In simulation results, item placement by the proposed method shows an average accuracy of 92.30%.
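
The row-finding step in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the `WordBlock` type, the `group_rows` helper, and the pixel tolerance `tol` are all hypothetical names and values chosen for illustration, and the sketch assumes each word block carries the top-left coordinate reported by an OCR engine.

```python
from dataclasses import dataclass


@dataclass
class WordBlock:
    text: str
    x: float  # left edge of the block (assumed OCR output)
    y: float  # top edge of the block (assumed OCR output)


def group_rows(blocks, tol=5.0):
    """Group blocks into rows: a block joins the current row when its y
    coordinate is within `tol` of the first block in that row."""
    rows = []
    for block in sorted(blocks, key=lambda b: (b.y, b.x)):
        if rows and abs(rows[-1][0].y - block.y) <= tol:
            rows[-1].append(block)
        else:
            rows.append([block])
    # Order each row left to right, as spreadsheet cells would be filled.
    return [[b.text for b in sorted(row, key=lambda b: b.x)] for row in rows]


blocks = [
    WordBlock("Price", 100, 12),
    WordBlock("Item", 10, 10),
    WordBlock("1.50", 100, 41),
    WordBlock("Pen", 10, 40),
]
rows = group_rows(blocks)
```

Column grouping could be sketched the same way by swapping the roles of `x` and `y`; a fixed tolerance is the simplest choice and would likely need tuning to the scan resolution.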

Accelerating Keyword Search Processing over XML Documents using Document-level Ranking (문서 단위 순위화를 통한 XML 문서에 대한 키워드 검색 성능 향상)

  • Lee, Hyung-Dong;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.33 no.5 / pp.538-550 / 2006
  • XML keyword search lets users obtain information easily without knowing the structure of documents, and it returns specific and useful partial documents instead of whole documents. Element-level query processing makes this possible, but its computational cost increases significantly as the number of documents grows. In this paper, we present a document-level ranking scheme over XML documents that predicts the results of element-level processing in order to reduce processing cost. To do this, we propose the notion of 'keyword proximity' for the document ranking process: the correlation of keywords in a document that affects the results of element-level query processing, based on path information of occurrence nodes and their resemblances. With this document-centric view, processing time can be reduced by using a ranked document list or by filtering out low-scored documents. Our experimental evaluation shows that the document-level processing technique using a ranked document list is effective and improves performance through early termination for top-k queries.
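
One way early termination over a ranked document list can work is sketched below. The abstract does not give the termination condition, so this sketch makes a loud assumption: each document-level score is treated as an upper bound on the scores of the elements inside that document. The function `top_k_elements` and the `score_elements` callback are hypothetical names, not part of the paper.

```python
import heapq


def top_k_elements(ranked_docs, k, score_elements):
    """Return the k best (score, element_id) pairs and the number of
    documents actually processed.

    ranked_docs: (doc_id, bound) pairs sorted by bound, highest first.
    score_elements: callable doc_id -> list of (element_id, score).
    Assumption: every element score in a document is <= that document's bound.
    """
    heap = []  # min-heap holding the best k (score, element_id) pairs so far
    processed = 0
    for doc_id, bound in ranked_docs:
        # No remaining document can beat the current k-th best score.
        if len(heap) == k and heap[0][0] >= bound:
            break
        processed += 1
        for element_id, score in score_elements(doc_id):
            if len(heap) < k:
                heapq.heappush(heap, (score, element_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, element_id))
    return sorted(heap, reverse=True), processed


ranked_docs = [("d1", 0.9), ("d2", 0.6), ("d3", 0.2)]
elements = {
    "d1": [("d1/a", 0.9), ("d1/b", 0.7)],
    "d2": [("d2/a", 0.5)],
    "d3": [("d3/a", 0.2)],
}
best, processed = top_k_elements(ranked_docs, 2, elements.__getitem__)
```

In this toy run only the first document is processed, because its two elements already score at least as high as the bound of every remaining document.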

Sparse Document Data Clustering Using Factor Score and Self Organizing Maps (인자점수와 자기조직화지도를 이용한 희소한 문서데이터의 군집화)

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.2 / pp.205-211 / 2012
  • Retrieved documents have to be transformed into a data structure suitable for the clustering algorithms of statistics and machine learning. A popular data structure for document clustering is the document-term matrix, which holds the occurrence frequency of each term in each document. This matrix suffers from a sparsity problem because most of its frequencies are 0, and this sparseness degrades clustering performance. This research therefore uses factor scores obtained by factor analysis to solve the sparsity problem in document clustering. The document-term matrix is transformed into a document-factor score matrix, which is then used as input data for document clustering. To compare clustering performance between the document-term matrix and the document-factor score matrix, this research applies both types of matrices to self-organizing map (SOM) clustering.
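
The sparsity problem described above is easy to see on a toy corpus. The sketch below only builds a document-term matrix and measures the share of zero cells; it does not implement the factor analysis or SOM steps of the paper. The helper names `document_term_matrix` and `sparsity`, the whitespace tokenization, and the sample documents are assumptions made for illustration.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] is the frequency of
    vocabulary[j] in docs[i]. Tokenization is plain whitespace splitting."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    matrix = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


def sparsity(matrix):
    """Fraction of cells in the matrix that are zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


docs = ["xml document search", "document clustering", "neural map"]
vocab, dtm = document_term_matrix(docs)
zero_share = sparsity(dtm)
```

Even with three short documents, 11 of the 18 cells are zero; the share typically grows as the vocabulary grows faster than the length of any single document.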

An Efficient Versioning Method for XML Document Repository System (XML 문서 저장관리 시스템을 위한 효율적인 버전닝 기법)

  • 손충범;배양석;유재수
    • Journal of Internet Computing and Services / v.3 no.4 / pp.37-50 / 2002
  • An XML document repository system (XDRS) should be able to manage vertical and horizontal versions of documents in order to store, update, and manage XML documents without loss of information. However, most existing XDRSs do not support a versioning method. Although a few systems do support versioning, they manage only vertical versions of XML documents. While vertical versioning preserves the update history of documents, horizontal versioning branches a document into many other versions so that users can easily create new documents from the original version and edit them to have different meanings. In this paper, we propose a new version numbering scheme that supports both vertical and horizontal versioning efficiently. We also design a schema that supports versioning and preserves the paradigm of structure information.

Dynamic Generation of SMIL based Multimedia Documents on the Web (웹에서 SMIL 기반 멀티미디어 문서의 동적 생성)

  • 김경덕
    • Journal of Korea Multimedia Society / v.4 no.5 / pp.439-445 / 2001
  • In this paper, we suggest a method for dynamically generating SMIL documents from user profiles on the web. The generated multimedia documents are based on SMIL (Synchronized Multimedia Integration Language), which is recommended by the W3C. The method automatically generates XSLT documents according to user profiles. SMIL documents are produced in real time by combining the XSLT documents with existing XML documents. Most conventional web-based documents are based on HTML, which makes it difficult to support the reusability of documents and the relations among multimedia objects. The suggested method, however, is based on XML, so it supports document reusability and efficiently produces various SMIL-based multimedia documents. Applications of the suggested method include electronic commerce, tele-lectures, and web-based document editing.

Implementation and Design of Document Class Editor based on ODA (ODA에 근거한 문서 클래스 에디터 설계 및 구현)

  • 정회경;이수연
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.12 / pp.1412-1422 / 1992
  • This paper describes an implementation of a document class editor based on ODA (Open Document Architecture). For processing, we divided the document structure into a generic logical structure and a generic layout structure, as in the ODA standard. The editor can also edit the document profile. Using a utility implemented to inspect the composed document object by object, we confirmed the document, and we verified its ODIF stream data. We designed this editor based on DAP level 2 of the international functional standard. The system was implemented in the X Window System environment with Motif as the graphical user interface. This document class editor will be used to create real documents having a specific document structure.
