• Title/Summary/Keyword: Document Processing System

Search Result 394, Processing Time 0.034 seconds

Document Image Segmentation and Classification using Texture Features and Structural Information (텍스쳐 특징과 구조적인 정보를 이용한 문서 영상의 분할 및 분류)

  • Park, Kun-Hye;Kim, Bo-Ram;Kim, Wook-Hyun
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.11 no.3
    • /
    • pp.215-220
    • /
    • 2010
  • In this paper, we propose a new texture-based page segmentation and classification method in which table region, background region, image region and text region in a given document image are automatically identified. The proposed method for document images consists of two stages, document segmentation and contents classification. In the first stage, we segment the document image, and then, we classify contents of document in the second stage. The proposed classification method is based on a texture analysis. Each contents in the document are considered as regions with different textures. Thus the problem of classification contents of document can be posed as a texture segmentation and analysis problem. Two-dimensional Gabor filters are used to extract texture features for each of these regions. Our method does not assume any a priori knowledge about content or language of the document. As we can see experiment results, our method gives good performance in document segmentation and contents classification. The proposed system is expected to apply such as multimedia data searching, real-time image processing.

Keyword Spotting on Hangul Document Images Using Image-to-Image Matching (영상 대 영상 매칭을 이용한 한글 문서 영상에서의 단어 검색)

  • Park Sang Cheol;Son Hwa Jeong;Kim Soo Hyung
    • The KIPS Transactions:PartB
    • /
    • v.12B no.3 s.99
    • /
    • pp.357-364
    • /
    • 2005
  • In this paper, we propose an accurate and fast keyword spotting system for searching user-specified keyword in Hangul document images by using two-level image-to-image matching. The system is composed of character segmentation, creating a query image, feature extraction, and matching procedure. Two different feature vectors are used in the matching procedure. An experiment using 1600 Hangul word images from 8 document images, downloaded from the website of Korea Information Science Society, demonstrates that the proposed system is superior to conventional image-based document retrieval systems.

A proposal for managing electronic document of the government (정부기관의 전자문서관리 방향)

  • Lee, Jae-Ha;Yoon, Dai-Hyun
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.1 no.1
    • /
    • pp.245-257
    • /
    • 2001
  • Recently, The government has been brought in the electronic document system. So, It's been increasing the job processing with the electronic approval and the distribution business. However because of the variety of the storage type of electronic document, it's expected many difficulties in the public-usage and etemity-preservation of the information later. also, There are several problems to manage electronic document system, for example, absence of the important function for managing records, etc. So, We propose the methodology as a way to solve several problems of managing electronic document in this paper. It grows the business which is produced in processing the electronic records management. The kind of document file produced by the government is various. Through introducing the standard format of document file, hereafter it has an effect on helpfulness in standardizing the electronic document system, and people recognize the situation of problem to append the important function of the preservation and usage for the electronic document system. The key task is to make the document system with keeping records and following functions according to the law of records management. As applying the standard electronic document system to manage records, the records of the processing section to the data center and then the records of the data center transfer to the government records and archives center. So, the records which be transferred can be preservative and available. The record, such as visual and auditory record which is not easy to digitalize, can be digitally preservative and available in the government records and archives center.

An XML Structure Translation System using Schema Structure Data Mapping (스키마 구조 데이타 매핑을 이용한 XML 구조변환 시스템)

  • 송종철;김창수;정회경
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.10 no.5
    • /
    • pp.406-418
    • /
    • 2004
  • Last days, various kinds of applications and system were individually introduced into specific groups or enterprises by different objective without considering interoperability among those. However, the environment for data processing is changing rapidly in these days. And now the necessity is growing to integrate and couple applications and system in the process dimension for more flexible and quicker data processing on these application programs and system. When integrating these application programs or system, an integration based on XML is recommended as it is one of good methods which will the additional cost and satisfy the requirements of the integration. This is because the XML is not only device-independent data type which can be used any platform, but also it uses XSLT, the document conversion standard established by W3C, which allows easy data conversion from one to another type on occasion of demands. This paper studies a design and implementation of system to convert XML structure. This system shows the structure of source- side providing data and destination-side processing data with using XML schema that defines structural information of a XML document. And this system defines the structure relationship of desired form as mapping structural information and data. This system creates the XSLT document that defines conversion rule between two structures based information which is defined. The XSLT document which is created as described above will convert data to be appropriate to the structure of the destination- side. By implementing this system, it is able to apply a document into various kinds of structure without considering specific system or platform and it is able to construct XSLT document to which meaning of desired form can be given. This paper aims to offer a process conversion between documents and to improve interoperability and scalability, so that we can contribute to build XML document processing environment

A Study of Patent Document Processing by SGML (SGML을 이용한 특허정보처리 연구)

  • Kwon, Young-Sook
    • Journal of Information Management
    • /
    • v.30 no.3
    • /
    • pp.44-54
    • /
    • 1999
  • A description of SGML(Standard Generalized Markup Language) is given together with a detailed description of WIPO Standard ST.32. The benefits of the use of SGML are highlighted-its system Independence and flexibility in building publication systems and full-text databases. A structure of WIPO Standard ST,32 based patent content is defined by DTD(document type definition) written in ST.32, and full-text itself is described with generalized markup depending on DTD. This article explains how to represent a document structure : a hierarchy structure like a entire document, a specific, sub-document, a paragraph, or non-hirarchy structure like a table drawings, or chemical structures. Merits of SGML In patent document processing are also discussed.

  • PDF

Style Control of Structured Documents using DSSSL

  • Lee, Kyong-Ho;Lee, Jin-Ho;Choy, Yoon-Chul
    • Proceedings of the Korea Database Society Conference
    • /
    • 1997.10a
    • /
    • pp.455-462
    • /
    • 1997
  • SGML(Standard Generalized Markup Language) is the ISO standard fer describing the logical structure of documents and is also adopted as the CALS standard for document description. Since then, there have been growing interests in SGML application in a variety of fields. However because SGML doesn't provide a standard method for describing various processing informations, ie, formatting and transformation, most applications have applied methods that are system dependent. Recently, ISO defined DSSSL(Document Style Semantics and Specification Language) as a standard mechanism to specify the formatting, transformation and retrieval of structured documents. Therefore, in this paper, we present a DSSSL processing system far style control of structured documents such as SGML documents. The system processes DSSSL style sheet that describes layout of documents and browses the result of its application to a SGML document. We have conducted tests on a lot of SGML documents and DSSSL style sheets successfully. Now, we are developing the SGML document management system that supports creation, editing, storage and retrieval of SGML document based upon the DSSSL processor and the SGML parser which we have developed.

  • PDF

Design of Templating System for Web Publication (웹 출판을 위한 템플릿 시스템의 설계)

  • Abdallah, Hisham;Koo, Heung-Seo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2002.11c
    • /
    • pp.1777-1780
    • /
    • 2002
  • This paper presents a well-designed templating system for CMS web Publication using XML/XSL technology. The primary motivation is the need of Web CMS to separate content from layout and logic. Our system provides GUI XSLT editor (x-editor) to create and modify XSLT stylesheet documents easily. These documents are used to add "layout" and "look and feel" information to XML document which contains content and functionality. The modified XML document is processed by XML-template engine to produce dynamic or static web sites.

  • PDF

Research and Development of Document Recognition System for Utilizing Image Data (이미지데이터 활용을 위한 문서인식시스템 연구 및 개발)

  • Kwag, Hee-Kue
    • The KIPS Transactions:PartB
    • /
    • v.17B no.2
    • /
    • pp.125-138
    • /
    • 2010
  • The purpose of this research is to enhance document recognition system which is essential for developing full-text retrieval system of the document image data stored in the digital library of a public institution. To achieve this purpose, the main tasks of this research are: 1) analyzing the document image data and then developing its image preprocessing technology and document structure analysis one, 2) building its specialized knowledge base consisting of document layout and property, character model and word dictionary, respectively. In addition, developing the management tool of this knowledge base, the document recognition system is able to handle the various types of the document image data. Currently, we developed the prototype system of document recognition which is combined with the specialized knowledge base and the library of document structure analysis, respectively, adapted for the document image data housed in National Archives of Korea. With the results of this research, we plan to build up the test-bed and estimate the performance of document recognition system to maximize the utilization of full-text retrieval system.

A Study on the online of PDF Electronic Documents System (인터넷 원거리출판의 응용과 PDF의 인쇄활용에 관한 연구)

  • 유영수;강영립;김병현;이광수
    • Proceedings of the Korean Printing Society Conference
    • /
    • 2001.06a
    • /
    • pp.63-77
    • /
    • 2001
  • PDF(Portable Document Format) is a file format that Adobe advances postscritp technique and use in managing document information or electric publishing(internet, CD-ROM, DVD). PDF is a devised document type for being able to read and print anywhere, independent of OS, printer type, resolution, and the kind of computer etc. Because this includes a compressing function, it transfers document through a small size of file in internet or intranet. In addition, that is a file format has various advantages-sharing of information and transfering documents in on line or off line environment. In this paper, we developed electronic document system using PDF format. Electronic document system consists of filter, automatic indexing, special searching system and web server. The information used in this paper is database made using Zwon\`s DocuCom. The filter recognizes various kinds of document structure. And according to property of document, it produces ASCII output. In addition to processing various formats of document, the filter can extract keywords in documents of MS WORD, Excel, Powerpoint, PDF, CAD etc. This filter uses the structure of window printer drive and can extract the information for text, page, font type and size from relevant document. The automatic indexing recognizes the formatted tag of document form ASCII text produced by filter and extracts adequate keyword to structure and property of document. PDF electronic document systems proposed in this paper can be used in Internet, PC communication. Users can choose and read electronic documents by two ways. First, users can choose and read relevant books using PDF electronic document homepage. Second, users can use PDF integrated-search system. User can search after inputing keyword and choose reference field and type of data. But, now, PDF products of Adobe can\`t support the Korean character. If this problem is resolved, we thick that PDF applications system looks active. Although there is limited function in case of using Zwon DocuCom used in this study, we think that there isn\`t a great deal of difficulty in electronic document and building digital database.

  • PDF

A Method of Object Identification from Procedural Programs (절차적 프로그램으로부터의 객체 추출 방법론)

  • Jin, Yun-Suk;Ma, Pyeong-Su;Sin, Gyu-Sang
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.10
    • /
    • pp.2693-2706
    • /
    • 1999
  • Reengineering to object-oriented system is needed to maintain the system and satisfy requirements of structure change. Target systems which should be reengineered to object-oriented system are difficult to change because these systems have no design document or their design document is inconsistent of source code. Using design document to identifying objects for these systems is improper. There are several researches which identify objects through procedural source code analysis. In this paper, we propose automatic object identification method based on clustering of VTFG(Variable-Type-Function Graph) which represents relations among variables, types, and functions. VTFG includes relations among variables, types, and functions that may be basis of objects, and weights of these relations. By clustering related variables, types, and functions using their weights, our method overcomes limit of existing researches which identify too big objects or objects excluding many functions. The method proposed in this paper minimizes user's interaction through automatic object identification and make it easy to reenginner procedural system to object-oriented system.

  • PDF