• Title/Summary/Keyword: Intelligent document processing


A Study on Intelligent Document Processing Management using Unstructured Data (비정형 데이터를 활용한 지능형 문서 처리 관리에 관한 연구)

  • Kyoung Hoon Park;Kwang-Kyu Seo
    • Journal of the Semiconductor & Display Technology / v.23 no.2 / pp.71-75 / 2024
  • This research focuses on efficiently processing unstructured data that contains various formulas, applying text mining techniques to the processing and management of the terms and rules in domestic insurance documents. Through parsing and compilation technology, document context, content, constants, and variables are automatically separated, and errors are verified in document order and in logical order to improve document accuracy. Through document debugging technology, errors in the document are identified in real time. Furthermore, it is necessary to predict the changes that intelligent document processing will bring to document management work, in particular the impact on documents and utilization tasks that are managed in duplicate because of various formulas, and to prepare the capabilities that will be needed in the future.


A Study on the Introduction of Intelligent Document Processing and Change of Record Management (지능형 문서처리 도입과 기록관리 변화에 관한 연구)

  • Ryu, Hanjo;Lee, Kyungnam;Hwang, Jinhyun;Yim, Jinhee
    • The Korean Journal of Archival Studies / no.68 / pp.41-72 / 2021
  • To analyze big data, documents should be converted to an open standard format to increase machine readability, and natural language processing tools are also needed. This study focused on the background of intelligent document processing and the status of research in the public sector, and predicted the changes in work that intelligent document processing would bring. It noted the changes that intelligent document processing would bring to archival work, and also considered changes in the role of archivists and their required competencies. Changes could be anticipated across a wide range of records management and archives management work. In particular, a significant impact was expected on the automation of repetitive archival tasks and on the description and utilization of records. This study proposed the need to prepare new archival work procedures, methods, and competencies in response to these changes in archival work.

Document Summarization via Convex-Concave Programming

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.4 / pp.293-298 / 2016
  • Document summarization is an important task in various areas, where the goal is to select a few of the most descriptive sentences from a given document as a succinct summary. Even without training data of human-labeled summaries, several interesting existing works in the literature yield reasonable performance. In this paper, within the same unsupervised learning setup, we propose a more principled learning framework for the document summarization task. Specifically, we formulate an optimization problem that expresses the requirements of both faithful preservation of the document contents and the summary length constraint. We circumvent the difficult integer programming that arises from binary sentence selection by using continuous relaxation and low-entropy penalization. We also suggest an efficient convex-concave optimization solver that is guaranteed to improve the original objective at every iteration. For several document datasets, we demonstrate that the proposed learning algorithm significantly outperforms the existing approaches.

A Knowledge Service Using Automatic Document Sharing based on Intelligent OMDR (지능형 OMDR 기반의 자동 문서 공유 에이전트를 이용한 지식서비스)

  • Su-Kyoung Kim;Kee-Hong Ahn
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.747-750 / 2008
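  • This study aims to apply Semantic Web technologies such as ontology, natural language processing, and metadata across a complete Semantic Web application and to put them to practical use. To this end, it builds OWL-based domain ontologies for each knowledge subject of an organization or institution, constructs an MDR from the existing WordNet, Dublin Core metadata, and the schemas of the databases defined in the organization, and interconnects them, thereby providing a way to combine the intelligent inference and rule services of ontologies with standardized metadata. This has high research value for the reuse and alignment of existing ontologies and metadata. In addition, when a user in the organization writes a document, natural language processing and ontology technologies are applied to the document content to automatically suggest suitable terms and metadata, which improves the sharing and reusability of the written document. The written document is stored in an XML-based intelligent document database (XML Based Intelligent Document Database) and is provided, through a Document Registry and Retrieval Agent, to users who need to write or use similar documents. This minimizes the privatization of document knowledge and reduces the time and cost required to rewrite similar documents or to write a specific document. Furthermore, if documents can also be searched and used through the Document Registry and Retrieval Agent on the Web or on personal portable devices such as PDAs, the authors expect that a practical application of ubiquitous computing and the Semantic Web, in which the service can be used anytime and anywhere, could be achieved.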

Active Documents: Programs by Form Designers (능동문서: 서식설계자의 프로그램)

  • Nam, Chul-Ki;Bae, Jae-Hak;Yoo, Hae-Young
    • The KIPS Transactions:PartB / v.10B no.6 / pp.599-610 / 2003
  • The Web plays an important role as an information source, and most Web applications are document-centric. A document carries the intention of its designer, which can be actively utilized in the automation of business processes. By understanding the intrinsic nature of a document's function, we can, in a special case, see a document as an executable computer program. For this approach, we propose an active document model composed of a form, a knowledge base, rules, and queries. For reusability and interoperability, each component of the proposed model is uniformly represented in XML. The proposed active document not only plays a passive role in providing user interfaces, but is also a document on which a machine can perform inference and processing by reading the document-processing procedure and business rules intended by the document designer. Through this approach, a document can interact with machines and cooperate with other applications. To show the applicability of our active document, we present a case study on the processing of purchase orders in a B2B e-commerce system. This paper is expected to provide a framework for accelerating the development of intelligent applications through an approach that regards a form document as a computer program. In short, the proposed active document contains knowledge representation and a processing method, and consequently it will play an important role in providing the concept of a document pursued in the Semantic Web.

Document Structure Understanding on Subjects Registration Table

  • Ito, Yuichi;Ohno, Masanaga;Tsuruoka, Shinji;Yoshikawa, Tomohiro;Tsuyoshi, Shinogi
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.571-574 / 2003
  • This research aims to automate the generation of a database from paper-based table forms. Registration tables have complicated table structures, and in this research we used them as an example of general table structure understanding. We propose a table structure understanding system for several table types, which works in several steps. In the first step, the paper document images are read with an image scanner. In the second step, the document image is segmented into tables. In the third step, the character strings are extracted using image processing technology and the property of each character string is determined. The structured database is then generated automatically. The proposed system consists of two subsystems. The "Master document generation system" is used for the table form definition, which does not include handwritten characters. The "Structure analysis system for completed table" is used for the written form, and it analyzes the table form filled in with handwritten characters. We implemented the system using MS Visual C++ on Windows, and it achieved a correct extraction rate of 98% on 51 registration tables written by different students.


A Study on the Document Length Normalization of Extended Vector Model Using the Information of Location (위치 정보를 이용한 확장 벡터 모델의 문서 길의 정규화에 관한 연구)

  • Kim, Kwang-Young;Seo, Jerry;Lee, Min-Ho;Joo, Won-Kyun;Jeong, Chang-Hoo;You, Beom-Jong
    • Proceedings of the Korea Information Processing Society Conference / 2003.05c / pp.1623-1626 / 2003
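  • With the growth of the Internet and the rapid increase in the number of Internet users, the need for information retrieval systems is growing. It is also becoming increasingly difficult for users to find exactly the information they want in large document collections. Most current retrieval systems apply document length normalization, and document length information contributes to retrieval performance. In general, retrieval evaluations using TREC or HANTEC2.0 show that applying document length normalization gives better performance than not applying it. In this paper, we use KISTAL2000 to propose a document length normalization method based on location information, and we report experiments on it.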
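
The abstract above does not give the authors' formula, so the following is only a minimal sketch of document length normalization in general, using the widely known pivoted normalization scheme rather than the location-based method of the paper. The function name `pivoted_norm_scores`, the `slope` value, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter

def pivoted_norm_scores(query_terms, docs, slope=0.75):
    """Score tokenized documents by query term frequency divided by a
    pivoted length factor (generic scheme, not the paper's method)."""
    lengths = [len(doc) for doc in docs]
    pivot = sum(lengths) / len(lengths)  # average length acts as the pivot
    scores = []
    for doc, length in zip(docs, lengths):
        tf = Counter(doc)
        raw = sum(tf[term] for term in query_terms)
        # Documents longer than the pivot are penalized, shorter ones boosted.
        norm = (1.0 - slope) * pivot + slope * length
        scores.append(raw / norm)
    return scores

# Hypothetical toy collection: a short and a long document.
docs = [["ocr", "a"], ["ocr", "ocr", "a", "a", "a", "a", "a", "a"]]
print(pivoted_norm_scores(["ocr"], docs))
```

With `slope=0` every document is divided by the same pivot, so raw term frequency decides the ranking; larger slopes shift the ranking toward shorter documents.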


Methods of Classification and Character Recognition for Table Items through Deep Learning (딥러닝을 통한 문서 내 표 항목 분류 및 인식 방법)

  • Lee, Dong-Seok;Kwon, Soon-Kak
    • Journal of Korea Multimedia Society / v.24 no.5 / pp.651-658 / 2021
  • In this paper, we propose methods for character recognition and classification of table items through deep learning. First, table areas are detected in a document image with a CNN. The table areas are then divided by separators such as vertical lines. The text in the document is recognized by a neural network that combines a CNN and an RNN. To correct character recognition errors, multiple candidates for the recognized result are provided for any sentence with low recognition accuracy.

The Character Area Extraction and the Character Segmentation on the Color Document (칼라 문서에서 문자 영역 추출 및 문자분리)

  • 김의정
    • Journal of the Korean Institute of Intelligent Systems / v.9 no.4 / pp.444-450 / 1999
  • This paper deals with several methods: a clustering method that uses the k-means algorithm to extract the character areas in a document image, and a distance function suited to the HIS coordinate system for clustering the image. As a preprocessing step for recognition, that is, as a character segmentation method, an algorithm that extracts discrete characters using connected picture elements is also proposed. This algorithm can separate any characters, including touching or overlapping ones. Projection and edge-tracking methods have so far been used to segment such characters. With the new method proposed here, however, discrete characters can be extracted with only a one-time projection after the character string has been extracted, dividing the area into character and non-character regions. This is significant for processing color documents rather than simple binary images, and the method has been verified to be more advanced than previous document processing systems.
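
The clustering idea in this abstract can be sketched as a plain k-means over pixel colors. This is a simplified illustration only: it uses squared Euclidean distance on RGB tuples, not the HIS-specific distance function the paper describes, and the function name `kmeans`, the fixed initial centers, and the sample pixel values are assumptions.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means on tuples of numbers with squared Euclidean distance."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centers[k])))
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centers

# Hypothetical pixels: three bright background pixels and two dark character pixels.
pixels = [(250, 250, 245), (240, 248, 250), (20, 15, 30), (10, 25, 22), (245, 255, 250)]
labels, centers = kmeans(pixels, centers=[pixels[0], pixels[2]])
print(labels)
```

Seeding the centers with one bright and one dark pixel keeps the toy example deterministic; a real system would need a proper initialization strategy and a color distance suited to its color space.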


Development of Intelligent OCR Technology to Utilize Document Image Data (문서 이미지 데이터 활용을 위한 지능형 OCR 기술 개발)

  • Kim, Sangjun;Yu, Donghui;Hwang, Soyoung;Kim, Minho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.212-215 / 2022
  • In today's era of so-called digital transformation, the need to build and use big data has increased in various fields. Today, much data is produced and stored in a way that suits digital devices and media, but for a long time in the past the production and storage of data was dominated by printed books. Accordingly, Optical Character Recognition (OCR) technology is needed to utilize the vast amount of printed books accumulated over a long time as big data. In this study, a system for digitizing the structure and content of document objects inside a scanned book image is proposed. The proposed system largely consists of the following three steps: 1) recognition of area information for each document object (table, equation, picture, text body) in the scanned book image; 2) OCR processing for each area by the text body, table, and formula modules according to the recognized document object areas; 3) gathering the processed document information and returning it in JSON format. The model proposed in this study builds on an open-source project with additional training and improvement. The intelligent OCR system proposed in this study showed performance on the level of commercial OCR software in processing four types of document objects (table, equation, image, text body).
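
The third step described above, gathering per-region results and returning them as JSON, might look roughly like the sketch below. The abstract does not specify the output schema, so the field names (`page`, `objects`, `type`, `bbox`, `content`), the `[x0, y0, x1, y1]` box convention, and the top-to-bottom ordering are all assumptions.

```python
import json

def build_document_json(page_id, regions):
    """Collect per-region OCR results into one JSON string.

    Each region is a dict with 'type', 'bbox' ([x0, y0, x1, y1]) and
    'content'; this schema is hypothetical, not taken from the paper.
    """
    # Order regions top-to-bottom, then left-to-right, as a simple reading order.
    ordered = sorted(regions, key=lambda r: (r["bbox"][1], r["bbox"][0]))
    doc = {
        "page": page_id,
        "objects": [
            {"type": r["type"], "bbox": r["bbox"], "content": r["content"]}
            for r in ordered
        ],
    }
    return json.dumps(doc, ensure_ascii=False, indent=2)

# Hypothetical results from the region-detection and OCR steps.
regions = [
    {"type": "table", "bbox": [10, 200, 300, 400], "content": "a | b"},
    {"type": "text", "bbox": [10, 20, 300, 180], "content": "Introduction"},
]
print(build_document_json(1, regions))
```

Passing `ensure_ascii=False` keeps Korean text readable in the output instead of escaping it.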
