• Title/Summary/Keyword: Document Image Processing

A Study on the Improvement of Retrieval Efficiency Based on the CRFMD (공통기술표현포맷에 기반한 다매체자료의 검색효율 향상에 관한 연구)

  • Park, Il-Jong;Jeong, Ki-Tai
    • Journal of the Korean Society for Information Management
    • /
    • v.23 no.3 s.61
    • /
    • pp.5-21
    • /
    • 2006
  • In recent years, theories of image and sound analysis have been proposed to work with text retrieval systems and have progressed quickly with the rapid progress in data processing speeds. This study proposes a common representation format for multimedia documents (CRFMD) composed of both images and text to form a single data structure. It also shows that image classification of a given test set is dramatically improved when text features are encoded together with image features. CRFMD might be applicable to other areas of multimedia document retrieval and processing, such as medical image retrieval, World Wide Web searching, and museum collection retrieval.

A Novel Text to Image Conversion Method Using Word2Vec and Generative Adversarial Networks

  • LIU, XINRUI;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.05a
    • /
    • pp.401-403
    • /
    • 2019
  • In this paper, we propose a text-to-image generation method based on generative adversarial networks (GANs). In many natural language processing tasks, word representations are determined by their term frequency-inverse document frequency (TF-IDF) scores. Word2Vec is a neural network model that, given an unlabeled corpus, produces vectors expressing the semantics of the words in that corpus; an image is then generated by GAN training on the obtained vectors. Thanks to this understanding of the words, we can generate higher-quality and more realistic images. Our GAN structure is based on deep convolutional neural networks and pixel recurrent neural networks. Comparing the generated images with real images, we obtain about 88% similarity on the Oxford-102 flowers dataset.
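
For readers unfamiliar with the term frequency-inverse document frequency scoring mentioned in this abstract, the following is a minimal sketch of the textbook formula, not the authors' code. The function name `tf_idf` and the toy token lists are hypothetical, and the natural-log, unsmoothed variant is an assumption.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (illustrative only)."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        # Relative term frequency times log inverse document frequency.
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

docs = [["red", "flower"], ["blue", "flower"]]
weights = tf_idf(docs)
```

A term that occurs in every document ("flower" here) gets weight zero, which illustrates why frequency-based scores carry no semantic similarity between words, unlike learned embeddings such as Word2Vec.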

Skew Correction of Document Images using Edge (에지를 이용한 문서영상의 기울기 보정)

  • Ju, Jae-Hyon;Oh, Jeong-Su
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.7
    • /
    • pp.1487-1494
    • /
    • 2012
  • This paper proposes an algorithm that uses edges to detect and correct the skew of both clear and degraded document images. The proposed algorithm detects edges in a character region selected by image complexity and generates projection histograms by projecting the edges in various directions. It then detects the document skew by estimating the edge concentrations in the histograms and corrects the skewed document image. For fast skew detection, the algorithm uses downsampling and a 3-step coarse-to-fine search. For both clear and degraded images, the maximum and average detection errors of the proposed algorithm are about 50% of those of a similar conventional algorithm, and the processing time is reduced to about 25%. For images with non-uniform luminance acquired by a mobile device, the conventional algorithm cannot detect skew because it cannot obtain valid binary images, whereas the proposed algorithm detects it with an average detection error of 0.1° or less.
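
The projection-histogram idea with a coarse-to-fine search can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: edge detection, region selection, and downsampling are omitted, the input is simply a list of edge-pixel coordinates, the concentration measure (sum of squared bin counts) and the step sizes 5°, 1°, and 0.1° are assumptions, and the names `profile_score` and `estimate_skew` are hypothetical.

```python
import math

def profile_score(points, angle_deg):
    """Concentration of the projection histogram at a candidate angle."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    bins = {}
    for x, y in points:
        # Coordinate perpendicular to a text line that follows y = x * tan(angle).
        key = round(y * c - x * s)
        bins[key] = bins.get(key, 0) + 1
    # The sum of squared counts is largest when points pile into few bins.
    return sum(n * n for n in bins.values())

def estimate_skew(points, span=15.0, steps=(5.0, 1.0, 0.1)):
    """Three-pass coarse-to-fine search for the angle with the sharpest profile."""
    center = 0.0
    for step in steps:
        n = int(round(span / step))
        candidates = [center + k * step for k in range(-n, n + 1)]
        center = max(candidates, key=lambda ang: profile_score(points, ang))
        span = step  # narrow the search window around the current best angle
    return center

# Synthetic "edge pixels": five text lines, 40 px apart, tilted by 3 degrees.
tilt = math.tan(math.radians(3.0))
points = [(x, 40 * i + x * tilt) for i in range(5) for x in range(200)]
estimate = estimate_skew(points)
```

Because the histogram uses one-pixel bins, several neighboring angles can score equally on short lines, so the estimate should be read as accurate to roughly the final step size rather than exact.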

Word Extraction from Table Regions in Document Images (문서 영상 내 테이블 영역에서의 단어 추출)

  • Jeong, Chang-Bu;Kim, Soo-Hyung
    • The KIPS Transactions:PartB
    • /
    • v.12B no.4 s.100
    • /
    • pp.369-378
    • /
    • 2005
  • A document image is segmented and classified into text, picture, or table regions by document layout analysis, and the words in table regions are significant for keyword spotting because they are more meaningful than the words in other regions. This paper proposes a method to extract words from table regions in document images. Because word extraction from table regions in practice means extracting words from the cell regions that compose the table, the cells must be extracted correctly. In the cell extraction module, the table frame is extracted first by analyzing connected components, and then the intersection points are extracted from the table frame. We correct false intersections using the correlation between neighboring intersections and extract the cells using the intersection information. Text regions in the individual cells are located using the connected-component information obtained during cell extraction, and they are segmented into text lines using projection profiles. Finally, we divide the segmented lines into words using gap clustering and special symbol detection. The experiment was performed on table images extracted from Korean documents and shows 99.16% accuracy of word extraction.
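
The final step, dividing a text line into words by clustering gaps, might look like the sketch below. The abstract does not say which clustering method is used, so the one-dimensional two-means split here is an assumption, special symbol detection is left out, and all function names are hypothetical.

```python
def column_profile(line):
    """Count ink pixels in each column of a binary text-line image."""
    return [sum(col) for col in zip(*line)]

def find_runs(profile):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    runs, start = [], None
    for x, v in enumerate(profile):
        if v and start is None:
            start = x
        elif not v and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def gap_threshold(gaps, iters=20):
    """1-D two-means on gap widths; returns the midpoint of the two centers."""
    lo, hi = float(min(gaps)), float(max(gaps))
    for _ in range(iters):
        small = [g for g in gaps if abs(g - lo) <= abs(g - hi)]
        large = [g for g in gaps if abs(g - lo) > abs(g - hi)]
        if not large:  # all gaps look alike: no word boundary found
            break
        lo, hi = sum(small) / len(small), sum(large) / len(large)
    return (lo + hi) / 2

def split_words(line):
    """Merge ink runs separated by small gaps; split at large gaps."""
    runs = find_runs(column_profile(line))
    if len(runs) < 2:
        return runs
    gaps = [b[0] - a[1] for a, b in zip(runs, runs[1:])]
    t = gap_threshold(gaps)
    words, start, end = [], runs[0][0], runs[0][1]
    for gap, (s, e) in zip(gaps, runs[1:]):
        if gap > t:
            words.append((start, end))
            start = s
        end = e
    words.append((start, end))
    return words

# One-row toy line: three close glyphs, a wide gap, then two close glyphs.
row = "##.##.##....##.##"
line = [[1 if ch == "#" else 0 for ch in row]]
words = split_words(line)
```

On the toy line, the one-column gaps are treated as spacing between characters and the four-column gap as a word boundary, so the result is two column ranges.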

Development of Intelligent OCR Technology to Utilize Document Image Data (문서 이미지 데이터 활용을 위한 지능형 OCR 기술 개발)

  • Kim, Sangjun;Yu, Donghui;Hwang, Soyoung;Kim, Minho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.212-215
    • /
    • 2022
  • In today's era of so-called digital transformation, the need to build and use big data has increased in various fields. Today, much data is produced and stored in forms suited to digital devices and media, but for a long time in the past the production and storage of data was dominated by printed books. Consequently, Optical Character Recognition (OCR) technology is needed to use the vast amount of printed books accumulated over that time as big data. In this study, we propose a system for digitizing the structure and content of document objects inside a scanned book image. The proposed system consists of three main steps: 1) recognition of area information for each document object (table, equation, picture, text body) in the scanned book image; 2) OCR processing of each area by the text body, table, or formula module according to the recognized document object areas; 3) gathering of the processed document information and returning it in JSON format. The model proposed in this study builds on an open-source project with additional training and improvements. The intelligent OCR system proposed in this study showed performance on par with commercial OCR software in processing four types of document objects (table, equation, image, text body).

Guidelines for Cardiovascular Magnetic Resonance Imaging from the Korean Society of Cardiovascular Imaging (KOSCI) - Part 3: Perfusion, Delayed Enhancement, and T1- and T2 Mapping

  • Im, Dong Jin;Hong, Su Jin;Park, Eun-Ah;Kim, Eun Young;Jo, Yeseul;Kim, Jeong Jae;Park, Chul Hwan;Yong, Hwan Seok;Lee, Jae Wook;Hur, Jee Hye;Yang, Dong Hyun;Lee, Bae-Young
    • Investigative Magnetic Resonance Imaging
    • /
    • v.24 no.1
    • /
    • pp.1-20
    • /
    • 2020
  • This document is the third part of the guidelines for the interpretation and post-processing of cardiac magnetic resonance (CMR) studies. These consensus recommendations have been developed by a Consensus Committee of the Korean Society of Cardiovascular Imaging (KOSCI) to standardize the requirements for image interpretation and post-processing of CMR. This third part of the recommendations describes tissue characterization modules, including perfusion, late gadolinium enhancement, and T1- and T2 mapping. Additionally, this document provides guidance for visual and quantitative assessment, consisting of "What-to-See," "How-To," and common pitfalls for the analysis of each module. The Consensus Committee hopes that this document will contribute to the standardization of image interpretation and post-processing of CMR studies.

The development of CAI systems for an efficient education of image processing (효율적인 영상처리 교육을 위한 통합 환경 개발에 관한 연구)

  • 이정헌;안용학;채옥삼
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.6
    • /
    • pp.127-135
    • /
    • 2004
  • With the widespread use of multimedia technology, the demand for image processing engineers is increasing in various fields. However, few engineers can develop practical applications in the image processing area. To teach practical image processing techniques, we need an integrated education environment that can efficiently present image processing theory and, at the same time, provide interactive experiments for the theory presented. In this paper, we propose an integrated education environment for image processing called MTES. It consists of theory presentation systems and experiment systems. The theory presentation systems support multimedia data, web documents, and Microsoft PowerPoint™ files. They are tightly integrated with the experiment systems, which are built on an integrated image processing algorithm development system called Hello-Vision.

Deriving TrueType Features for Letter Recognition in Word Images (워드이미지로부터 영문인식을 위한 트루타입 특성 추출)

  • SeongAh CHIN
    • Journal of the Korea Society for Simulation
    • /
    • v.11 no.3
    • /
    • pp.35-48
    • /
    • 2002
  • In the work presented here, we describe a method to extract TrueType features to support letter recognition. Although a variety of document processing techniques have been attempted, few methods can recognize a letter together with its TrueType features without OCR, which speeds up processing for image text retrieval. By reviewing the mechanism that generates digital fonts and the origin of TrueType, we observe that each TrueType glyph is drawn from its contour in the glyph table. Hence, for a letter in a specific TrueType font, we can derive segments with density, defined as the number of occurrences of a segment width. Certain numbers of occurrences appear frequently because segment widths are fixed. We perform letter recognition by comparing the TrueType feature library of a letter with the features from input word images. Experiments carried out to assess the robustness of the proposed method show acceptable results.

Segmentation and Contents Classification of Document Images Using Local Entropy and Texture-based PCA Algorithm (지역적 엔트로피와 텍스처의 주성분 분석을 이용한 문서영상의 분할 및 구성요소 분류)

  • Kim, Bo-Ram;Oh, Jun-Taek;Kim, Wook-Hyun
    • The KIPS Transactions:PartB
    • /
    • v.16B no.5
    • /
    • pp.377-384
    • /
    • 2009
  • This paper proposes a new algorithm for classifying the various contents of document images, such as text, figures, graphs, and tables. It segments document images using a local entropy-based histogram and classifies the contents using texture-based PCA. Local entropy and the histogram make binarization of the document image not only robust to various transformations and noise but also simple and fast. The texture-based PCA algorithm is applied to each segmented region, exploiting the fact that each type of content in a document image has different texture information. As a result, no pre-defined structural information is needed, and classification is fast and efficient. The results demonstrate that the proposed method achieves better segmentation and classification performance on various images and is more efficient than previous methods.
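
As a rough illustration of what a local entropy map is, the sketch below computes the Shannon entropy of the gray levels in a small window around each pixel. The window radius, the clipping at image borders, and the name `local_entropy` are assumptions; the abstract does not give the paper's exact definition or how the histogram of these values is thresholded.

```python
import math
from collections import Counter

def local_entropy(img, radius=1):
    """Shannon entropy (in bits) of gray levels in a window around each pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the window, clipped at the image borders.
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            n = len(window)
            out[y][x] = -sum((c / n) * math.log2(c / n)
                             for c in Counter(window).values())
    return out

flat = [[128] * 4 for _ in range(4)]   # uniform background
stripes = [[0, 255], [0, 255]]         # two gray levels in equal shares
flat_map = local_entropy(flat)
stripe_map = local_entropy(stripes)
```

Uniform regions such as blank background yield zero entropy, while windows that mix ink and background yield higher values, which is what makes the measure useful for separating content from background.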

Speed-up of Document Image Binarization Method Based on Water Flow Model (Water flow model에 기반한 문서영상 이진화 방법의 속도 개선)

  • 오현화;김도훈;이재용;김두식;임길택;진성일
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.4
    • /
    • pp.75-86
    • /
    • 2004
  • This paper proposes a method to speed up document image binarization based on a water flow model. The proposed method extracts the region of interest (ROI) around characters from a document image and restricts the pouring of water onto the 3-dimensional terrain surface of the image to the ROI. The amount of water to be filled into a local valley is determined automatically from its depth and slope. The proposed method accumulates weighted water not only at the locally lowest position but also at its neighbors. Therefore, a valley is sufficiently filled with a single pass of pouring water onto the terrain surface of the ROI. Finally, the depth of each pond is adaptively thresholded for robust character segmentation, because the depth of a pond formed in a valley varies widely with the gray-level difference between characters and background. In our experiments on real document images, the proposed method attained good binarization performance and a remarkably shorter processing time than the existing method based on a water flow model.