• Title/Summary/Keyword: Document Images

A Keyword Matching for the Retrieval of Low-Quality Hangul Document Images

  • Na, In-Seop;Park, Sang-Cheol;Kim, Soo-Hyung
    • Journal of the Korean Society for Library and Information Science / v.47 no.1 / pp.39-55 / 2013
  • Keyword retrieval is difficult for low-quality Korean document images because adjacent characters are often connected. In addition, images created from various fonts are likely to be distorted during acquisition. In this paper, we propose and test a keyword retrieval system that uses a support vector machine (SVM) to judge the similarity between two word images. We demonstrate that the proposed keyword retrieval method is more effective than the accumulated Optical Character Recognition (OCR)-based searching method. Moreover, the SVM performs better than a Bayesian decision rule or an artificial neural network at determining the similarity of two images.

History Document Image Background Noise and Removal Methods

  • Ganchimeg, Ganbold
    • International Journal of Knowledge Content Development & Technology / v.5 no.2 / pp.11-24 / 2015
  • Archive libraries commonly provide public access to collections of historical and ancient document images. Such images often require specialized processing to remove background noise and become more legible. Document images may be contaminated with noise during transmission, scanning, or conversion to digital form. Noise can be categorized by its features, and similar patterns can be searched for in a document image to choose an appropriate removal method. In this paper, we propose a hybrid binarization approach that improves the quality of old documents by combining global and local thresholding. This article also reviews the kinds of noise that may appear in scanned document images and discusses some noise removal methods.
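
The abstract does not say how the global and local thresholds are combined, so the following is only a minimal sketch of one possible hybrid rule. It assumes an Otsu-style global threshold and a local-mean threshold: pixels far from the global threshold keep the global decision, and ambiguous pixels near it are decided by their neighborhood. The function names and the `window`, `offset`, and `margin` parameters are hypothetical and are not taken from the paper.

```python
def otsu_threshold(pixels):
    """Global threshold (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def hybrid_binarize(image, window=1, offset=10, margin=40):
    """image: rows of ints in 0-255. Returns rows of 0 (ink) or 1 (background).

    Assumed hybrid rule (not the paper's): trust the global threshold when a
    pixel is far from it, otherwise compare the pixel with its local mean.
    """
    h, w = len(image), len(image[0])
    t_global = otsu_threshold([p for row in image for p in row])
    out = []
    for y in range(h):
        row_out = []
        for x in range(w):
            p = image[y][x]
            if abs(p - t_global) > margin:
                ink = p <= t_global
            else:
                ys = range(max(0, y - window), min(h, y + window + 1))
                xs = range(max(0, x - window), min(w, x + window + 1))
                neigh = [image[j][i] for j in ys for i in xs]
                ink = p < sum(neigh) / len(neigh) - offset
            row_out.append(0 if ink else 1)
        out.append(row_out)
    return out


page = [
    [200, 200, 200, 200],
    [200,  20,  20, 200],
    [200,  20,  20, 200],
    [200, 200, 200, 200],
]
binary = hybrid_binarize(page)
```

In practice the local window would be much larger than one pixel, and the choice of `margin` decides how much of the page falls back to local analysis.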

Keyword Spotting on Hangul Document Images Using Image-to-Image Matching (영상 대 영상 매칭을 이용한 한글 문서 영상에서의 단어 검색)

  • Park Sang Cheol;Son Hwa Jeong;Kim Soo Hyung
    • The KIPS Transactions:PartB / v.12B no.3 s.99 / pp.357-364 / 2005
  • In this paper, we propose an accurate and fast keyword spotting system that searches for a user-specified keyword in Hangul document images using two-level image-to-image matching. The system consists of character segmentation, query image creation, feature extraction, and a matching procedure. Two different feature vectors are used in the matching procedure. An experiment using 1600 Hangul word images from 8 document images, downloaded from the website of the Korea Information Science Society, demonstrates that the proposed system is superior to conventional image-based document retrieval systems.

A Block Classification and Rotation Angle Extraction for Document Image (문서 영상의 영역 분류와 회전각 검출)

  • Mo, Moon-Jung;Kim, Wook-Hyun
    • The KIPS Transactions:PartB / v.9B no.4 / pp.509-516 / 2002
  • This paper proposes an efficient algorithm that recognizes mixed document images consisting of images, text, tables, and straight lines. The system is composed of three steps: the first detects the rotation angle in order to correct skewed images, the second detects and erases unnecessary background regions, and the last classifies each component in the document image. As preprocessing, the algorithm detects the rotation angle and corrects the document accordingly, in order to minimize the error rate caused by skew. We detect the rotation angle using only the horizontal and vertical components in the document image, and we reduce calculation time by erasing unnecessary background regions during component detection. In the next step, we classify the various components, such as image, text, table, and line areas, in the document image. We applied this method to various document images to evaluate the performance of the document recognition system and obtained successful experimental results.

A Service Strategy of Binary Document Images based on JBIG in Digital Library (전자도서관에서의 JBIG 기반 이전 문서영상 서비스 방안)

  • 한영미;김민환
    • Journal of Korea Multimedia Society / v.1 no.1 / pp.37-44 / 1998
  • While SGML (Standard Generalized Markup Language) tends to be used in multimedia document management systems, binary document images are still widely used at digital libraries to serve the content of printed documents. However, printed documents are scanned at 200 dpi and the resulting binary images are compressed with the ITU-T T.6 method, which makes it difficult to represent them in good quality and to compress them efficiently. In this paper, considering the quality of binary document images and the expandability and effectiveness of their database, we show that a suitable scanning resolution is 600 dpi and that the best compression method is JBIG. We also suggest a staged service strategy that addresses the long decompression time of JBIG, based on an analysis of how binary document images are retrieved for display on a monitor and for printing. Experiments on several typical binary document images verify the high compression rate of JBIG and the effectiveness of the staged service strategy.

Mongolian Traditional Stamp Recognition using Scalable kNN

  • Gantuya, P.;Mungunshagai, B.;Suvdaa, B.
    • International journal of advanced smart convergence / v.4 no.2 / pp.170-176 / 2015
  • Stamps carry crucial traditional, historical, and cultural information for nations. In this paper, we aim to detect official stamps in scanned documents and to recognize traditional, historical Mongolian stamps. We performed the following steps. First, we detected official stamps in scanned documents based on red-color segmentation and the document standard. Then we collected 234 traditional stamp images in 6 classes and 100 official stamp images from scanned document images. We also implemented processing algorithms for noise removal, resizing, reshaping, and so on. Finally, we proposed a new scale-invariant classification algorithm based on kNN (k-nearest neighbor). In the experiments, the proposed method showed an adequate recognition rate.
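
The abstract names kNN but does not describe how scale invariance is achieved. As a rough illustration only, the sketch below runs a plain k-nearest-neighbor majority vote and, as a stand-in assumption, rescales every feature vector to unit length so that overall magnitude does not affect the distance. The feature vectors, labels, and function names are hypothetical.

```python
import math
from collections import Counter


def normalize(vec):
    """Rescale a feature vector to unit L2 norm (zero vectors are left as-is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label) pairs. Returns the majority label
    among the k training vectors closest to the query after normalization."""
    q = normalize(query)
    ranked = sorted(
        ((math.dist(normalize(vec), q), label) for vec, label in train),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional shape features for two stamp classes.
train = [
    ([1.0, 0.0], "round"),
    ([2.0, 0.1], "round"),
    ([5.0, 0.2], "round"),
    ([0.0, 1.0], "square"),
    ([0.1, 3.0], "square"),
]
label_a = knn_classify(train, [10.0, 1.0], k=3)
label_b = knn_classify(train, [0.5, 10.0], k=3)
```

Because of the normalization, a query that is a scaled-up copy of a training vector lands at distance zero from it, which is the only sense in which this sketch is scale-invariant.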

Character Shape Distortion Correction of Camera Acquired Document Images (카메라 획득 문서영상에서의 글자모양 왜곡보정)

  • Jang Dae-Geun;Kim Eui-Jeong
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.4 / pp.680-686 / 2006
  • Document images captured by scanners suffer only from skew distortion. Camera-captured document images, however, suffer not only from skew but also from the vignetting effect and geometric distortion. The vignetting effect, which makes the border areas darker than the center of the image, makes it difficult to separate characters from the document image; this effect has been decreasing as lens manufacturing has improved. Geometric distortion, caused by a mismatch in angle and center position between the document and the camera, distorts the shape of characters, so character recognition is more difficult than with a scanner. In this paper, we propose a method that improves character recognition performance by correcting the geometric distortion of document images using a linear approximation that transforms a quadrilateral region into a rectangular one. The proposed method also determines the quadrilateral transform region automatically, using the alignment of character lines and the skew angles of the characters located at the ends of each character line. The proposed method can therefore correct geometric distortion without positional information from the camera.
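
One simple way to map a quadrilateral region onto a rectangle is bilinear interpolation between its four corners. The sketch below assumes that reading of "linear approximation"; the abstract does not give the actual formula, and the automatic detection of the quadrilateral is not covered. The names `quad_point` and `rectify` are hypothetical, and nearest-neighbor sampling is used for brevity.

```python
import math


def quad_point(quad, u, v):
    """Map (u, v) in the unit square to a point inside the quadrilateral.

    quad lists the corners as (x, y) in the order
    top-left, top-right, bottom-right, bottom-left.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    x = (1 - u) * (1 - v) * x0 + u * (1 - v) * x1 + u * v * x2 + (1 - u) * v * x3
    y = (1 - u) * (1 - v) * y0 + u * (1 - v) * y1 + u * v * y2 + (1 - u) * v * y3
    return x, y


def rectify(image, quad, out_w, out_h):
    """Resample the quadrilateral region of `image` into an out_h x out_w grid."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(out_h):
        v = r / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for c in range(out_w):
            u = c / (out_w - 1) if out_w > 1 else 0.0
            x, y = quad_point(quad, u, v)
            # Nearest-neighbor sampling, clamped to the image bounds.
            xi = min(max(math.floor(x + 0.5), 0), w - 1)
            yi = min(max(math.floor(y + 0.5), 0), h - 1)
            row.append(image[yi][xi])
        out.append(row)
    return out


# A trapezoid that is narrower at the top than at the bottom.
image = [
    [0, 1, 2, 3, 4],
    [5, 6, 7, 8, 9],
    [10, 11, 12, 13, 14],
]
quad = [(1, 0), (3, 0), (4, 2), (0, 2)]
flat = rectify(image, quad, out_w=3, out_h=2)
```

A bilinear mapping is only an approximation of true perspective distortion, which may be why the abstract describes the correction as a linear approximation.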

Distortion Corrected Black and White Document Image Generation Based on Camera (카메라기반의 왜곡이 보정된 흑백 문서 영상 생성)

  • Kim, Jin-Ho
    • The Journal of the Korea Contents Association / v.15 no.11 / pp.18-26 / 2015
  • Document images captured by a camera instead of a scanner may contain geometric distortion and shadow effects due to the capturing angle. This paper proposes an algorithm that generates a clean black-and-white document image from a camera capture through distortion correction and shadow elimination. To correct geometric distortion, that is, to straighten boundary lines bent by the radial distortion of the camera lens and to eliminate outlying areas introduced by the camera direction, a document boundary detection method based on a second-derivative filter is developed. Black-and-white images are then generated by an adaptive binarization method that eliminates the shadow effect. Experiments on document images captured by a smartphone camera show that the algorithm produces very good results in recovering geometric distortion and eliminating shadow effects.

A Watermarking for Text Document Images using Edge Direction Histograms (에지 방향 히스토그램을 이용한 텍스트 문서 영상의 워터마킹)

  • 김영원;오일석
    • Journal of KIISE:Software and Applications / v.31 no.2 / pp.203-212 / 2004
  • Watermarking is a method for protecting the copyright of multimedia content. Among the various media, text documents show very distinctive properties: block/line/word patterning and a clear separation between foreground and background areas. Algorithms specific to text documents that respect those properties are therefore required. This paper proposes a novel watermarking algorithm for grayscale text document images. The algorithm inserts the watermark signals through edge direction histograms. We develop a concept of sub-image consistency, which holds that sub-images have similar shapes in terms of their edge direction histograms. Using Korean, Chinese, and English document images, the concept is evaluated and shown to be valid over a wide range of document images. To insert watermark signals, the edge direction histogram is modified slightly. Experiments were performed on various document images, and the algorithm was evaluated in terms of imperceptibility and robustness.
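
To make the feature concrete, here is a minimal sketch of an edge direction histogram, assuming Sobel gradients and bins centered on evenly spaced orientations between 0 and 180 degrees. The bin count, the `min_mag` cutoff, and the function name are assumptions; the abstract does not specify them, and the watermark embedding step itself is not shown.

```python
import math


def edge_direction_histogram(image, bins=4, min_mag=1.0):
    """Count gradient orientations of edge pixels in a grayscale image.

    image: rows of numbers. Orientations are folded into [0, 180) degrees and
    bins are centered on 0, 180/bins, 2*180/bins, ... degrees.
    """
    h, w = len(image), len(image[0])
    width = 180.0 / bins
    hist = [0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sobel horizontal and vertical derivatives.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            if math.hypot(gx, gy) < min_mag:
                continue  # flat region, not an edge pixel
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            idx = int(((angle + width / 2) % 180.0) / width)
            hist[idx] += 1
    return hist


# Dark-to-bright step from left to right: the gradient points along the x axis.
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
# Dark-to-bright step from top to bottom: the gradient points along the y axis.
horizontal_edge = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]

hist_v = edge_direction_histogram(vertical_edge)
hist_h = edge_direction_histogram(horizontal_edge)
```

Comparing such histograms across sub-images is one way to check the sub-image consistency idea described in the abstract, for example after normalizing each histogram by its total count.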

Noise Removal using Support Vector Regression in Noisy Document Images

  • Kim, Hee-Hoon;Kang, Seung-Hyo;Park, Jai-Hyun;Ha, Hyun-Ho;Lim, Dong-Hoon
    • The Korean Journal of Applied Statistics / v.25 no.4 / pp.669-680 / 2012
  • Noise removal is a necessary preprocessing step for effective character recognition in document images because it greatly influences processing speed and recognition performance. Spatial filters, such as traditional mean filters and Gaussian filters, and wavelet-transform-based methods have been considered for noise reduction in natural images. However, these methods are not effective for removing noise from document images. In this paper, we present a noise removal method for document images using support vector regression (SVR). The proposed approach consists of two steps: an SVR training step and an SVR test step. In the training step we construct an optimal prediction model using grid search with cross-validation, and in the test step we apply it to noisy images to remove the noise. We evaluate our SVR-based method both quantitatively and qualitatively for noise removal in Korean, English, and Chinese character documents and compare it with some existing methods. Experimental results indicate that the proposed method is more effective and yields satisfactory removal results.