• Title/Summary/Keyword: document image binarization

History Document Image Background Noise and Removal Methods

  • Ganchimeg, Ganbold
    • International Journal of Knowledge Content Development & Technology / v.5 no.2 / pp.11-24 / 2015
  • Archive libraries commonly provide public access to collections of historical and ancient document images. Such images often require specialized processing to remove background noise and become more legible. Document images may be contaminated with noise during transmission, scanning, or conversion to digital form. Noise can be categorized by identifying its features, and similar patterns can then be searched for in a document image to choose an appropriate removal method. In this paper, we propose a hybrid binarization approach that improves the quality of old documents by combining global and local thresholding. This article also reviews the types of noise that may appear in scanned document images and discusses some noise removal methods.
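
    The abstract does not say how the global and local thresholds are combined, so the following is only a minimal sketch under stated assumptions: the global threshold comes from Otsu's method, the local threshold is the mean of a small window, and the two are blended with a hypothetical weight `alpha`. The names `otsu_threshold`, `hybrid_binarize`, `alpha`, and `radius` are illustrative and do not come from the paper.

```python
def otsu_threshold(image):
    """Return the Otsu threshold (0-255) for a 2-D list of 8-bit gray levels."""
    hist = [0] * 256
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels with gray level <= t
    sum0 = 0  # sum of the gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance of the two classes.
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def hybrid_binarize(image, alpha=0.5, radius=1):
    """Return a 2-D list with 0 for ink and 255 for background.

    Assumption (not from the paper): the per-pixel threshold is
    alpha * global Otsu threshold + (1 - alpha) * local window mean.
    """
    t_global = otsu_threshold(image)
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            t = alpha * t_global + (1 - alpha) * local_mean
            row.append(0 if image[y][x] <= t else 255)
        out.append(row)
    return out
```

    The global term keeps flat regions stable, while the local term lets the threshold follow uneven backgrounds; other combinations are equally plausible readings of the abstract.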

Distortion Corrected Black and White Document Image Generation Based on Camera (카메라기반의 왜곡이 보정된 흑백 문서 영상 생성)

  • Kim, Jin-Ho
    • The Journal of the Korea Contents Association / v.15 no.11 / pp.18-26 / 2015
  • Document images captured with a camera instead of a scanner can contain geometric distortion and shadow effects caused by the capturing angle. In this paper, a camera-based algorithm is proposed that generates a clean black-and-white document image through distortion correction and shadow elimination. To correct geometric distortion, which involves straightening boundary lines bent by the radial distortion of the camera lens and eliminating outlying areas included because of the camera direction, a document boundary detection method based on a second-derivative filter is developed. Black-and-white images are then generated by an adaptive binarization method that eliminates the shadow effect. Experiments on document images captured by a smartphone camera show that the algorithm recovers geometric distortion and eliminates shadow effects with very good results.

Speed-up of Document Image Binarization Method Based on Water Flow Model (Water flow model에 기반한 문서영상 이진화 방법의 속도 개선)

  • 오현화;김도훈;이재용;김두식;임길택;진성일
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.4 / pp.75-86 / 2004
  • This paper proposes a method to speed up document image binarization using a water flow model. The proposed method extracts the region of interest (ROI) around characters from a document image and restricts the pouring of water onto the 3-dimensional terrain surface of the image to the ROI. The amount of water to be filled into a local valley is determined automatically from its depth and slope. The proposed method accumulates weighted water not only at the locally lowest position but also at its neighbors, so a valley is filled sufficiently with a single pass of pouring water onto the terrain surface of the ROI. Finally, the depth of each pond is adaptively thresholded for robust character segmentation, because the depth of a pond formed in a valley varies widely with the gray-level difference between characters and background. In our experiments on real document images, the proposed method attained good binarization performance and remarkably reduced processing time compared with the existing method based on a water flow model.

Rectification of Document Image on Smartphone Using MSER-b Binarization (MSER-b 이진화 기법을 이용한 스마트폰 문서 이미지 보정 기법)

  • Yu, Young-Jung;Moon, Sang-Ho;Park, Seong-Ho
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.1 / pp.201-207 / 2015
  • A smartphone with a camera can easily capture a document image in place of a scanner. However, a document image captured with a smartphone can contain rotation or perspective distortions. In this paper, we propose a method that reduces these distortions in a document image captured with a smartphone. First, the captured image is preprocessed using the MSER-b technique to reduce lighting effects. Then, the contour of the text area is extracted using the characteristics of the document image. Lastly, rotation or perspective distortions are reduced using the extracted text area contour. In experiments, the proposed method is compared with two other products. The experiments show that distortions in document images captured with a smartphone can be effectively reduced.

Document Image Binarization Using a Water Flow Model (Water Flow Model을 이용한 문서 영상의 이진화)

  • Kim, In-Gwon;Jeong, Dong-Uk;Song, Jeong-Hui;Park, Rae-Hong
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.1 / pp.19-32 / 2001
  • This paper proposes a local adaptive thresholding method based on a water flow model, in which an image surface is considered as a 3-dimensional (3-D) terrain. To extract characters from backgrounds, we pour water onto the terrain surface. Water flows down to the lower regions of the terrain and fills valleys. The amount of filled water is then thresholded; the proposed thresholding method is applied to gray-level document images consisting of characters and backgrounds. The proposed method based on a water flow model behaves as a locally adaptive thresholding method. Computer simulations with synthetic and real document images show that the proposed method yields effective adaptive thresholding results for the binarization of document images.

A Method for Optimal Binarization using Bit-plane Pattern (비트평면 패턴을 이용한 최적 이진화 방법)

  • Kim, Ha-Sik;Kim, Kang;Cho, Kyung-Sik;Jeon, Jong-Sik
    • Journal of the Korea Society of Computer and Information / v.6 no.4 / pp.1-5 / 2001
  • This paper proposes a new approach for determining the global threshold value for image binarization. In the proposed algorithm, bit-plane information, which retains the shapes in the original image, is used to divide the image into two parts, object and background, and the average value of each part is then computed and compared. The optimal threshold value is selected at the center of the two averages. The proposed method is relatively simple but robust, and it achieved good results on continuous-tone images and document images.
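
    The idea can be sketched as follows. This is an illustration under an assumption, not the authors' exact procedure: the abstract does not say which bit plane is used, so the sketch partitions pixels by the most significant bit (bit 7 of an 8-bit gray level). The names `bitplane_threshold`, `binarize`, and `bit` are hypothetical.

```python
def bitplane_threshold(image, bit=7):
    """Return a global threshold for a 2-D list of 8-bit gray levels.

    Assumption: pixels are split into two groups by one bit plane
    (default: the most significant bit), and the threshold is the
    midpoint of the two group averages.
    """
    mask = 1 << bit
    pixels = [v for row in image for v in row]
    clear = [v for v in pixels if not v & mask]  # bit is 0 (darker group)
    set_ = [v for v in pixels if v & mask]       # bit is 1 (brighter group)
    if not clear or not set_:
        # Only one group exists, so fall back to the overall average.
        return sum(pixels) / len(pixels)
    return (sum(clear) / len(clear) + sum(set_) / len(set_)) / 2


def binarize(image, threshold):
    """Map pixels at or below the threshold to 0 and the rest to 255."""
    return [[0 if v <= threshold else 255 for v in row] for row in image]
```

    For example, `bitplane_threshold([[10, 30], [200, 240]])` averages the dark group to 20 and the bright group to 220, giving a threshold of 120.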

Adaptive Binarization using Integral Image (적분영상을 이용한 적응적 이진화)

  • Lee, Yeon-Kyung;Yoo, Hoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2012.10a / pp.109-110 / 2012
  • In this paper, we propose an adaptive thresholding method to binarize two-dimensional barcode images. Adaptive thresholding methods are typically applied to document image binarization, so they are inappropriate for recognizing two-dimensional barcode images. To overcome this problem, we propose a new adaptive thresholding method using the integral image. To show its effectiveness, we compared our method with a well-known existing method in terms of visual quality and processing time. The experimental results indicate that the proposed method is superior to the existing method.
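
    The abstract gives no details of the proposed variant, so the sketch below shows only the general, widely used form of integral-image thresholding: each pixel is compared with the mean of its surrounding window, and the window sum is read from a summed-area table in constant time. The names `integral_image`, `adaptive_binarize`, `radius`, and `ratio` are illustrative assumptions.

```python
def integral_image(image):
    """Return a summed-area table with one extra row and column of zeros."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def adaptive_binarize(image, radius=1, ratio=0.15):
    """Return 0 where a pixel is more than `ratio` darker than its local mean."""
    h, w = len(image), len(image[0])
    ii = integral_image(image)
    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            # Window sum from four table lookups, independent of window size.
            window_sum = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            # Same test as pixel < mean * (1 - ratio), without the division.
            dark = image[y][x] * area < window_sum * (1 - ratio)
            row.append(0 if dark else 255)
        out.append(row)
    return out
```

    Because each window sum costs four lookups regardless of `radius`, the run time depends on the image size but not on the window size, which is the usual reason for using an integral image here.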

Segmentation and Contents Classification of Document Images Using Local Entropy and Texture-based PCA Algorithm (지역적 엔트로피와 텍스처의 주성분 분석을 이용한 문서영상의 분할 및 구성요소 분류)

  • Kim, Bo-Ram;Oh, Jun-Taek;Kim, Wook-Hyun
    • The KIPS Transactions:PartB / v.16B no.5 / pp.377-384 / 2009
  • This paper proposes a new algorithm that classifies the various contents of document images, such as text, figures, graphs, and tables, by segmenting document images using a local entropy-based histogram and classifying the contents using texture-based PCA. Local entropy and the histogram make the binarization of document images robust to various transformations and noise, as well as simple and less time-consuming. The texture-based PCA algorithm is applied to each segmented region and exploits the fact that each type of content in a document image has different texture information. As a result, no pre-defined structural information is needed, and classification is fast and efficient. The results demonstrate that the proposed method achieves better segmentation and classification performance on various images and is more efficient than previous methods.

Line Edge-Based Type-Specific Corner Points Extraction for the Analysis of Table Form Document Structure (표 서식 문서의 구조 분석을 위한 선분 에지 기반의 유형별 꼭짓점 검출)

  • Jung, Jae-young
    • Journal of Digital Contents Society / v.15 no.2 / pp.209-217 / 2014
  • It is very important to classify large numbers of table-form documents by type and to automatically extract the information filled into a template. Both tasks require accurate analysis of the table-form structure. This paper proposes an algorithm that extracts corner points based on line edge segments and classifies the junction types in table-form images. The algorithm preprocesses the image through binarization, skew correction, and deletion of small isolated black areas, which are probably generated by noise. It then detects edge blocks, line edges within each edge block, and corner points. The extracted corner points are classified into 9 junction types based on the combination of horizontal and vertical line edge segments in a block. The proposed method is applied to several unconstrained document images such as tax forms, transaction receipts, and ordinary documents containing tables. The experimental results show that the corner point detection performance is over 99%. Considering that almost all corner points form corresponding pairs in a table, the corner type and line width information may be useful for analyzing the structure of table-form documents.

Text Line Segmentation of Handwritten Documents by Area Mapping

  • Boragule, Abhijeet;Lee, GueeSang
    • Smart Media Journal / v.4 no.3 / pp.44-49 / 2015
  • Text line segmentation is a preprocessing step in OCR that can significantly influence the accuracy of document analysis applications. This paper proposes a novel methodology for the text line segmentation of handwritten documents. First, the average width of the connected components is used to form a 1-D Gaussian kernel, and a smoothing operation is then applied to the input binary image. The adaptive binarization of the smoothed image forms the final text lines. In this work, the segmentation method involves two stages. First, the large connected components are labelled as a unique text line using text line area mapping. Second, the final refinement of the segmentation is performed using the Euclidean distance between the text line and small connected components. The group of uniquely labelled text candidates achieves promising segmentation results. The proposed approach works well on Korean and English handwritten documents captured using a camera.
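
    The smoothing step can be sketched as below, under several assumptions that the abstract does not confirm: ink pixels are 1 and background pixels are 0, the average component width is already known and is used directly as the Gaussian sigma, the kernel is applied along each row, and a fixed cut-off `level` stands in for the adaptive binarization used in the paper. All function and parameter names are hypothetical.

```python
import math


def gaussian_kernel_1d(sigma):
    """Return normalized 1-D Gaussian weights covering about 3 sigma per side."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(binary, kernel):
    """Convolve every row with the kernel, treating out-of-image pixels as 0."""
    radius = len(kernel) // 2
    out = []
    for row in binary:
        width = len(row)
        smoothed = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                i = x + k - radius
                if 0 <= i < width:
                    acc += weight * row[i]
            smoothed.append(acc)
        out.append(smoothed)
    return out


def text_line_mask(binary, avg_width, level=0.2):
    """Return a 0/1 mask in which horizontally close ink merges into runs.

    Assumptions: sigma equals the average component width, and a fixed
    `level` replaces the adaptive binarization described in the abstract.
    """
    kernel = gaussian_kernel_1d(float(avg_width))
    smoothed = smooth_rows(binary, kernel)
    return [[1 if v >= level else 0 for v in row] for row in smoothed]
```

    Smoothing only along the writing direction bridges the gaps between neighbouring characters without merging vertically adjacent lines, which is why a 1-D kernel is a natural fit for this step.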