• Title/Summary/Keyword: Document Images

Keyword Spotting on Hangul Document Images Using Character Feature Models (문자 별 특징 모델을 이용한 한글 문서 영상에서 키워드 검색)

  • Park, Sang-Cheol;Kim, Soo-Hyung;Choi, Deok-Jai
    • The KIPS Transactions:PartB
    • /
    • v.12B no.5 s.101
    • /
    • pp.521-526
    • /
    • 2005
  • In this paper, we propose a keyword spotting system as an alternative search method for poor-quality Korean document images and compare it with an OCR-based document retrieval system. The system consists of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method for removing the connectivity between adjacent characters, together with a segmentation method that minimizes the variance of character widths. In the query creation step, the feature vector for the query is constructed by combining per-typeface character models. In the matching step, word-to-word matching is applied on the basis of character-to-character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one for finding a keyword in Korean document images, especially when the document quality is poor and the point size is small.
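
The width-variance idea in the segmentation step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the number of characters in the word is already known and that a list of candidate cut columns (for example, valleys in a vertical projection) is available. The function name and parameters are hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def split_min_width_variance(word_start, word_end, candidate_cuts, n_chars):
    """Pick n_chars - 1 cuts so that the resulting character widths vary least.

    Assumption (not from the abstract): n_chars and candidate_cuts are given.
    """
    if n_chars < 1:
        raise ValueError("n_chars must be at least 1")
    # Keep only distinct candidates strictly inside the word.
    inner = sorted(c for c in set(candidate_cuts) if word_start < c < word_end)
    if len(inner) < n_chars - 1:
        raise ValueError("not enough candidate cuts")

    best_cuts, best_var = None, float("inf")
    # Exhaustive search; fine for short words with few candidates.
    for cuts in combinations(inner, n_chars - 1):
        bounds = [word_start, *cuts, word_end]
        widths = [b - a for a, b in zip(bounds, bounds[1:])]
        var = pvariance(widths)
        if var < best_var:
            best_cuts, best_var = list(cuts), var
    return best_cuts
```

For a 90-pixel-wide word with three characters, `split_min_width_variance(0, 90, [10, 28, 31, 45, 62, 80], 3)` selects the pair of cuts whose three widths are closest to equal.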

Fast Skew Detection of Document Images by Extraction of Center Points of Blank Lines (공백행의 중심점 추출에 의한 고속 문서 기울기 검출)

  • Jeong, Jae-Yeong;Kim, Mun-Hyeon
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.11
    • /
    • pp.1342-1349
    • /
    • 1999
  • In this paper, we propose a fast algorithm for estimating the skew angle of linearly skewed document images. It relies on the observation that a blank line of roughly uniform thickness lies between two adjacent text lines and that the slope of this blank line reflects the skew of the document. First, a simple morphological dilation separates text-line regions from blank-line regions, and the image is sampled vertically at regular intervals to find the center point of every blank line along each sampled vertical line. Next, the slope between neighboring center points on the same blank line is computed. The slopes from the entire image are accumulated in a histogram, and the peak of the histogram is taken as the skew of the document. In the experiments, we applied the algorithm to horizontally written documents in a variety of formats, including handwritten and machine-printed text.
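
The steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's code: the input is assumed to be a boolean NumPy array that has already been dilated so that each text line forms a solid band (`True` = ink), and the names `blank_run_centers`, `estimate_skew_degrees`, `step`, `max_angle`, and `bin_width` are hypothetical.

```python
import math

import numpy as np


def blank_run_centers(column):
    """Center rows of blank runs that lie between two ink runs in one column."""
    centers = []
    start = None       # first row of the current blank run, if any
    seen_ink = False   # blank runs before the first ink are page margin
    for y, ink in enumerate(column):
        if ink:
            if start is not None and seen_ink:
                centers.append((start + y - 1) / 2.0)
            start = None
            seen_ink = True
        elif start is None:
            start = y
    return centers  # a blank run touching the bottom edge is never closed


def estimate_skew_degrees(dilated, step=16, max_angle=15.0, bin_width=0.25):
    """Histogram the slopes between neighboring blank-line centers."""
    _, width = dilated.shape
    max_jump = step * math.tan(math.radians(max_angle))
    columns = [blank_run_centers(dilated[:, x]) for x in range(0, width, step)]

    angles = []
    for left, right in zip(columns, columns[1:]):
        if not right:
            continue
        for y0 in left:
            # Pair each center with the nearest center in the next column.
            y1 = min(right, key=lambda y: abs(y - y0))
            if abs(y1 - y0) <= max_jump:
                angles.append(math.degrees(math.atan2(y1 - y0, step)))
    if not angles:
        return 0.0

    # The most frequent quantized angle is taken as the skew.
    bins = np.round(np.array(angles) / bin_width).astype(int)
    values, counts = np.unique(bins, return_counts=True)
    return float(values[np.argmax(counts)] * bin_width)
```

In this sketch a positive angle means the lines descend toward the right in image coordinates. A larger `step` gives finer angular resolution, because center positions are only known to about half a pixel.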

A Study on the Recognition of Mixed Documents Consisting of Texts and Graphic Images (텍스트와 그래픽으로 구성된 혼합문서 인식에 관한 연구)

  • 함영국;김인권;정홍규;박래홍;이창범;김상중;윤병남
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.7
    • /
    • pp.76-90
    • /
    • 1994
  • In this paper, we propose an efficient algorithm for recognizing mixed documents that consist of printed Korean/alphanumeric text and graphic images. In the preprocessing step, the input document is aligned by rotation if necessary: the rotation angle is obtained with the Hough transform, and the document is aligned horizontally. We then separate graphic image parts from text parts by considering the chain codes of connected components, and further separate individual characters using vertical and horizontal projections. In the recognition step, Korean and alphanumeric characters are classified, and each is recognized hierarchically using several features. The performance of the algorithm is demonstrated via computer simulations.
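
Character separation by horizontal and vertical projections can be sketched as below. This is a generic illustration of projection-profile segmentation, assuming a deskewed binary NumPy array with non-zero ink pixels; the helper names are hypothetical, and touching or overlapping characters are not handled.

```python
import numpy as np


def runs_of_ink(profile):
    """Half-open (start, end) index ranges where a projection profile is non-zero."""
    ink = np.concatenate(([False], np.asarray(profile) > 0, [False]))
    edges = np.flatnonzero(ink[1:] != ink[:-1])
    return list(zip(edges[0::2].tolist(), edges[1::2].tolist()))


def segment_characters(binary):
    """Split into text lines by row sums, then each line into boxes by column sums."""
    binary = np.asarray(binary)
    boxes = []
    for top, bottom in runs_of_ink(binary.sum(axis=1)):
        line = binary[top:bottom]
        for left, right in runs_of_ink(line.sum(axis=0)):
            boxes.append((top, bottom, left, right))
    return boxes
```

Each returned tuple is `(top, bottom, left, right)` with exclusive `bottom` and `right`, so `binary[top:bottom, left:right]` crops one segment.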

A Hangul Document Image Retrieval System Using Rank-based Recognition (웨이브렛 특징과 순위 기반 인식을 이용한 한글 문서 영상 검색 시스템)

  • Lee Duk-Ryong;Kim Woo-Youn;Oh Il-Seok
    • The Journal of the Korea Contents Association
    • /
    • v.5 no.2
    • /
    • pp.229-242
    • /
    • 2005
  • We constructed a full-text retrieval system for scanned Hangul document images. The system consists of three components: preprocessing, recognition, and retrieval. The retrieval algorithm uses recognition results up to rank k. The algorithm is not only insensitive to recognition errors but also lets the user control recall and precision. For an objective performance evaluation, we used scanned images of the Journal of Korea Information Science Society provided by KISTI. The system was shown to be practical through the evaluation of recognition and retrieval rates.
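
One plausible reading of retrieval over rank-k recognition results is sketched below. The abstract does not give the matching rule, so this is an assumption: each character position stores its recognition candidates ordered by rank, and a keyword matches when every query character appears among the top k candidates at the corresponding position. The function name and data layout are hypothetical.

```python
def find_keyword(ranked_candidates, query, k):
    """Start positions where each query character is within the top-k candidates.

    ranked_candidates: list of per-position candidate lists, best first
    (an assumed data layout, not taken from the paper).
    """
    if k < 1 or not query:
        return []
    hits = []
    for start in range(len(ranked_candidates) - len(query) + 1):
        if all(ch in ranked_candidates[start + j][:k]
               for j, ch in enumerate(query)):
            hits.append(start)
    return hits
```

Under this reading, raising `k` tolerates more recognition errors and so raises recall at the cost of precision, which is one way the user-controllable trade-off could arise.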

Skew Correction for Document Images Using Block Transformation (블록 변환을 이용한 문서 영상의 기울어짐 교정)

  • Gwak, Hui-Gyu;Kim, Su-Hyeong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.11
    • /
    • pp.3140-3149
    • /
    • 1999
  • Skew correction for document images can be performed using a rotational transformation of pixel coordinates. In this paper, we propose a method that corrects a document skew of $\theta$ degrees using block information, where a block is defined as a rectangular area containing adjacent black pixels. The proposed method is faster than pixel-by-pixel transformation because the number of floating-point operations is reduced significantly. We rotate only the four corner points of each block and then identify the pixels inside the rotated block. Two methods for identifying inside pixels are proposed. The first finds the two points at which each row intersects the boundary of the rotated block and takes the pixels between them as inside pixels. The second finds boundary points with Bresenham's line drawing algorithm, using fixed-point operations, and fills the region enclosed by these boundaries with black pixels. We measured the performance of the proposed method on 2,016 images of various English and Korean documents and showed, through a performance comparison, that it outperforms existing methods based on pixel transformation.
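
The first inside-pixel method (rotate the four corners, then fill between the two boundary intersections on each row) might look roughly like the sketch below. It uses floating-point arithmetic for clarity and is an illustration only; the function names, the rotation center `(cx, cy)`, and the inclusive span convention are assumptions rather than details from the paper.

```python
import math


def rotate_point(x, y, theta_deg, cx=0.0, cy=0.0):
    """Rotate (x, y) about (cx, cy) by theta_deg degrees."""
    t = math.radians(theta_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))


def rotated_block_rows(left, top, right, bottom, theta_deg, cx=0.0, cy=0.0):
    """Return inclusive (y, x_start, x_end) spans covering the rotated block."""
    corners = [rotate_point(x, y, theta_deg, cx, cy)
               for x, y in ((left, top), (right, top),
                            (right, bottom), (left, bottom))]
    edges = list(zip(corners, corners[1:] + corners[:1]))
    y_min = math.ceil(min(y for _, y in corners))
    y_max = math.floor(max(y for _, y in corners))

    spans = []
    for y in range(y_min, y_max + 1):
        xs = []
        for (x0, y0), (x1, y1) in edges:
            if y0 == y1:
                if y0 == y:          # horizontal edge lying on this row
                    xs.extend((x0, x1))
            elif min(y0, y1) <= y <= max(y0, y1):
                # x where this edge crosses the row
                xs.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
        if xs:
            x_start, x_end = math.ceil(min(xs)), math.floor(max(xs))
            if x_start <= x_end:
                spans.append((y, x_start, x_end))
    return spans
```

Only the four corners go through the trigonometric rotation; every pixel inside is then filled by row spans, which is where the savings over per-pixel rotation would come from.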

Document Layout Analysis Using Coarse/Fine Strategy (Coarse/fine 전략을 이용한 문서 구조 분석)

  • 박동열;곽희규;김수형
    • Proceedings of the IEEK Conference
    • /
    • 2000.06d
    • /
    • pp.198-201
    • /
    • 2000
  • We propose a method for analyzing document structure. The method consists of two processes: segmentation and classification. Segmentation first divides a low-resolution image coarsely and then finely splits the original document image using projection profiles. Classification labels each segmented region as text, line, table, or image. An experiment with 238 document images shows a segmentation accuracy of 99.1% and a classification accuracy of 97.3%.

A Study of Distorted Document Image Restoration using Structured Light (Structured Light를 이용한 왜곡된 문서 영상 복원에 관한 연구)

  • 곽규섭;채옥삼
    • Proceedings of the IEEK Conference
    • /
    • 2000.11d
    • /
    • pp.235-238
    • /
    • 2000
  • This paper describes the implementation of a document image restoration system that corrects geometric distortion using structured light. To obtain accurate document images, a bound book normally has to be flattened by pressing it down with a glass plate. However, most ancient documents are too fragile to be pressed. The proposed system restores character images that have been distorted geometrically.

Feature Extraction Method for the Character Recognition of the Low Resolution Document

  • Kim, Dae-Hak;Cheong, Hyoung-Chul
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.3
    • /
    • pp.525-533
    • /
    • 2003
  • In this paper, we introduce some existing preprocessing algorithms for character recognition and consider feature extraction methods for recognizing low-resolution documents. Characters in low-resolution documents, including fax images, are frequently misclassified because of blurring, slant, noise, and similar effects. To overcome these difficulties, we consider a mesh feature and a contour direction code feature. A system for automatic character recognition is suggested.
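
A mesh feature is commonly computed by dividing the character bitmap into a grid and taking the ink density of each cell. The sketch below shows that generic formulation; the grid size and function name are assumptions, and the paper's exact definition may differ.

```python
import numpy as np


def mesh_features(glyph, grid=4):
    """Ink density of each cell in a grid x grid mesh, flattened row by row."""
    glyph = np.asarray(glyph, dtype=float)
    height, width = glyph.shape
    # Cell boundaries; rounding keeps cells nearly equal for any bitmap size.
    row_edges = np.linspace(0, height, grid + 1).round().astype(int)
    col_edges = np.linspace(0, width, grid + 1).round().astype(int)

    feats = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            cell = glyph[row_edges[i]:row_edges[i + 1],
                         col_edges[j]:col_edges[j + 1]]
            # Empty cells (bitmap smaller than the grid) contribute zero.
            feats[i, j] = cell.mean() if cell.size else 0.0
    return feats.ravel()
```

Because each value is an average over a cell, the feature is fairly tolerant of the isolated noise pixels typical of fax-quality scans.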

A Study on the Characteristics and Images of Rainbow Colors For Fashion Design (패션 디자인을 위한 무지개 색의 특성과 이미지 고찰)

  • 김지언;김영인
    • Journal of the Korean Society of Costume
    • /
    • v.54 no.5
    • /
    • pp.125-138
    • /
    • 2004
  • This study aims to define the characteristics and images of rainbow-colored fashion by examining the theoretical basis of rainbow colors and analyzing rainbow-colored fashion images in historical materials, Western and folk costumes, and modern fashion design. Careful consideration of rainbow-colored fashion makes it possible to develop innovative approaches to fashion design that satisfy the color needs of designers and colorists. To this end, a document study and a survey study were carried out. The results are as follows. In the document study, the beginning of rainbow-colored fashion was traced back to ancient Egypt. The saikdong of Korea and the poncho of the Indians are further examples of rainbow-colored fashion. In principle, rainbow-colored fashion was worn as ornament by people of rank at ceremonies. In the survey study, the clothing perception characteristics of rainbow-colored fashion were analyzed. The main factors of these perception characteristics are 'closed form', 'whole', 'indeterminate', 'rounded', and 'planar separation'. The factors that affect the perception of rainbow-colored fashion are the 'closed form' and 'indeterminate' characteristics. Rainbow-colored fashion images and clothing perception characteristics can be classified into four main images: Vigorous, Colorful/fairy, Fresh, and Mysterious/brilliant. This study therefore systematizes the characteristics and images of rainbow colors. The results make it possible to apply rainbow colors to fashion design efficiently, because the suggested design elements and color palettes cover the three basic elements of fashion design: color, texture, and form.

Design and Implementation of Digital Jikin using Smartphone Application

  • Hong, Daewon;Kang, Miju;Chun, Junchul
    • Journal of Internet Computing and Services
    • /
    • v.18 no.5
    • /
    • pp.87-94
    • /
    • 2017
  • Owing to recent advances in the IT industry, many companies and institutions have been using electronic documents rather than original paper copies. However, electronic documents are easily compromised by unauthorized copying, counterfeiting, and falsification, which can cause serious security problems. Conventional security methods for digital documents add a separate image or marker, but these methods can reduce the readability of the document. We therefore propose a digital Jikin (Korean traditional stamp), a stamp that is normally used in Asia to identify the source or author of a document. The proposed digital Jikin preserves the readability of the electronic document while protecting it from unauthorized copying, counterfeiting, and falsification by means of an image processing approach. In this paper, a digital Jikin application is designed and implemented on the Android platform; it encodes the critical information of a document into the digital Jikin. The boundary of the Jikin contains not only the author or source of the document but also keywords, the number of images, and other information. The server can therefore evaluate the authenticity of a document and whether it has been altered by another person. The digital Jikin can be sent to a server over wireless networks and stored using PHP and MySQL. We believe that the proposed method offers a simpler and better solution for strengthening the security of electronic documents.