• Title/Summary/Keyword: printed font

Construction of Printed Hangul Character Database PHD08 (한글 문자 데이터베이스 PHD08 구축)

  • Ham, Dae-Sung;Lee, Duk-Ryong;Jung, In-Suk;Oh, Il-Seok
    • The Journal of the Korea Contents Association / v.8 no.11 / pp.33-40 / 2008
  • The application of OCR is moving from traditional formatted documents to web documents and natural scene images. These new applications typically use not only the standard Myungjo and Gothic fonts but also a wide variety of other fonts. Conventional databases, which have mainly been constructed with standard fonts, are of limited use for the new applications. In this paper, we generate 243 image samples for each of 2,350 Hangul character classes, differing in font, size, quality, and resolution. Additionally, each sample was varied according to binarization threshold and rotational transformation. Through this process, 2,187 samples were generated for each character class. In total, 5,139,450 samples constitute the printed Hangul character database called PHD08. In addition, we present its characteristics and the recognition performance obtained with a commercial OCR software package.
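
The sample counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative. The factor of 9 between 243 and 2,187 is implied by those figures, and the abstract does not say how it divides between binarization thresholds and rotations.

```python
# Figures quoted in the PHD08 abstract (names are illustrative, not from the paper).
NUM_CLASSES = 2350          # Hangul character classes
BASE_SAMPLES = 243          # samples per class before threshold/rotation variation
SAMPLES_PER_CLASS = 2187    # samples per class after variation
TOTAL_SAMPLES = 5_139_450   # database size reported in the abstract


def variation_factor(base: int, final: int) -> int:
    """Return how many variants each base sample must yield to reach `final`."""
    if final % base != 0:
        raise ValueError("final count is not a whole multiple of the base count")
    return final // base


factor = variation_factor(BASE_SAMPLES, SAMPLES_PER_CLASS)
total = NUM_CLASSES * SAMPLES_PER_CLASS

print(f"variants per base sample: {factor}")
print(f"total samples: {total:,}")
print(f"matches reported total: {total == TOTAL_SAMPLES}")
```

The per-class count multiplied by the number of classes reproduces the reported database size, so the figures in the abstract are mutually consistent.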

A CHARACTER RECOGNITION SYSTEM BASED ON SYNTACTIC APPROACH (인쇄체 영문의 구문론적 인식)

  • Park, Dong-Choon;Park, Sung-Han
    • Proceedings of the KIEE Conference / 1987.07b / pp.1598-1601 / 1987
  • This paper proposes a new set of topological features (primitives) for use with a syntactic recognizer for high-accuracy recognition of printed alphanumeric characters. Recognition is performed on nine character groups, where each group has a different combination of four feature points. A skeleton enhancement that eliminates isolated points and smooths irregular points is developed. Tree automata processed in parallel enable high recognition speed and font-independent recognition. The proposed character recognition system is tested on alphanumeric character fonts from a dot matrix printer and a plotter, using an IBM-PC/XT.

Optical Font Recognition For Printed Korean Characters Using Serif Pattern of Strokes

  • Kim, Soo-Hyung;Kim, Sam-Soo;Kwag, Hee-Kue;Lee, Guee-Sang
    • Proceedings of the IEEK Conference / 2002.07b / pp.916-919 / 2002
  • This paper introduces the problem of typeface classification of Hangul characters and proposes features for classifying typefaces into Serif and Sans-serif classes. Serif typefaces have a small decorative stroke near the beginning of vertical strokes, while Sans-serif typefaces have none. Therefore, the serif part is first segmented from the vertical strokes, and the direction of the serif is computed as the feature for Hangul typeface identification. To evaluate the performance of the proposed system, we used 3,000 characters extracted from Korean documents: 1,500 from Serif fonts and the other 1,500 from Sans-serif fonts.

High Speed Character Recognition by Multiprocessor System (멀티 프로세서 시스템에 의한 고속 문자인식)

  • 최동혁;류성원;최성남;김학수;이용균;박규태
    • Journal of the Korean Institute of Telematics and Electronics B / v.30B no.2 / pp.8-18 / 1993
  • A multi-font, multi-size, high-speed character recognition system is designed. The design principles are simplicity of algorithm, adaptability, learnability, hierarchical data processing, and attention by feedback. For multi-size character recognition, the extracted character images are normalized. Features are extracted by applying directional receptive fields after directional edge filtering, and a hierarchical classifier classifies the resulting feature vectors. The hierarchical classifier consists of two pre-classifiers and one decision-making classifier. The two pre-classifiers provide a prediction to the final decision-making classifier, which reduces the time needed to compute distances in the final classifier. The recognition rate is 95% for three documents printed in three kinds of fonts, totaling 1,700 characters. For high-speed implementation, a multiprocessor system with a ring structure of four transputers is implemented, and a recognition speed of 30 characters per second is achieved.

A Study on Printed Hangeul Recognition with Dynamic Jaso Segmentation and Neural Network (동적자소분할과 신경망을 이용한 인쇄체 한글 문자인식기에 관한 연구)

  • 이판호;장희돈;남궁재찬
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.11 / pp.2133-2146 / 1994
  • In this paper, we present a method for dynamic Jaso segmentation and Hangeul recognition using a neural network. It uses a feature vector extracted from a mesh that depends on the segmentation result. First, each character is converted to a 256-dimensional feature vector by four-direction contributivity and an $8\times8$ mesh. The character is then classified into one of 6 classes by a neural network and segmented into Jaso using the classification result, statistical vowel location information, and structural information. After Jaso segmentation, Hangeul recognition is performed using a neural network. We experiment on four fonts, three of which are used for training the neural network and the remaining one for testing. Each font has the 2,350 characters included in KS C 5601. The overall recognition rates for the training data and the testing data are 97.4% and 94&% respectively. This result shows the effectiveness of the proposed method.

Legibility evaluation of the safety and health information used in pesticides (농약 표시 글자 크기 가이드라인 설정을 위한 가독성 평가)

  • Lim, Chang-Wook;Hwang, Rae-Young;Song, Young-Woong
    • Journal of the Korea Safety Management & Science / v.13 no.3 / pp.29-35 / 2011
  • Safety and health information for the proper use and handling of pesticides is usually printed as text on the surface of pesticide products (bottle type or bag type). However, the guidelines or standards for the appropriate presentation of text on pesticide products are mostly vague or impractical. This study therefore aimed to provide preliminary guidelines for text sizes based on legibility experiments. A total of twenty subjects from two age groups (young: n=10, old: n=10, five males and five females in each group) participated in the experiment. Subjects read text cards presented at a distance of 50 cm from their eyes. Eight different text card sets were prepared, varying font type (thick gothic-type and fine gothic-type), font thickness (plain and bold), and number of syllables (2 and 3 syllables). As subjects read the cards, the correctness of reading (correct or wrong) was recorded and the degree of discomfort (from 1: no discomfort at all to 4: can't read at all) was evaluated for all text sizes. Results showed that the character size should be 4 pt or larger for the young subjects to read at least one word correctly in all text conditions. For the old subjects to read at least one word correctly, the character size should be 5 pt or larger. The average minimum character size for 100% correct answers was 6.1 pt for young subjects and 10.5 pt for old subjects.

The Font Recognition of Printed Hangul Documents (인쇄된 한글 문서의 폰트 인식)

  • Park, Moon-Ho;Shon, Young-Woo;Kim, Seok-Tae;Namkung, Jae-Chan
    • The Transactions of the Korea Information Processing Society / v.4 no.8 / pp.2017-2024 / 1997
  • The main focus of this paper is the recognition of printed Hangul documents in terms of typeface, character size, and character slope for IICS (Intelligent Image Communication System). Fixed-size blocks extracted from documents are analyzed in the frequency domain for typeface classification. The vertical pixel counts and the projection profile of the bounding box are used for character size classification and character slope classification, respectively. An MLP with a variable number of hidden nodes, trained with the error back-propagation algorithm, is used as the typeface classifier, and the Mahalanobis distance is used to classify character size and slope. The experimental results demonstrated the usefulness of the proposed system, with mean rates of 95.19% in typeface classification, 97.34% in character size classification, and 89.09% in character slope classification.

Character Segmentation on Printed Korean Document Images Using a Simplification of Projection Profiles (투영 프로파일의 간략화 방법을 이용한 인쇄체 한글 문서 영상에서의 문자 분할)

  • Park Sang-Cheol;Kim Soo-Hyung
    • The KIPS Transactions: Part B / v.13B no.2 s.105 / pp.89-96 / 2006
  • In this paper, we propose two approaches for character segmentation on Korean document images. One is an improved version of a projection profile-based algorithm. It involves estimating the number of characters, obtaining the split points, searching for each character's boundary, and selecting the best segmentation result. The other is developed for low-quality document images in which adjacent characters are connected. In this case, parts of the projection profile are cut to resolve the connections between characters. This is called the $\alpha$-cut. Afterwards, the revised version of the former segmentation procedure is applied. The two approaches have been tested with 43,572 low-quality Korean word images printed in various font styles. The segmentation accuracies of the former and the latter are 91.81% and 99.57%, respectively. This result shows that the proposed algorithm using the $\alpha$-cut is effective for low-quality Korean document images.
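
The idea of cutting a projection profile to separate touching characters can be illustrated with a minimal sketch. The abstract does not define the $\alpha$-cut precisely, so the version below is an assumption: it subtracts a constant `alpha` from every column of the vertical projection profile and clips at zero, so that thin bridges between characters become gaps. The function names and the toy image are hypothetical and are not taken from the paper.

```python
def projection_profile(image):
    """Count ink pixels (1 = ink) in each column of a binary image given as rows."""
    return [sum(column) for column in zip(*image)]


def alpha_cut(profile, alpha):
    """Assumed form of the cut: lower the whole profile by `alpha`, clipping at zero."""
    return [max(value - alpha, 0) for value in profile]


def split_segments(profile):
    """Return (start, end) column ranges, end exclusive, where the profile is nonzero."""
    segments = []
    start = None
    for x, value in enumerate(profile):
        if value > 0 and start is None:
            start = x                      # a new run of ink columns begins
        elif value == 0 and start is not None:
            segments.append((start, x))    # the run ends at the first empty column
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


# Toy example: two 3-column blobs joined by a one-pixel bridge in the middle row.
image = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]

profile = projection_profile(image)
print(split_segments(profile))                # bridge keeps the blobs merged
print(split_segments(alpha_cut(profile, 1)))  # cut removes the bridge column
```

On the toy image the plain profile yields a single segment, while the cut profile yields two, which mirrors the abstract's claim that the cut helps when adjacent characters are connected. The estimation of character count and the selection of the best segmentation described in the abstract are not modeled here.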

A Study on Type Classification and Subpattern Extraction Using Structural Information of Radical in Printed Hanja (인쇄체 한자에서 Radical의 구조적 정보를 이용한 형식분류 및 부분패턴 추출에 관한 연구)

  • 김정한;조용주;남궁재찬
    • The Journal of Korean Institute of Communications and Information Sciences / v.16 no.3 / pp.232-247 / 1991
  • This paper proposes a new classification algorithm that uses the characteristic and structural information of printed Hanja as a preliminary stage of Hanja character recognition. Hanja is difficult not only to recognize but also to classify, because there are many characters and their structures are complicated. To address this problem, we perform type classification on the Hanja pattern and then extract the common subpattern within each classified pattern. First, the input character pattern is preprocessed, directional segments are extracted, the 4-directional pattern is labeled, and the character is classified into one of 12 types using structural information based on the region of the character pattern in which the subpattern exists; the subpattern is then extracted. In the experiments, the classification rate was 93.07% on the 1,800 characters of educational Hanja and 90.12% on 4,888 characters of the KS C5601 standard TRIGEM LBP Hanja font. Extracting subpatterns from the classified data indicates that the method can be applied to recognition.

Precise Detection of Car License Plates by Locating Main Characters

  • Lee, Dae-Ho;Choi, Jin-Hyuk
    • Journal of the Optical Society of Korea / v.14 no.4 / pp.376-382 / 2010
  • We propose a novel method to precisely detect car license plates by locating the main characters, which are printed in a large font size. The regions of the main characters are detected directly, without detecting the plate region boundaries, so that license regions can be detected more precisely than by other existing methods. To generate a binary image, multiple thresholds are applied, and segmented regions are selected from the multiple binarized images by a criterion of size and compactness. Because we do not employ any character matching methods, many candidates for main character groups are detected; we therefore use a neural network to reject non-main character groups from the candidates. The relation between the character regions and the intensity statistics are used as the input to the neural network for classification. The detection performance has been investigated on real images of 1,000 vehicles captured under various illumination conditions. Of these, 980 plates were correctly detected, and almost all non-detected plates were so stained that their characters could not be isolated for character recognition. In addition, the processing time is short enough for a commercial automatic license plate recognition system. Therefore, the proposed method can be used for recognition systems that require high performance and fast processing.