• Title/Summary/Keyword: Character Classification (문자 분류)

Typeface Classification of Hangul Characters (한글 문자의 서체 분류)

  • Kim, Sam-Su; Kim, Su-Hyeong
    • Proceedings of the Korean Statistical Society Conference / 2002.05a / pp.113-118 / 2002
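  • This paper proposes features for classifying Hangul characters into the serif and sans-serif families. Hangul typefaces can be divided into a serif family, which has a decorative serif (protrusion) at the start of each vertical stroke, and a sans-serif family, which does not. The proposed method first uses features extracted from the serif shape to assign a character to the serif or sans-serif class. It is designed hierarchically, so that features and a classifier suited to each class are then trained to recognize a wider variety of typefaces. To demonstrate the usefulness of the proposed features, experiments were conducted on 3,000 single-character images in the Myeongjo, Batang, Gungseo, Gothic, Dotum, and Gulim typefaces.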
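
The hierarchical design described in this abstract can be pictured as a two-level dispatch. The sketch below is not the authors' classifier: it assumes that a serif-shape feature vector and a finer per-typeface feature vector have already been extracted from each character image, and it uses a nearest-centroid rule at both levels only for illustration. The function names and the toy centroid values are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def nearest_centroid(x: Vector, centroids: Dict[str, Vector]) -> str:
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def classify_typeface(
    serif_feature: Vector,
    detail_feature: Vector,
    group_centroids: Dict[str, Vector],
    font_centroids: Dict[str, Dict[str, Vector]],
) -> Tuple[str, str]:
    """Two-level decision: typeface family first, then a typeface within that family."""
    # Level 1: serif vs. sans-serif, decided only from the serif-shape feature.
    group = nearest_centroid(serif_feature, group_centroids)
    # Level 2: a classifier built for that family alone picks the typeface.
    font = nearest_centroid(detail_feature, font_centroids[group])
    return group, font


# Toy centroids (made-up numbers, for illustration only).
groups = {"serif": [1.0], "sans-serif": [0.0]}
fonts = {
    "serif": {"Myeongjo": [0.0, 1.0], "Gungseo": [1.0, 0.0]},
    "sans-serif": {"Gothic": [0.0, 0.0], "Gulim": [1.0, 1.0]},
}
print(classify_typeface([0.9], [0.1, 0.8], groups, fonts))  # ('serif', 'Myeongjo')
```

Because level 2 only compares typefaces within one family, each family can use its own features and classifier, which is the structure the abstract describes.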

Detailed Recognition of Similar Characters Based on Optimum Linear Transform (최적선형변환에 의한 유사문자의 상세분류인식)

  • 김형원; 김성원; 양윤모
    • Proceedings of the Korean Information Science Society Conference / 2001.04b / pp.493-495 / 2001
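  • This paper studies a method for improving the recognition rate of character recognition through a two-stage discrimination process. Hangul character recognition is difficult because there are many target classes and many similar characters, and because treating the same character in multiple fonts as a single class further increases that class's variance. In the first stage, therefore, this study uses a Bayes discriminant function, which computes distances while taking each character's distribution into account, to obtain the top-ranked candidate character. In the second stage, detailed classification is performed in the optimum linear transform space of a precomputed set of characters similar to that top candidate. The combined method achieved a recognition rate of 92.9%, higher than both first-stage Bayes-distance recognition alone (91.1%) and optimum-linear-transform recognition applied to all classes from the start (87.9%). This demonstrates that Bayes recognition, which accounts for the character distribution, is effective for the coarse first-stage classification over a large character set, and that the second-stage optimum linear transform is effective for increasing discrimination among a small number of similar characters.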
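
A minimal sketch of the two-stage idea, under simplifying assumptions that are not taken from the paper: stage 1 is a Bayes decision with equal priors and diagonal-covariance Gaussian class models, and stage 2 is a nearest-class-mean rule inside a linear transform that is assumed to have been computed beforehand for each top-1 candidate's similar-character set (for example by a discriminant analysis). All names (`two_stage_recognize`, `similar_sets`, `transforms`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def gaussian_log_likelihood(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-likelihood under a diagonal-covariance Gaussian, constant term dropped."""
    return -0.5 * sum(math.log(v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))


def project(w: Matrix, x: Vector) -> List[float]:
    """Apply a linear transform W (one row per output dimension) to x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def two_stage_recognize(
    x: Vector,
    stats: Dict[str, Tuple[Vector, Vector]],  # class -> (mean, diagonal variance)
    similar_sets: Dict[str, List[str]],  # top-1 class -> its similar characters
    transforms: Dict[str, Matrix],  # top-1 class -> precomputed transform
) -> str:
    # Stage 1: Bayes decision with equal priors over all classes.
    top1 = max(stats, key=lambda c: gaussian_log_likelihood(x, *stats[c]))
    candidates = similar_sets.get(top1)
    if not candidates:
        return top1  # no similar-character set: keep the stage-1 answer
    # Stage 2: nearest class mean inside the transformed space of the similar set.
    w = transforms[top1]
    zx = project(w, x)

    def distance(c: str) -> float:
        zm = project(w, stats[c][0])
        return sum((a - b) ** 2 for a, b in zip(zx, zm))

    return min(candidates, key=distance)


stats = {"A": ([0.0, 0.0], [1.0, 1.0]), "B": ([1.0, 5.0], [1.0, 1.0])}
similar_sets = {"A": ["A", "B"]}
transforms = {"A": [[1.0, 0.0]]}  # keeps only the first feature
print(two_stage_recognize([0.9, 0.0], stats, similar_sets, transforms))  # B
```

In the toy data, stage 1 prefers A because the second feature dominates the likelihood, while the transform for A's similar set keeps only the first feature, which is the one that separates A from B; stage 2 therefore revises the answer to B.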

A Priori and the Local Font Classification (연역적이고 국부적인 영문자의 폰트 분류법)

  • 정민철
    • Journal of the Korea Academia-Industrial cooperation Society / v.3 no.4 / pp.245-250 / 2002
  • This paper presents an a priori and local font classification method. The classification uses ascenders, descenders, and serifs extracted from a word image. Gradient features are extracted from these sub-images and fed to a neural network classifier, which produces the font classification results. The classifier determines 2 font styles (upright or slant), 3 font groups (serif, sans serif, or typewriter), and 7 font names (the PostScript fonts Avant Garde, Helvetica, Bookman, New Century Schoolbook, Palatino, Times, and Courier). The proposed method makes it possible to build an OCR system composed of various font-specific character segmentation tools and various mono-font character recognizers.
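
The abstract does not specify which gradient features are used. As an illustration only, the sketch below computes one common variant: Sobel derivatives at each interior pixel, quantized into 8 direction bins, weighted by gradient magnitude, and normalized. A vector like this, computed for each ascender, descender, or serif sub-image, is the kind of input a neural network classifier could take; the function name and the bin count are assumptions.

```python
import math
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def gradient_direction_histogram(img: Image, bins: int = 8) -> List[float]:
    """Magnitude-weighted histogram of Sobel gradient directions.

    Normalized to sum to 1; returns all zeros for a flat image.
    """
    hist = [0.0] * bins
    h, w = len(img), len(img[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sobel derivatives in x (right minus left) and y (below minus above).
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]) - (
                img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]
            )
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]) - (
                img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]
            )
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat neighbourhood: no direction to record
            angle = math.atan2(gy, gx) % (2 * math.pi)
            k = int(angle / (2 * math.pi) * bins) % bins
            hist[k] += magnitude
    total = sum(hist)
    return [v / total for v in hist] if total else hist


edge = [[0, 0, 1, 1, 1]] * 5  # vertical edge: dark left, bright right
print(gradient_direction_histogram(edge))  # all weight in bin 0
```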

A Study on the Preprocessing for Manchu-Character Recognition (만주문자 인식을 위한 전처리 방법에 관한 연구)

  • Choi, Minseok; Lee, Choong-Ho
    • Journal of the Institute of Convergence Signal Processing / v.14 no.2 / pp.90-94 / 2013
  • Research on the digitization of Manchu characters is at an early stage. This paper proposes a preprocessing algorithm for Manchu character recognition. The algorithm improves on the existing Hilditch thinning algorithm by correcting the thinning errors it produces on Manchu characters. Whereas the existing algorithm separates a character into left-hand and right-hand sides, our algorithm uses the central point between the positions where strokes exist when it classifies each character. Experimental results show that the method is valid for thinning and classifying Manchu characters.
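
The abstract does not define how the central point is computed. Assuming it is the midpoint between the leftmost and rightmost columns that contain stroke pixels, the sketch below contrasts it with a fixed split at the image center. It covers only the left/right split, not Hilditch thinning itself, and all function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Binary = Sequence[Sequence[int]]


def image_center(img: Binary) -> float:
    """Geometric center column of the image (a fixed split, independent of content)."""
    return (len(img[0]) - 1) / 2


def stroke_center(img: Binary) -> Optional[float]:
    """Midpoint between the leftmost and rightmost columns that contain stroke pixels."""
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    if not cols:
        return None  # blank image: no strokes to split
    return (cols[0] + cols[-1]) / 2


def split_sides(img: Binary, center: float) -> Tuple[int, int]:
    """Count stroke pixels strictly left and strictly right of the split position."""
    left = sum(v for row in img for x, v in enumerate(row) if x < center)
    right = sum(v for row in img for x, v in enumerate(row) if x > center)
    return left, right


img = [[0, 0, 1, 0, 1, 1, 0, 0, 0, 0]]
print(stroke_center(img), image_center(img))  # 3.5 4.5
print(split_sides(img, 3.5), split_sides(img, 4.5))  # (1, 2) (2, 1)
```

In the example, the two split positions assign the stroke pixel at column 4 to different sides, which is the kind of difference a content-dependent split point introduces.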

Font Classification using Back Propagation Algorithm (오류 역전파 알고리즘을 이용한 영문자의 폰트 분류 방법에 관한 연구)

  • Jung Minchul
    • Journal of Intelligence and Information Systems / v.10 no.2 / pp.65-77 / 2004
  • This paper presents an a priori and local font classification method. The font classification uses ascenders, descenders, and serifs extracted from a word image. Gradient features are extracted from these sub-images and used as input to a neural network classifier, which produces the font classification results. The classifier determines 2 font styles (upright or slant), 3 font groups (serif, sans serif, or typewriter), and 7 font names (the PostScript fonts Avant Garde, Helvetica, Bookman, New Century Schoolbook, Palatino, Times, and Courier). The proposed method makes it possible to build an OCR system composed of various font-specific character segmentation tools and various mono-font character recognizers. Experiments show a font classification accuracy of about 95.4 percent, even with severely touching characters. The technique developed for the 7 fonts selected in this paper can be applied to any other fonts.

Hangul Character Recognition Using Fuzzy Reasoning: Hangul Character Type Classification by Maximum Run Length Projection (퍼지추론을 이용한 한글 문자 인식:최대 길이 투영에 의한 한글 문자 유형 분류)

  • 이근수; 최형일
    • Korean Journal of Cognitive Science / v.3 no.2 / pp.249-270 / 1992
  • The purpose of this paper is to classify the types of input characters (printed Hangul characters) using Maximum Run Length Projection (MRLP), which is used to extract features from the input character. Because the number of Hangul characters is large and their structure is complex, there are close similarities among characters. This paper therefore attempts to increase the type classification rate using fuzzy reasoning. MRLP is highly robust to noise and is also useful for extracting the required information efficiently. In a test on the 917 most frequently used printed Hangul characters, the method achieved a correct classification rate of 98.58%.
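
The abstract does not give a formal definition of MRLP. Assuming it means, for each row and each column of a binary character image, the length of the longest run of consecutive foreground pixels, it can be sketched as follows (function names are hypothetical).

```python
from typing import List, Sequence, Tuple


def max_run(values: Sequence[int]) -> int:
    """Length of the longest run of consecutive foreground (non-zero) values."""
    best = current = 0
    for v in values:
        current = current + 1 if v else 0
        best = max(best, current)
    return best


def max_run_length_projection(img: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Per-row and per-column maximum run lengths of a binary character image."""
    horizontal = [max_run(row) for row in img]
    vertical = [max_run(col) for col in zip(*img)]
    return horizontal, vertical


img = [
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
]
print(max_run_length_projection(img))  # ([2, 1, 3], [1, 3, 1, 1])
```

Under this definition, an isolated noise pixel in a row that also crosses a stroke does not change that row's value, because the stroke's run is longer; this is one plausible reading of the noise robustness the abstract reports.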

Recognition of Printed Hangeul Characters Based on the Stable Structure Information and Neural Networks (안정된 구조정보와 신경망을 기반으로 한 인쇄체 한글 문자 인식)

  • 장희돈; 남궁재찬
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.11 / pp.2276-2290 / 1994
  • In this paper, we propose a character recognition algorithm that uses subdivided types and stable structure information. The subdivided type of a character is obtained from the stable structure information extracted from the input character. First, the character is acquired with a scanner and classified into one of 6 types using a directional density vector. Next, stable structure information is extracted from each character, and the character is subdivided into one of 26 types. Finally, the classified character is recognized either by a neural network whose input is the directional density vector of the corresponding JASO region, or directly. In an experiment with the 2,350 printed Hangeul characters of KS C 5601, we obtained a recognition rate of 94%.
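
The directional density vector is not defined in the abstract. One plausible reading, used here purely as an assumption, divides the character image into zones and, for each zone, counts pairs of adjacent foreground pixels in four directions (horizontal, vertical, and the two diagonals), normalized by zone area. The function name and zone layout are hypothetical.

```python
from typing import List, Sequence

# (dy, dx) offsets: horizontal, vertical, down-right diagonal, down-left diagonal.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def directional_density(img: Sequence[Sequence[int]], zones: int = 2) -> List[float]:
    """Per-zone density of adjacent foreground pixel pairs in four directions.

    A pair is attributed to the zone of its first pixel; the result has
    zones * zones * 4 entries, ordered zone by zone (row-major).
    """
    h, w = len(img), len(img[0])
    features = []
    for zy in range(zones):
        for zx in range(zones):
            y0, y1 = zy * h // zones, (zy + 1) * h // zones
            x0, x1 = zx * w // zones, (zx + 1) * w // zones
            area = max((y1 - y0) * (x1 - x0), 1)
            for dy, dx in DIRECTIONS:
                count = 0
                for y in range(y0, y1):
                    for x in range(x0, x1):
                        ny, nx = y + dy, x + dx
                        # Bounds are checked before indexing the neighbour.
                        if img[y][x] and 0 <= ny < h and 0 <= nx < w and img[ny][nx]:
                            count += 1
                features.append(count / area)
    return features


print(directional_density([[1, 1], [1, 1]], zones=1))  # [0.5, 0.5, 0.25, 0.25]
```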

Hybrid Word-Character Neural Network Model for the Improvement of Document Classification (문서 분류의 개선을 위한 단어-문자 혼합 신경망 모델)

  • Hong, Daeyoung; Shim, Kyuseok
    • Journal of KIISE / v.44 no.12 / pp.1290-1295 / 2017
  • Document classification, the task of assigning each document to a category based on its text, is one of the fundamental areas of natural language processing. It can be used in various fields, such as topic classification and sentiment classification. Neural network models for document classification fall into two categories: word-level models and character-level models, which treat words and characters, respectively, as their basic units. In this study, we propose a neural network model that combines character-level and word-level models to improve document classification performance. The proposed model extracts a feature vector for each word by combining information from a word embedding matrix with information encoded by a character-level neural network. Based on these word feature vectors, the model classifies documents with a hierarchical structure in which recurrent neural networks with attention mechanisms are used at both the word and sentence levels. Experiments on real-world datasets demonstrate the effectiveness of the proposed model.
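
The word-feature step can be sketched as a concatenation of a word-embedding lookup and a character-level encoding. This is a simplified illustration, not the proposed model: the character-level neural network is replaced by a plain average of character embeddings, and the hierarchical attention RNNs are omitted. The dictionaries and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def char_encoding(word: str, char_emb: Dict[str, Sequence[float]], dim: int) -> List[float]:
    """Average of character embeddings (a stand-in for a character-level network)."""
    vecs = [char_emb[c] for c in word if c in char_emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def word_feature(
    word: str,
    word_emb: Dict[str, Sequence[float]],
    char_emb: Dict[str, Sequence[float]],
    word_dim: int,
    char_dim: int,
) -> List[float]:
    """Concatenate the word-embedding lookup with the character-level encoding."""
    w = list(word_emb.get(word, [0.0] * word_dim))  # unknown words fall back to zeros
    return w + char_encoding(word, char_emb, char_dim)


word_emb = {"cat": [1.0, 2.0]}
char_emb = {"c": [1.0], "a": [3.0], "t": [2.0]}
print(word_feature("cat", word_emb, char_emb, 2, 1))  # [1.0, 2.0, 2.0]
print(word_feature("act", word_emb, char_emb, 2, 1))  # [0.0, 0.0, 2.0]
```

Because the character part is computed from the spelling, a word missing from the word-embedding table (here "act") still receives a non-zero feature vector, which is one motivation for combining the two levels.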

The Region Analysis of Document Images Based on One Dimensional Median Filter (1차원 메디안 필터 기반 문서영상 영역해석)

  • 박승호; 장대근; 황찬식
    • Journal of the Institute of Electronics Engineers of Korea SP / v.40 no.3 / pp.194-202 / 2003
  • Automatically converting printed documents into electronic form requires region analysis of document images and character recognition. Region analysis segments a document image into detailed regions and classifies these regions into types such as text, picture, and table. However, text and pictures are difficult to classify exactly, because some of them are similar in size, density, and complexity of pixel distribution. Misclassification in region analysis is thus the main obstacle to automatic conversion. In this paper, we propose a region analysis method that segments a document image into text and picture regions. The proposed method addresses these problems with a one-dimensional median filter-based method for text and picture classification. The misclassification of boldface text and of picture regions such as graphs or tables, which median filtering causes, is resolved by using a skin-peeling filter and the maximal text length. As a result, the method performs better than previous methods, including commercial software.
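
The abstract does not state how the one-dimensional median filter is applied. The sketch below assumes it runs along a binary pixel row and illustrates one plausible effect: thin, alternating patterns such as text strokes tend to be suppressed, while solid runs such as picture areas survive. The `survival_ratio` heuristic is hypothetical, and the skin-peeling filter and maximal-text-length steps are not shown.

```python
from statistics import median
from typing import List, Sequence


def median_filter_1d(signal: Sequence[float], window: int = 5) -> List[float]:
    """Sliding-window median; the window is truncated at the signal borders."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    n = len(signal)
    return [median(signal[max(0, i - half): min(n, i + half + 1)]) for i in range(n)]


def survival_ratio(row: Sequence[int], window: int = 5) -> float:
    """Fraction of foreground pixels that stay foreground (> 0.5) after filtering."""
    filtered = median_filter_1d(row, window)
    total = sum(1 for a in row if a)
    kept = sum(1 for a, b in zip(row, filtered) if a and b > 0.5)
    return kept / total if total else 0.0


text_like = [1, 0] * 4  # thin strokes separated by gaps
picture_like = [1] * 8  # solid region
print(survival_ratio(text_like, 3), survival_ratio(picture_like, 3))  # 0.0 1.0
```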

Typographical Analyses and Classes of Characters and Words in Optical Character Recognition (문자 인식에서 단어 간의 활자 인쇄선 위치 분석과 클래스 분류)

  • Jung Minchul
    • The KIPS Transactions: Part B / v.12B no.3 s.99 / pp.337-342 / 2005
  • This paper presents typographical analyses and classes. Typographical analysis is an indispensable tool for recognizing machine-printed English characters, and it is a preliminary step for character segmentation in OCR (Optical Character Recognition). The paper is divided into two parts. In the first part, word typographical classes are defined from words by word typographical analysis. In the second part, character typographical classes are defined from connected components by character typographical analysis. The character typographical classes are then used in character segmentation.
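
The abstract does not list the typographical classes. As an assumption based on common English typography, the sketch below assigns a connected component to one of four classes by comparing its top and bottom with the x-line and baseline (image y coordinates grow downward). The class names, the tolerance, and the function name are hypothetical.

```python
def typographical_class(top: int, bottom: int, x_line: int, baseline: int, tol: int = 1) -> str:
    """Classify a component's vertical extent against the x-line and baseline.

    y grows downward, so "above the x-line" means a smaller y value.
    """
    ascends = top < x_line - tol  # reaches above the x-line (e.g. b, d, h)
    descends = bottom > baseline + tol  # reaches below the baseline (e.g. g, p, y)
    if ascends and descends:
        return "ascender-descender"
    if ascends:
        return "ascender"
    if descends:
        return "descender"
    return "x-height"


for name, (top, bottom) in {"o": (10, 20), "b": (2, 20), "p": (10, 28)}.items():
    print(name, typographical_class(top, bottom, x_line=10, baseline=20))
# o x-height
# b ascender
# p descender
```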