• Title/Summary/Keyword: character classification (문자 분류)

A Method for Automatic Detection of Character Encoding of Multi Language Document File (다중 언어로 작성된 문서 파일에 적용된 문자 인코딩 자동 인식 기법)

  • Seo, Min Ji;Kim, Myung Ho
    • KIISE Transactions on Computing Practices / v.22 no.4 / pp.170-177 / 2016
  • Character encoding is a method for converting a document into a binary file, using a code table, so that it can be stored on a computer. To decode such a file and recover the original document, a reader must know which code table was applied at the encoding stage. Identifying that code table is thus an essential part of decoding. In this paper, we propose a method for automatically detecting the character encoding of a given binary document file. The method combines several techniques to increase the detection rate: character code range detection, escape character detection, character code characteristic detection, and commonly used word detection. The commonly used word detection step draws on multiple word databases, which lets the method achieve a much higher detection rate on multi-language files than other methods. When a language accounts for less than 20% of a document, the conventional method recognizes the encoding in about 50% of cases. The proposed method reaches a recognition rate of up to 96% regardless of the proportion of each language.
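
The combination of checks described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the candidate list `CANDIDATES`, the tiny word list `COMMON_WORDS`, and the scoring rule are all assumptions made for illustration. Escape sequences identify ISO-2022 encodings directly, a strict decode stands in for code range detection, and a count of common words ranks the candidates that survive.

```python
from typing import Optional

# Hypothetical candidate encodings and word list; the paper's real tables are not given.
CANDIDATES = ["utf-8", "euc-kr", "shift_jis", "latin-1"]
COMMON_WORDS = ["the", "and", "of", "그리고", "있다", "하는", "です", "ます", "これ"]


def detect_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding of `data` using escape sequences, strict decoding, and word hits."""
    # Escape character detection: ISO-2022 encodings announce themselves with ESC sequences.
    if b"\x1b$)C" in data:
        return "iso2022_kr"
    if b"\x1b$B" in data or b"\x1b$@" in data:
        return "iso2022_jp"

    best, best_score = None, -1
    for name in CANDIDATES:
        # Code range detection: a strict decode fails on bytes outside the encoding's ranges.
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue
        # Commonly used word detection: count hits across all languages at once,
        # so a minority language still contributes to the score.
        score = sum(text.count(word) for word in COMMON_WORDS)
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = name, score
    return best
```

For a mixed Korean and English string encoded as EUC-KR, only the EUC-KR decode yields both the English and the Korean common words, so it outscores the other candidates. A real system would need far larger word databases and more candidate encodings.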

A Study on Optical Changes and Sequence Discrimination of Toner-printed Text and Writing Text (토너 출력문자와 필기구류 기재문자 간 광학적 변화와 선후관계에 관한 연구)

  • Lee, Ka Young;Yoon, Do-Young;Lee, Joong
    • Korean Chemical Engineering Research / v.55 no.1 / pp.135-140 / 2017
  • This paper studies how to determine the relative sequence of printed and handwritten lines, one of the most actively discussed topics in forensic document examination. It describes the use of a visual spectral comparator and an infinite focus microscope to observe regions where printed and written lines overlap. As a result, we were able to categorize images of the overlapping regions and identify the sequence of printing and writing for various inks.

Recognition of Various Printed Hangul Images by using the Boundary Tracing Technique (경계선 기울기 방법을 이용한 다양한 인쇄체 한글의 인식)

  • 백승복;강순대;손영선
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2002.12a / pp.357-360 / 2002
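  • In this paper, we implemented a system that recognizes the characters in printed Hangul images captured with a monochrome CCD camera and converts them into an editable text document. By using the boundary gradient method, which is robust to noise, we were able to extract contour information based on the structural characteristics of the characters. Using this information, the system detects the horizontal and vertical vowels in each character image, classifies the character into one of six types, separates it into graphemes, and recognizes the vowels using maximum-length projection. The separated consonants are recognized by comparing the shape of the phase in which the boundary changes against pre-stored standard patterns. The recognized characters are output to a document editor in KS Hangul Wansung (precomposed) code and presented to the user.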
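
The abstract does not list its six character types. A scheme commonly used in Hangul recognition groups syllables by vowel orientation (vertical, horizontal, or both) and by whether a final consonant is present. The Python sketch below illustrates that scheme on Unicode code points rather than on image contours, so it shows only the taxonomy, not the paper's boundary-based method. The type numbering and the function name `hangul_type` are assumptions.

```python
# Medial vowel indices in Unicode order (0..20):
# ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
VERTICAL = {0, 1, 2, 3, 4, 5, 6, 7, 20}    # vowel drawn to the right of the initial consonant
HORIZONTAL = {8, 12, 13, 17, 18}           # vowel drawn below the initial consonant
# Remaining indices (9, 10, 11, 14, 15, 16, 19) combine a horizontal and a vertical part.


def hangul_type(ch: str) -> int:
    """Return an assumed structural type 1..6 for a precomposed Hangul syllable."""
    index = ord(ch) - 0xAC00
    if not 0 <= index < 11172:
        raise ValueError(f"{ch!r} is not a precomposed Hangul syllable")
    medial = (index % 588) // 28   # 21 medials * 28 finals = 588 syllables per initial
    final = index % 28             # 0 means no final consonant
    if medial in VERTICAL:
        base = 1
    elif medial in HORIZONTAL:
        base = 2
    else:
        base = 3
    return base + (3 if final else 0)
```

Under this assumed numbering, 가, 고, and 과 fall into types 1 to 3, and 강, 공, and 광 fall into types 4 to 6.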

Normalization of Clinical Medical Records by Disambiguating Abbreviations and Acronyms (약어와 두문자어의 모호성 해결을 통한 임상 의무기록의 정규화)

  • Inho Bae;Jin-Sang Kim;Yoon-Nyun Kim
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.676-678 / 2008
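  • The many acronyms that appear in clinical medical records greatly increase the ambiguity of the records during machine processing, so the records must be normalized as a preprocessing step before information extraction or text mining. In this study, we designed and implemented a normalization system that resolves the ambiguity of abbreviations and acronyms used in discharge summaries, one type of clinical medical record. For normalization, we used context information to identify the type of record and the position within the record, and used these to learn and classify the meanings of abbreviations and acronyms. In experiments, the implemented system achieved a precision of 94.7% on 16 meanings of 6 acronyms.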

Analysis of the Preference in Expression Style for the Library Weekly Poster (도서관 주간 포스터의 표현 방식에 대한 선호도 분석)

  • Lim, Seong-Kwan
    • Journal of Korean Library and Information Science Society / v.52 no.2 / pp.85-106 / 2021
  • The purpose of this study is to analyze the letter layout and color among the expression methods of the 31 official Library Week posters produced over the 57 years since 1964, to survey citizens on the basis of that analysis, and to propose a direction for future poster production by revealing what citizens prefer most. In the analysis of poster expression forms, the most common letter layout was 'the top position of the character' in 13 of 31 posters (41.9%). In color, 'chromatic color' appeared in 30 of 31 (96.8%), while 'cold color', 'fading color' and 'shrink color' each appeared in 18 of 31 (58.1%). The survey showed that the most preferred letter layout was 'top position of characters', and that 'chromatic color', 'cold color', 'advance color' and 'expansion color' were the most preferred in the color classification. Therefore, the character layout of the Library Week poster needs to be produced using 'the center position of the letter', and the colors need to be made using 'chromatic color', 'warm color', 'advance color' and 'expansion color'.

High Precision Character Recognition System using The Chaos Theory (카오스 이론을 이용한 고정도 문자 인식 시스템)

  • 손영우
    • Journal of Korea Multimedia Society / v.4 no.6 / pp.518-523 / 2001
  • This paper proposes a new method for extracting character features and recognizing characters using the fractal dimension from chaos theory, which is sensitive to minute differences in the strange attractor generated by the Henon system, and implements a high-precision character recognition system based on it. First, mesh, projection, and cross-distance features are extracted from character images and converted into time-series data. Then, using the modified Henon system proposed in this paper, an attractor is reconstructed for each character of the standard Korean character set, KS C 5601. Second, to analyze the degree of chaos of each character's attractor, the final features of the character image are obtained by calculating the box-counting dimension, natural measure, information bit, and information dimension, which characterize the fractal dimension. Experimental results show a character classification rate of 97.49% for 2,350 Korean characters using the proposed method.
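
To make the box-counting dimension in this abstract concrete, here is a minimal Python sketch that generates points on the standard Henon attractor and estimates its dimension from box counts at several scales. It uses the textbook Henon map with a = 1.4 and b = 0.3, not the modified Henon system proposed in the paper, and it does not reproduce the feature-to-time-series step. The function names and box sizes are illustrative assumptions.

```python
import math


def henon_orbit(n, a=1.4, b=0.3, burn_in=100):
    """Iterate the standard Henon map from (0, 0) and return n points after a burn-in."""
    x, y = 0.0, 0.0
    points = []
    for i in range(n + burn_in):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= burn_in:
            points.append((x, y))
    return points


def box_count(points, eps):
    """Count the grid boxes of side eps that contain at least one point."""
    return len({(math.floor(px / eps), math.floor(py / eps)) for px, py in points})


def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(eps) against log(1/eps) over the given box sizes."""
    xs = [math.log(1.0 / eps) for eps in sizes]
    ys = [math.log(box_count(points, eps)) for eps in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


points = henon_orbit(20000)
estimate = box_counting_dimension(points, [0.2, 0.1, 0.05, 0.025])
print(f"estimated box-counting dimension: {estimate:.2f}")
```

The estimate depends on the number of points and the range of box sizes, so it should be read as a rough value between that of a curve and that of a filled region, not as a precise constant.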

A Syllable Kernel based Sentiment Classification for Movie Reviews (음절 커널 기반 영화평 감성 분류)

  • Kim, Sang-Do;Park, Seong-Bae;Park, Se-Young;Lee, Sang-Jo;Kim, Kweon-Yang
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.2 / pp.202-207 / 2010
  • In this paper, we present an automatic sentiment classification method for online movie reviews that do not contain explicit sentiment rating scores. To classify sentiment polarity as positive or negative, we use a Support Vector Machine classifier based on a syllable kernel, an extension of the string kernel. Experimental results show that the proposed syllable kernel model can be used effectively for sentiment classification of online movie reviews, which usually contain many grammatical errors such as spacing and spelling errors.
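
The abstract does not define the syllable kernel beyond calling it an extension of the string kernel. As a stand-in, the Python sketch below implements a simple spectrum-style kernel over syllable n-grams. Treating each non-space character as one syllable (true for precomposed Hangul) and discarding whitespace are assumptions made here, chosen so that spacing errors do not change the similarity. The function names are hypothetical.

```python
import math
from collections import Counter


def syllable_ngrams(text, n=2):
    """Count n-grams of syllables, ignoring whitespace so spacing errors have no effect."""
    syllables = [ch for ch in text if not ch.isspace()]
    return Counter(tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1))


def syllable_kernel(a, b, n=2):
    """Spectrum kernel: inner product of the two n-gram count vectors."""
    counts_a, counts_b = syllable_ngrams(a, n), syllable_ngrams(b, n)
    return sum(counts_a[g] * counts_b[g] for g in counts_a.keys() & counts_b.keys())


def normalized_kernel(a, b, n=2):
    """Cosine-normalized kernel value in [0, 1]; 0.0 if either text has no n-grams."""
    denom = math.sqrt(syllable_kernel(a, a, n) * syllable_kernel(b, b, n))
    return syllable_kernel(a, b, n) / denom if denom else 0.0
```

With this definition, "재미있다" and its mis-spaced variant "재미 있다" have a normalized similarity of 1, while two reviews with no shared syllable bigrams have a similarity of 0. A Gram matrix built from such a function could be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.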

Structure-Adaptive Self-Organizing Neural Network : Application to Hangul Character Recognition (구조적응 자기조직화 신경망 : 한글 문자인식에의 적용)

  • Lee, Kyoung-Mi;Cho, Sung-Bae;Lee, Yill-Byung
    • Annual Conference on Human and Language Technology / 1995.10a / pp.137-142 / 1995
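  • Kohonen's SOFM (Self-Organizing Feature Map) allows fast learning and has therefore attracted attention as a pattern classifier that can compensate for the shortcomings of the multilayer perceptron. However, because it fundamentally uses a network of fixed size and structure, it is not easy to apply to real problems. In this paper, we introduce a structure-adaptive self-organizing neural network that adaptively partitions a complex pattern space without prior information about the patterns, and present the results of applying it to the recognition of printed Hangul characters. The proposed network allows each SOFM cell to be expanded into a more detailed SOFM and removes cells whose probability distribution is zero, which enables a classification that more closely approximates the pattern space. We explain how this approach works on a complex classification problem such as Hangul and present experimental results on the 2,350 precomposed (Wansung) Hangul characters.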
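
One of the two structural operations named in this abstract, removing cells that no pattern maps to, can be sketched in a few lines of Python. The sketch trains a small one-dimensional SOFM and then prunes cells that win for no training sample. It leaves out the expansion of a cell into a finer SOFM, and the learning-rate schedule, neighborhood radius, and function names are assumptions rather than the paper's settings.

```python
import random


def best_matching_unit(weights, x):
    """Index of the cell whose weight vector is closest to x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_som(data, n_cells, epochs=20, lr=0.5, radius=1, seed=0):
    """Train a 1-D SOFM: move the winning cell and its chain neighbors toward each sample."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_cells)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i in range(max(0, bmu - radius), min(n_cells, bmu + radius + 1)):
                weights[i] = [w + rate * (v - w) for w, v in zip(weights[i], x)]
    return weights


def prune_unused_cells(weights, data):
    """Drop cells that are the best match for no sample (empirical hit count of zero)."""
    hits = [0] * len(weights)
    for x in data:
        hits[best_matching_unit(weights, x)] += 1
    return [w for w, h in zip(weights, hits) if h > 0]


data = [[0.1, 0.1], [0.12, 0.08], [0.9, 0.9], [0.88, 0.92]]
cells = prune_unused_cells(train_som(data, n_cells=6), data)
print(len(cells), "cells kept")
```

In a structure-adaptive network, a cell that attracts many samples from different classes would be a candidate for expansion into its own sub-map, which this sketch does not attempt.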

Vehicle Information Recognition and Electronic Toll Collection System with Detection of Vehicle feature Information in the Rear-Side of Vehicle (차량후면부 차량특징정보 검출을 통한 차량정보인식 및 자동과금시스템)

  • 이응주
    • Journal of Korea Multimedia Society / v.7 no.1 / pp.35-43 / 2004
  • In this paper, we propose a vehicle recognition and electronic toll collection system that detects and classifies the vehicle identification mark and emblem and recognizes the license plate, in order to automate toll collection and the checking of vehicles entering or leaving an institution. The proposed algorithm first applies preprocessing steps such as noise reduction and thinning to the rear-side input image of the vehicle, then detects the vehicle mark, emblem, and license plate regions using intensity variation information, template masking, and labeling. Next, it classifies the detected feature regions into vehicle mark and emblem and recognizes the characters and numbers of the license plate using hybrid and seven-segment pattern vectors. To show the efficiency of the proposed algorithm, we tested it on real vehicle images from a vehicle recognition system installed at a highway toll gate and found that it shows good feature detection and classification performance regardless of irregular environmental conditions, noise, and the size and location of vehicles. The proposed algorithm may also be used for catching criminal vehicles, for unmanned toll collection, and for unmanned checking of vehicles entering or leaving an institution.
