• Title/Summary/Keyword: 형식 분류 (type/format classification)


A Study on Machine Printed Character Recognition Based on Character Type Classification (문자형식 분류 기반의 인쇄체 문자인식에 관한 연구)

  • 임길택;김호연
    • Journal of the Institute of Electronics Engineers of Korea CI / v.40 no.5 / pp.266-279 / 2003
  • In this paper, we propose machine-printed character recognition methods that use character type information to divide the character clusters. Characters are subdivided into seven types: six for Hangul, according to how the graphemes are combined, and one for English characters, numerals, and symbols. According to the character type, we separate the input character image into several recognition units and recognize them using the direction angle feature. Recognition for each character type is completed by combining the recognition units, each of which is recognized by a neural network. To combine the seven character recognizers, we implemented seven methods: a switching method, an integrating method, and several variants of them. In experiments, we obtained recognition rates of 98.2% for the simple switching method, 90.54% for the integrating method, and between 97.35% and 98.65% for the five variants.

A FCA-based Classification Approach for Analysis of Interval Data (구간데이터분석을 위한 형식개념분석기반의 분류)

  • Hwang, Suk-Hyung;Kim, Eung-Hee
    • Journal of the Korea Society of Computer and Information / v.17 no.1 / pp.19-30 / 2012
  • With Internet-based infrastructures such as diverse information devices, social network systems, and cloud computing environments, distributed and sharable data are growing explosively. Formal Concept Analysis on binary or many-valued data has recently been applied successfully in many fields as a data analysis and mining technique for extracting, analyzing, and classifying inherent and useful knowledge. However, little research in formal concept analysis has addressed interval data, whose attributes take interval values. In this paper, we propose a new approach for classifying interval data based on formal concept analysis. We present the development of a supporting tool (iFCA) that implements the proposed approach: binarization of the interval data table, concept extraction, and construction of concept hierarchies. Finally, through experiments on real-world data sets, we demonstrate that our approach provides useful and effective ways of analyzing and mining interval data.

A Study on Type Classification and Recognition Using Structural Information in Character Pattern of HANGEUL Shape (한글 Shape 문자 Pattern에서의 구조적 정보를 이용한 형식분류와 인식 관한 연구)

  • 전종익;조용주;남궁재찬
    • The Journal of Korean Institute of Communications and Information Sciences / v.16 no.2 / pp.180-195 / 1991
  • In this paper, we study a new recognition method that uses structural information to recognize character patterns in the original shape of Hangeul. First, to locate each character in the input image, block making is performed. Second, after checking whether a vertical vowel exists in the character image according to the center of gravity of the Hangeul character, each character is classified into a Hangeul type by searching for the location and length of the horizontal vowel and the short pole. Finally, recognition is performed by template matching, which calculates the Euclidean distance for each Jaso according to the classified type. In an experiment on 2350 characters, we obtained a 98.3% classification rate and a 95.2% recognition rate.
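
The template-matching step in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 3x3 binary templates, the `TEMPLATES` table, and the function names are hypothetical, and the abstract does not say how the Jaso images are sized or normalized. The sketch only shows nearest-template selection by Euclidean distance, which in the paper would be restricted to the templates of the classified type.

```python
import math

# Hypothetical 3x3 binary templates for two Jaso, flattened row by row.
# Real templates would be larger and selected according to the character type.
TEMPLATES = {
    "ㅣ": [0, 1, 0,
           0, 1, 0,
           0, 1, 0],
    "ㅡ": [0, 0, 0,
           1, 1, 1,
           0, 0, 0],
}


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length pixel vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_template(pattern, templates=TEMPLATES):
    """Return the label of the template closest to `pattern`."""
    return min(
        templates,
        key=lambda label: euclidean_distance(pattern, templates[label]),
    )


# A vertical stroke with one noisy pixel in the bottom-right corner.
noisy_vertical = [0, 1, 0,
                  0, 1, 0,
                  0, 1, 1]
best_label = match_template(noisy_vertical)
```

Here the noisy pattern differs from the vertical template in one pixel and from the horizontal template in five, so the vertical Jaso is selected.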


Extent and Intent Lattice on Formal Concept (형식개념의 외연과 내포격자)

  • Yon, Yong-ho
    • Proceedings of the Korea Contents Association Conference / 2019.05a / pp.387-388 / 2019
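  • Formal concepts have been used as a logical tool for defining an object, or for classifying and clustering objects, by means of an extent and an intent. Here, the extent is a set of objects, and the intent is the set of attributes those objects possess. Formal concepts can be applied to data analysis by extracting objects and attributes from the various data that arise in a problem and forming a hierarchy of concepts from them. In this paper, we introduce the definition and properties of formal concepts and define a generalization, the formal concept on a complete lattice. We also examine the properties of the extent and intent lattices of these formal concepts.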
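
The extent/intent definition above can be made concrete with a small sketch of the standard derivation operators on a binary context. The toy context (`CONTEXT`) and all function names are hypothetical and chosen for illustration; the paper's generalization to complete lattices is not covered here.

```python
from itertools import combinations

# Hypothetical toy context: each object maps to the set of attributes it has.
CONTEXT = {
    "duck": {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}
ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def intent_of(objects):
    """Attributes shared by every object in `objects`."""
    shared = set(ALL_ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return shared


def extent_of(attributes):
    """Objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in CONTEXT.items() if set(attributes) <= attrs}


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Brute force over subsets: fine for a toy context, exponential in general.
    """
    concepts = set()
    objects = sorted(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = intent_of(subset)
            extent = extent_of(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts()
```

Each resulting pair is closed in both directions: the intent is exactly the set of attributes common to the extent, and the extent is exactly the set of objects having all attributes of the intent. Ordering the pairs by inclusion of their extents gives the concept hierarchy mentioned in the abstract.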


A Comparative Study on Feature Combination for MathML Formula Classification (MathML 수식 분류를 위한 자질 조합 비교 연구)

  • Kim, Shin-Il;Yang, Seon;Ko, Young-Joong
    • Annual Conference on Human and Language Technology / 2010.10a / pp.37-41 / 2010
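  • In this paper, we compare and evaluate the features needed to classify mathematical formulas written in Mathematical Markup Language (MathML) and the feature combinations that contribute to improved performance. This is a preprocessing task for analyzing MathML formulas and can be regarded as the most basic step toward resolving operator ambiguity. The baseline features used in the experiments are MathML tag information and operators; we then added other features to find those that yield the highest classification performance. Training used a Support Vector Machine (SVM), and the target classes were 12 chapters (sets, propositions, differentiation, integration, etc.) based on the textbook '수학의 정석'. The experiments showed that the most useful feature within MathML documents is the 'identifier & operator bigram', and that combining several features classifies formulas with a performance of 92.5%.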
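
As a rough illustration of what an 'identifier & operator bigram' feature might look like, the sketch below collects the text of MathML `<mi>` (identifier) and `<mo>` (operator) elements in document order and pairs adjacent tokens. The abstract does not define the feature precisely, so this reading, the sample formula, and the function name are all assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample formula: x + y = 2
MATHML = (
    "<math><mrow>"
    "<mi>x</mi><mo>+</mo><mi>y</mi><mo>=</mo><mn>2</mn>"
    "</mrow></math>"
)


def identifier_operator_bigrams(mathml):
    """Return adjacent pairs of identifier/operator tokens in document order."""
    root = ET.fromstring(mathml)
    tokens = []
    for element in root.iter():
        # Drop an XML namespace prefix such as "{http://...}mi" if present.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in ("mi", "mo"):
            tokens.append((element.text or "").strip())
    return list(zip(tokens, tokens[1:]))


bigrams = identifier_operator_bigrams(MATHML)
```

In a classifier such as the SVM mentioned in the abstract, each distinct bigram would typically become one dimension of a sparse feature vector; numbers (`<mn>`) are ignored here because the feature name mentions only identifiers and operators.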


Study on Automatic Classification System of News based on NewsML (NewsML 기반의 뉴스 자동 분류 시스템에 관한 연구)

  • Tak-Hee Lee;Gumwon Hong
    • Annual Conference of KIPS / 2008.11a / pp.619-622 / 2008
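  • A news classification scheme assigns each article to the most suitable topic, such as politics, economy, or society, but the schemes used by individual news organizations lack uniformity and are structured entirely differently. This makes it difficult to integrate large volumes of content and leads to duplicated investment in systems and personnel. To address this problem, we propose news classification based on NewsML, an international standard. NewsML is a structured standard format with the flexibility and extensibility of XML; it can represent diverse data, which makes it possible to select the important features needed for automatic document categorization. In this paper, we conduct comparative automatic classification experiments on news in NewsML format and news that is not. Experiments that used the structured information in NewsML performed better than experiments that used only the news title and body, and among the models, the support vector machine, which is useful when the feature space is very large and is highly effective for document classification, performed best.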

Analysis of Effects of Image Format on Detection Performance and Resource Usage in CNN-Based Malware Detection (CNN 기반 악성코드 탐지에서 이미지 형식이 탐지성능과 자원 사용에 미치는 영향 분석)

  • Seong-hyeon Byeon;Young-won Kim;Kwan-seob Ko;Soo-jin Lee
    • Convergence Security Journal / v.21 no.4 / pp.69-75 / 2021
  • Various image formats are used when constructing CNN-based malware detection models. However, most previous studies emphasize only detection or classification performance and do not consider the possible impact of the image format on detection performance and resource usage. In this paper, we therefore analyze how the input image format affects detection performance and resource usage when detecting Android malware with a CNN. The dataset used in the experiment is the CICAndMal2017 Dataset. Sub-datasets extracted from the CICAndMal2017 Dataset were converted into images in four formats: BMP, JPG, PNG, and TIFF. We then trained our CNN model and measured malware detection performance and resource usage. As a result, there was no significant difference in detection performance or GPU/RAM usage when the image format changed. However, we found that the file size of the generated images varied by up to six times depending on the image format, and that significant differences occurred in the training time.

Support Vector Machines-based classification of video file fragments (서포트 벡터 머신 기반 비디오 조각파일 분류)

  • Kang, Hyun-Suk;Lee, Young-Seok
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.1 / pp.652-657 / 2015
  • BitTorrent is an innovative file-sharing and file-transfer protocol that lets users receive pieces of files from multiple sharers on the Internet and assemble the pieces into complete files. In practice, however, freely distributing illegal or copyrighted video data constitutes a crime. Regulating copyright on BitTorrent is difficult because data is transferred as pieces of files rather than as complete files. Therefore, classifying the file format of the received pieces must come first in order to restore digital content from pieces received over BitTorrent and to check for copyright violations. This study proposes an SVM classifier for digital files whose feature vector is the histogram differential of the file pieces. The performance of the proposed classifier was evaluated with respect to the division factor by applying it to three different video file formats.
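
One plausible reading of the "histogram differential" feature is sketched below: a normalized byte-value histogram of a file fragment, followed by the differences between adjacent bins. This interpretation, the function names, and the sample fragment are assumptions; the abstract does not define the feature or the division factor, and the SVM training step is omitted.

```python
def byte_histogram(fragment: bytes):
    """Relative frequency of each byte value (0-255) in the fragment."""
    counts = [0] * 256
    for value in fragment:
        counts[value] += 1
    total = len(fragment) or 1  # avoid division by zero for an empty fragment
    return [count / total for count in counts]


def histogram_difference(fragment: bytes):
    """Differences between adjacent histogram bins: a 255-dimensional vector."""
    hist = byte_histogram(fragment)
    return [hist[i + 1] - hist[i] for i in range(255)]


# Hypothetical 4-byte fragment; real fragments would be pieces of video files.
sample_fragment = bytes([0, 0, 1, 3])
feature_vector = histogram_difference(sample_fragment)
```

Vectors like `feature_vector`, computed per fragment and labeled with the source file format, could then serve as training input for an SVM classifier.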

A classification for the incomplete block designs according to the structure of multi-nested block circulant pattern matrix (다중순환형식행렬의 구조에 의한 불완비블럭 계획의 분류)

  • 배종성
    • The Korean Journal of Applied Statistics / v.2 no.1 / pp.54-64 / 1989
  • The paper by Kurkjian and Zelen (1963) introduced Property A, which relates to a structural property of the concordance matrix of the column incidence matrix. Paik (1985), on the other hand, showed a property of concordance matrices that have a multi-nested block circulant pattern and termed this structural property Property C. This paper classifies incomplete block designs according to the pattern of the concordance matrix that has a multi-nested block circulant pattern. The purpose of this classification is to simplify the solution of the reduced normal equations and the plan of the design.