• Title/Summary/Keyword: character feature extraction (문자특징 추출)


A Study on Enhanced Binarization Method by Using Intensity Information (밝기 정보를 이용한 개선된 이진화 방법에 관한 연구)

  • 박경태;김정원;김광백
    • Proceedings of the Korea Multimedia Society Conference / 2003.05b / pp.441-445 / 2003
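  • Image binarization is used in many fields as a preprocessing step for character recognition, image analysis, and similar tasks. Its performance depends on how the threshold is chosen, and most binarization methods use the histogram, taking either the mean intensity or a histogram valley as the threshold. Such methods struggle to find a suitable threshold when the histogram is not bimodal or when a specific object is to be extracted from the image. This paper therefore divides the intensity range of a grayscale image into several intervals and computes the mean intensity of each interval; the threshold is then obtained by dividing the space between the interval means according to the ratio of each mean's distance from the poles (endpoints) of its interval. Applying the proposed method to a variety of images showed that it is more efficient than existing binarization methods.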

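The abstract does not give the exact threshold formula, so the Python sketch below is only one hedged reading of it. It assumes two equal-width intensity intervals ([0, 127] and [128, 255]), computes the mean intensity of the pixels in each, and places the threshold between the two means in proportion to each mean's distance from the outer pole of its interval (0 for the dark interval, 255 for the bright one). The function names and this interpretation of "distance ratio to each interval's poles" are assumptions, not the authors' method.

```python
def interval_means(pixels, k=2, levels=256):
    """Split [0, levels-1] into k equal-width intervals and return
    (low, high, mean) for every interval that contains pixels."""
    width = levels // k
    result = []
    for i in range(k):
        low = i * width
        high = levels - 1 if i == k - 1 else (i + 1) * width - 1
        members = [p for p in pixels if low <= p <= high]
        if members:
            result.append((low, high, sum(members) / len(members)))
    return result


def ratio_threshold(pixels, levels=256):
    """Hypothetical two-interval reading: split the gap between the dark and
    bright interval means in the ratio of each mean's distance to the outer
    pole of its own interval."""
    means = interval_means(pixels, k=2, levels=levels)
    if len(means) < 2:
        # Only one interval is populated: fall back to the global mean intensity.
        return sum(pixels) / len(pixels)
    (low1, _, m1), (_, high2, m2) = means
    d_dark = m1 - low1      # distance of the dark mean from intensity 0
    d_bright = high2 - m2   # distance of the bright mean from intensity 255
    if d_dark + d_bright == 0:
        return (m1 + m2) / 2
    return m1 + (m2 - m1) * d_dark / (d_dark + d_bright)


def binarize(pixels, threshold):
    """Map pixels above the threshold to 255 (white) and the rest to 0 (black)."""
    return [255 if p > threshold else 0 for p in pixels]


pixels = [10, 20, 30, 200, 210, 220]
t = ratio_threshold(pixels)
print(round(t, 2), binarize(pixels, t))  # 78.46 [0, 0, 0, 255, 255, 255]
```

Because the dark mean (20) sits much closer to its pole than the bright mean (210) does to 255, this reading pulls the threshold toward the dark side instead of using the plain midpoint.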

A Study on the Classification of Hand-written Korean Character Types using Hough Transform (Hough Transform을 이용한 한글 필기체 형식 분류에 관한 연구)

  • 구하성;고경화
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.10 / pp.1991-2000 / 1994
  • In this paper, an algorithm that classifies hand-written Korean characters into six types is proposed for a hand-written Korean character recognition system. After thinning and truncation for noise reduction, the input images are normalized to a $64\times64$ size. The six-type classification consists of a preliminary and a secondary classification stage, both using the learning algorithm of a multi-layer perceptron. A subblock Hough transform is used as a local feature and a sampling Hough transform as a global feature. The experiment was conducted on 1,800 characters written 31 times per type by 10 persons. A recognition rate of 90% was obtained, with the preliminary classification detecting the final consonant and the secondary classification detecting the vowels.

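As a rough illustration of the Hough-transform features named in this abstract, the sketch below computes a standard (theta, rho) line-Hough accumulator for the foreground pixels of a small binary image, plus a "subblock" variant that concatenates accumulators computed separately on each tile. The tile size, the angle and distance quantization, and the function names are assumptions; the abstract does not define the paper's subblock or sampling Hough transforms.

```python
import math


def hough_accumulator(points, n_theta=4, max_rho=16):
    """Vote in (theta, rho) space for every foreground point, where
    rho = x*cos(theta) + y*sin(theta), rounded and shifted to be non-negative."""
    acc = [[0] * (2 * max_rho + 1) for _ in range(n_theta)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            if -max_rho <= rho <= max_rho:
                acc[t][rho + max_rho] += 1
    return acc


def subblock_hough(image, block=4, n_theta=4):
    """Split a binary image (list of rows of 0/1) into block x block tiles and
    concatenate one small Hough accumulator per tile as a local feature vector."""
    h, w = len(image), len(image[0])
    feature = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            # Tile-local coordinates so every tile shares the same rho range.
            pts = [(x - bx, y - by)
                   for y in range(by, min(by + block, h))
                   for x in range(bx, min(bx + block, w))
                   if image[y][x]]
            acc = hough_accumulator(pts, n_theta=n_theta, max_rho=2 * block)
            feature.extend(v for row in acc for v in row)
    return feature


# A vertical stroke at x = 1: all four pixels vote for rho = 1 at theta = 0.
acc = hough_accumulator([(1, 0), (1, 1), (1, 2), (1, 3)])
print(acc[0][17])  # 4
```

The per-tile vectors keep stroke orientation tied to position, which is the usual motivation for a local variant; a global feature would instead run one accumulator over the whole normalized image.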

Extracting curved text lines using the chain composition and the expanded grouping method (체인 정합과 확장된 그룹핑 방법을 사용한 곡선형 텍스트 라인 추출)

  • Bai, Nguyen Noi;Yoon, Jin-Seon;Song, Young-Jun;Kim, Nam;Kim, Yong-Gi
    • The KIPS Transactions:PartB / v.14B no.6 / pp.453-460 / 2007
  • In this paper, we present a method to extract text lines from poorly structured documents. The text lines may have different orientations and considerably curved shapes, and a text line may contain a few wide inter-word gaps. Such text lines are found in posters, address blocks, and artistic documents. Our method is based on traditional perceptual grouping, but we develop novel solutions to overcome the problems of insufficient seed points and varied orientations within a single line. We assume that a text line consists of connected components, where each connected component is the set of black pixels within a letter or within a few touching letters. In our scheme, connected components closer than an iteratively incremented threshold are linked into a chain. Elongated chains are identified as the seed chains of lines. The seed chains are then extended to the left and right according to the local orientations, which are re-evaluated at each end of a chain as it is extended. Through this process, all text lines are finally constructed. In our experiments, the proposed method performed well at extracting considerably curved text lines from logos and slogans, achieving 98% for straight-line extraction and 94% for curved-line extraction.
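The chain-building step above, where components closer than an iteratively incremented threshold are linked into chains, can be sketched with a union-find over component centroids. Using centroids with Euclidean distance, the start/step/stop values, and treating any chain with at least `min_len` components as a seed are simplifying assumptions (the abstract does not state the paper's distance measure or elongation test), and the orientation-guided extension step is omitted.

```python
import math


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def seed_chains(centroids, start=1.0, step=1.0, stop=10.0, min_len=3):
    """Group component centroids into chains, raising the linking threshold
    from `start` by `step` until at least one chain has `min_len` members.
    Returns (threshold_used, seed_chains), or (None, []) past `stop`."""
    n = len(centroids)
    threshold = start
    while threshold <= stop:
        parent = list(range(n))
        # Link every pair of components closer than the current threshold.
        for i in range(n):
            for j in range(i + 1, n):
                if math.dist(centroids[i], centroids[j]) < threshold:
                    parent[find(parent, i)] = find(parent, j)
        groups = {}
        for i in range(n):
            groups.setdefault(find(parent, i), []).append(i)
        seeds = [sorted(g) for g in groups.values() if len(g) >= min_len]
        if seeds:
            return threshold, seeds
        threshold += step
    return None, []


print(seed_chains([(0, 0), (2, 0), (4, 0), (20, 20)]))  # (3.0, [[0, 1, 2]])
```

Stopping at the first threshold that yields a seed is one way to address the "insufficient seed points" problem the abstract mentions: sparse layouts simply end up with a larger linking distance.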

A Study on the Pattern Recognition of Korean Characters by Syntactic Method (Syntactic법에 의한 한글의 패턴 인식에 관한 연구)

  • ;安居院猛
    • Journal of the Korean Institute of Telematics and Electronics / v.14 no.5 / pp.15-21 / 1977
  • The syntactic pattern recognition system for Korean characters is composed of three main functional parts: preprocessing, graph representation, and segmentation. In the preprocessing routine, the input pattern is thinned using Hilditch's thinning algorithm. Graph representation detects a number of nodes in the input pattern and encodes the branches between nodes with 8 directional components. Next, the segmentation routine, implemented as top-down nondeterministic parsing under the control of a tree grammar, identifies parts of the graph-represented pattern as basic components of Korean characters. Through recognition simulations on a digital computer, the authors confirmed that this system is effective for recognizing Korean characters.

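Encoding branches "by 8 directional components" corresponds to the classic Freeman chain code. The sketch below turns a branch, given as consecutive 8-connected pixel coordinates, into direction codes 0 to 7. The numbering (0 = east, counter-clockwise in 45-degree steps, y growing upward) and the function name are assumptions, since the abstract does not give the paper's convention.

```python
# Freeman directions: 0 = east, then counter-clockwise in 45-degree steps (y grows upward).
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(path):
    """Encode a branch given as consecutive 8-connected (x, y) pixels."""
    codes = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(x0, y0)} and {(x1, y1)} are not 8-adjacent")
        codes.append(DIRECTIONS[step])
    return codes


print(chain_code([(0, 0), (1, 0), (2, 1), (2, 2), (1, 3)]))  # [0, 1, 2, 3]
```

A string of such codes per branch is the kind of terminal sequence a grammar-driven parser can then match against stroke primitives.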

Analyzing Vocabulary Characteristics of Colloquial Style Corpus and Automatic Construction of Sentiment Lexicon (구어체 말뭉치의 어휘 사용 특징 분석 및 감정 어휘 사전의 자동 구축)

  • Kang, Seung-Shik;Won, HyeJin;Lee, Minhaeng
    • Smart Media Journal / v.9 no.4 / pp.144-151 / 2020
  • In a mobile environment, communication takes place via SMS text messages. The vocabulary used in SMS texts can be expected to differ in class from that used in general Korean literary-style sentences. For example, a typical literary-style sentence is properly initiated and terminated and well constructed, whereas the SMS text corpus often omits components or uses abbreviated expressions. To analyze these vocabulary usage characteristics, existing colloquial-style and literary-style corpora are used. The experiment compares and analyzes the vocabulary usage of two colloquial corpora, the SMS text corpus and the Naver Sentiment Movie Corpus, against a written Korean corpus. For the per-corpus comparison, adjectives (part-of-speech tag VA) were used as the standard, and distinctive collexeme analysis was used to measure collostructional strength. As a result, adjectives related to emotional expression, such as 'good-', 'sorry-', and 'joy-', were preferred in the SMS text corpus, while adjectives related to evaluative expressions were preferred in the Naver Sentiment Movie Corpus. Word embeddings were then used to automatically construct a sentiment lexicon from the extracted adjectives with high collostructional strength, yielding a total of 343,603 sentiment expressions.
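Distinctive collexeme analysis commonly expresses collostructional strength as the negative base-10 logarithm of a one-tailed Fisher exact test on a 2x2 table of word-by-corpus counts. The sketch below computes that p-value from the hypergeometric upper tail using only the standard library. Treating the test as one-tailed toward attraction to corpus A is an assumption about how the paper applied it, and the counts in the usage line are made up.

```python
import math


def fisher_right_tail(a, b, c, d):
    """One-tailed Fisher exact p-value P(X >= a) for the 2x2 table
    [[a, b], [c, d]] with fixed margins (hypergeometric upper tail).
    a: word in corpus A, b: other adjectives in A,
    c: word in corpus B, d: other adjectives in B."""
    n_total = a + b + c + d
    word_total = a + c      # occurrences of the word in both corpora
    corpus_a = a + b        # adjective tokens in corpus A
    denom = math.comb(n_total, corpus_a)
    upper = min(word_total, corpus_a)
    tail = sum(
        math.comb(word_total, k) * math.comb(n_total - word_total, corpus_a - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


def collostructional_strength(a, b, c, d):
    """-log10 of the one-tailed p-value; larger means stronger attraction to corpus A.
    Note: an extremely small p-value can underflow to 0.0 and make log10 fail."""
    return -math.log10(fisher_right_tail(a, b, c, d))


# Hypothetical counts: 30 of 1,000 adjectives in corpus A vs 5 of 1,000 in corpus B.
print(round(collostructional_strength(30, 970, 5, 995), 2))
```

Exact integer arithmetic via `math.comb` avoids the precision loss of multiplying many small probabilities, at the cost of large intermediate integers for big corpora.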

Content-Based Retrieval System Design for Image and Video using Multiple Features (다중 특징을 이용한 영상 및 비디오 내용 기반 검색 시스템 설계)

  • Go, Byeong-Cheol;Lee, Hae-Seong;Byeon, Hye-Ran
    • Journal of KIISE:Software and Applications / v.26 no.12 / pp.1519-1530 / 1999
  • As the amount of multimedia information grows rapidly, efficient management of multimedia databases has become increasingly important, and because users want content-based retrieval of non-textual data such as images and video, interest in video indexing continues to rise. In this paper, we first provide a query-by-image environment that retrieves similar images and similar representative frames from the representative frames extracted at segmented shot boundaries and from a still-image database. Rather than relying only on the conventional discretized color histogram, image queries use the proposed CS and GS methods so that directional information is considered along with color. We also design a query-by-face environment: the user selects a person in the representative-frame browser, the system extracts the face region from that frame, and it then retrieves representative frames containing similar people using two features, the face's edge (boundary) information and a biorthogonal wavelet transform.
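The abstract positions the proposed CS and GS methods against a conventional discretized color histogram but does not define CS or GS. As a sketch of that histogram baseline only, the code below quantizes each RGB channel into a few bins and ranks database images by histogram intersection. The bin count, the intersection measure, and the function names are assumptions.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Quantize (r, g, b) pixels with 0-255 channels into bins_per_channel**3
    bins and return the histogram normalized to sum to 1."""
    size = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Clamp so bin counts that do not divide 256 stay in range.
        qr, qg, qb = (min(v // size, bins_per_channel - 1) for v in (r, g, b))
        hist[(qr * bins_per_channel + qg) * bins_per_channel + qb] += 1
    return [h / len(pixels) for h in hist]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank(query_pixels, database):
    """Return database keys sorted from most to least similar to the query."""
    q = color_histogram(query_pixels)
    scores = {name: intersection(q, color_histogram(px)) for name, px in database.items()}
    return sorted(scores, key=scores.get, reverse=True)


db = {
    "red": [(255, 0, 0)] * 4,
    "blue": [(0, 0, 255)] * 4,
    "mixed": [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)],
}
print(rank([(250, 10, 10)] * 4, db))  # ['red', 'mixed', 'blue']
```

A histogram like this ignores where colors occur and which way edges run, which is the gap the paper's directional features are meant to fill.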

Hierarchical Text Extraction and Localization on Images (이미지로부터 계층적 문자열 추출에 관한 연구)

  • Jun, Byoung-Min;Jun, Woogyoung
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.1 / pp.609-614 / 2018
  • This study was conducted to investigate the effects of turmeric powder on jeung-pyun. Turmeric jeung-pyun containing 0%, 0.5%, 1%, 1.5%, and 2% turmeric powder was prepared and the moisture, pH, sugar, color, texture, DPPH and sensory properties of the samples were measured. Moisture contents of jeung-pyun were 51.26~51.99% and there were significant differences among the samples(p<0.001). The L-values were significantly decreased with increasing turmeric powder content. The b-value was low in the control and there were significant differences among the samples(p<0.05). Texture profile analysis showed that there were no significant differences among the groups in hardness, adhesiveness, springiness, cohesiveness, gumminess, and chewiness. The hardness was the lowest in the control group and increased with increasing turmeric powder content. The antioxidant activities as measured by DPPH increased with increasing turmeric powder content (p<0.001). In the sensory evaluation, 1% addition of turmeric powder showed the highest preference in terms of color, taste, flavor, texture and overall preference(p<0.001). As determined by this study, the addition of 1% turmeric powder was the most favorable method for making use of turmeric powder in the production of jeung-pyun.

System Implement to Identify Copyright Infringement Based on the Text Reference Point (텍스트 기준점 기반의 저작권 침해 판단 시스템 구현)

  • Choi, Kyung-Ung;Park, Soon-Cheol;Yang, Seung-Won
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.15 no.1 / pp.77-84 / 2015
  • To identify copyright infringement between two documents, most existing methods build an index key from every 6 words in every sentence of a document. However, these methods take a long time to check for infringement because indexing large-scale documents is slow. In this paper, we propose a method that scans a document character by character with a window of predetermined size and selects the longest word in the window (called a feature block) as an index key. Because duplicate blocks are removed during the scan, the number of index keys is dramatically reduced. Since a relatively small number of blocks are compared, a system using this method can locate the copyright-infringing positions in two documents very accurately and quickly.
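The feature-block idea above (slide a fixed-size window over the text one character at a time, keep the longest word inside each window position, drop duplicates) can be sketched as follows. Treating whitespace-separated tokens as words, requiring a word to lie entirely inside the window, breaking ties by the leftmost word, deduplicating by (offset, word), and the default window of 20 characters are all assumptions; the abstract gives neither the window size nor the tie rule.

```python
import re


def feature_blocks(text, window=20):
    """Slide a `window`-character window over `text` one character at a time and
    keep the longest whole word inside each position as (offset, word), without duplicates."""
    words = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]
    keys, seen = [], set()
    for start in range(max(1, len(text) - window + 1)):
        end = start + window
        inside = [(s, w) for s, e, w in words if s >= start and e <= end]
        if not inside:
            continue
        # Longest word wins; ties go to the leftmost word.
        block = max(inside, key=lambda item: (len(item[1]), -item[0]))
        if block not in seen:
            seen.add(block)
            keys.append(block)
    return keys


def shared_blocks(doc_a, doc_b, window=20):
    """Return (offset_in_a, offset_in_b, word) for feature blocks common to both documents."""
    index_b = {}
    for off, w in feature_blocks(doc_b, window):
        index_b.setdefault(w, []).append(off)
    return [(off_a, off_b, w)
            for off_a, w in feature_blocks(doc_a, window)
            for off_b in index_b.get(w, [])]


print(feature_blocks("a bb ccc dd e", window=6))  # [(2, 'bb'), (5, 'ccc'), (9, 'dd')]
```

Consecutive window positions usually pick the same word, so the deduplication step is what keeps the key count far below the number of window positions.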

Speech Recognition in the Pager System displaying Defined Sentences (문자출력 무선호출기를 위한 음성인식 시스템)

  • Park, Gyu-Bong;Park, Jeon-Gue;Suh, Sang-Weon;Hwang, Doo-Sung;Kim, Hyun-Bin;Han, Mun-Sung
    • Annual Conference on Human and Language Technology / 1996.10a / pp.158-162 / 1996
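  • This paper describes a specialized speech recognition system that combines speech recognition technology with a pager capable of displaying text. The system works as follows: once a caller connects to the speech recognition server, the server recognizes the caller's natural spoken input and sends the result as a sentence to the callee's pager terminal. The system adopts statistical speech recognition, modeling each word as a continuous HMM. Each model uses Gaussian-mixture probability density functions and is trained with the Baum-Welch algorithm, one of the traditional HMM training methods; at recognition time, Viterbi beam search is applied to these models to obtain the best result. A 26-dimensional feature vector combining MFCCs and power is extracted from each frame, and models are finally built for 83 domain words and for special tokens such as silence. An FSN (finite-state network) that performs both syntactic and semantic functions is combined with these models to form a continuous speech recognition system for spontaneous speech. In addition, the paper examines external factors directly tied to system performance, such as the speech database and labeling, describes various features implemented in the system, and discusses experimental results and directions for future improvement.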

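Viterbi decoding with beam pruning, the recognition step named in this abstract, can be illustrated on a toy discrete HMM. The sketch is deliberately simplified: it uses discrete emission tables instead of the paper's Gaussian-mixture densities over 26-dimensional MFCC-plus-power vectors, works in log space, and prunes any state whose score falls more than `beam` below the best score at each frame. The state names, probabilities, and beam width are toy values chosen only for illustration.

```python
import math


def _log(p):
    """Natural log that maps zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_beam(obs, states, start_p, trans_p, emit_p, beam=10.0):
    """Return (best_log_prob, best_state_path) for the observation sequence,
    pruning states more than `beam` log-units below the frame's best score."""
    # scores[s] = (log prob of the best path ending in s, that path)
    scores = {s: (_log(start_p[s]) + _log(emit_p[s][obs[0]]), [s]) for s in states}
    for o in obs[1:]:
        best = max(score for score, _ in scores.values())
        active = {s: v for s, v in scores.items() if v[0] >= best - beam}
        new_scores = {}
        for s in states:
            # Best surviving predecessor for state s.
            prev = max(active, key=lambda p: active[p][0] + _log(trans_p[p][s]))
            score, path = active[prev]
            new_scores[s] = (score + _log(trans_p[prev][s]) + _log(emit_p[s][o]), path + [s])
        scores = new_scores
    return max(scores.values(), key=lambda v: v[0])


states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3}, "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
log_p, path = viterbi_beam(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
print(path)  # ['Sunny', 'Rainy', 'Rainy']
```

In a word-level recognizer like the one described, the states would come from the word HMMs linked through the FSN, and a narrower beam trades a small risk of search errors for far fewer active hypotheses.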

Text Detection and Recognition in Outdoor Korean Signboards for Mobile System Applications (모바일 시스템 응용을 위한 실외 한국어 간판 영상에서 텍스트 검출 및 인식)

  • Park, J.H.;Lee, G.S.;Kim, S.H.;Lee, M.H.;Toan, N.D.
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.2 / pp.44-51 / 2009
  • Text understanding in natural images has become an active research field in the past few decades. In this paper, we present an automatic recognition system for Korean signboards with complex backgrounds. The proposed algorithm includes detection, binarization, and extraction of text for recognizing shop names. First, we use an elaborate detection algorithm to detect possible text regions based on edge histograms in the vertical and horizontal directions, and the detected text regions are segmented by a clustering method. Second, the text is divided into individual characters based on connected components whose centers of mass lie below the center line, and the characters are recognized using a minimum distance classifier. A shape-based statistical feature suited to Korean character recognition is adopted. The system has been implemented on a mobile phone and demonstrated acceptable performance.
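A minimum distance classifier, as named in this abstract, assigns a feature vector to the class whose mean (prototype) vector is nearest. The sketch below learns class means from labeled vectors and classifies by Euclidean distance. The two-dimensional vectors and labels are placeholders, since the abstract does not detail the paper's shape-based statistical features.

```python
import math


def train_means(samples):
    """samples: dict label -> list of equal-length feature vectors.
    Returns dict label -> mean vector (the class prototype)."""
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in samples.items()
    }


def classify(vector, means):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(vector, means[label]))


means = train_means({
    "A": [[0.0, 0.0], [0.0, 2.0]],
    "B": [[10.0, 10.0], [12.0, 10.0]],
})
print(classify([1.0, 1.0], means))  # A
```

Storing one prototype per class keeps memory and per-character cost small, which fits the mobile-phone setting the abstract describes.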