• Title/Summary/Keyword: Character Feature Extraction

Printed Hangul Recognition with Adaptive Hierarchical Structures Depending on 6-Types (6-유형 별로 적응적 계층 구조를 갖는 인쇄 한글 인식)

  • Ham, Dae-Sung;Lee, Duk-Ryong;Choi, Kyung-Ung;Oh, Il-Seok
    • The Journal of the Korea Contents Association / v.10 no.1 / pp.10-18 / 2010
  • Because Hangul character recognition involves a large number of classes, a six-type preclassification stage is commonly used. After preclassification, the first consonant, vowel, and last consonant can be classified separately. Although each of the three components has only a few classes, classification errors occur often because of shape similarity, as between 'ㅔ' and 'ㅖ'. This paper therefore proposes a hierarchical recognition method that adopts multi-stage tree structures for each of the six types. In addition, to reduce interference among the three components, the method uses the recognition results of the first consonant and vowel as features of the vowel classifier. The recognition accuracy on the test set of the PHD08 database was 98.96%.
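
As a rough illustration of the six-type grouping mentioned in this abstract, the sketch below derives a type label from a precomposed Unicode Hangul syllable, for example when preparing ground-truth labels. It is not the paper's method, which preclassifies character images. The type numbering (1 to 3 without a last consonant, 4 to 6 with one) and the names `decompose` and `six_type` are assumptions made for this example.

```python
HANGUL_BASE = 0xAC00              # code point of the first precomposed syllable, '가'
MEDIAL_COUNT, FINAL_COUNT = 21, 28
SYLLABLE_COUNT = 19 * MEDIAL_COUNT * FINAL_COUNT

# Vowel indices in Unicode medial order (ㅏ=0 ... ㅣ=20), grouped by layout.
VERTICAL = {0, 1, 2, 3, 4, 5, 6, 7, 20}      # ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅣ
HORIZONTAL = {8, 12, 13, 17, 18}             # ㅗ ㅛ ㅜ ㅠ ㅡ
# Every other index is a compound vowel such as ㅘ or ㅢ.


def decompose(syllable):
    """Split one precomposed syllable into (initial, medial, final) indices."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(index, MEDIAL_COUNT * FINAL_COUNT)
    medial, final = divmod(rest, FINAL_COUNT)
    return initial, medial, final


def six_type(syllable):
    """Return an assumed type number 1-6 based on vowel layout and last consonant."""
    _, medial, final = decompose(syllable)
    if medial in VERTICAL:
        base = 1
    elif medial in HORIZONTAL:
        base = 2
    else:
        base = 3
    # final == 0 means the syllable has no last consonant.
    return base + (3 if final else 0)


types = {ch: six_type(ch) for ch in "가고과간곤관"}
```

Note that 'ㅔ' and 'ㅖ' both fall in the vertical group, so a type label alone cannot separate them; that distinction is left to the per-type component classifiers the abstract describes.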

Detection of Intersection Points of Handwritten Hangul Strokes using Run-length (런 길이를 이용한 필기체 한글 자획의 교점 검출)

  • Jung, Min-Chul
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.5 / pp.887-894 / 2006
  • This paper proposes a new method that detects the intersection points of handwritten Hangul strokes using run-lengths. The method first finds the stroke width of handwritten Hangul characters using both horizontal and vertical run-lengths, then extracts the horizontal and vertical strokes of a character using the stroke width, and finally detects the intersection points from the horizontal and vertical strokes. The analysis of the horizontal and vertical strokes does not use stroke angles; it relies on the stroke width and the changes in the run-lengths. The intersection points of the strokes become candidate points for phoneme segmentation, which is one of the main techniques for off-line handwritten Hangul recognition. The segmented strokes serve as features for handwritten Hangul recognition.
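
The first step described above, estimating stroke width from horizontal and vertical run-lengths, can be sketched as follows. This is a minimal sketch under stated assumptions: the image is a binary grid with 1 for ink, and the stroke width is taken as the most frequent foreground run length. The abstract does not say which statistic the paper uses, and the function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def run_lengths(line):
    """Lengths of consecutive runs of foreground (1) pixels in one row or column."""
    return [sum(1 for _ in group) for value, group in groupby(line) if value == 1]


def estimate_stroke_width(image):
    """Estimate stroke width as the most frequent foreground run length.

    Assumption: the mode over all horizontal and vertical runs approximates
    the stroke width; the paper may use a different statistic.
    """
    rows = [list(row) for row in image]
    cols = [list(col) for col in zip(*image)]
    counts = Counter()
    for line in rows + cols:
        counts.update(run_lengths(line))
    if not counts:
        return 0
    # Pick the most frequent length; break ties toward the shorter run.
    length, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return length


# A plus-shaped glyph whose two strokes are both 2 pixels thick.
glyph = [
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
]
stroke_width = estimate_stroke_width(glyph)
```

In this toy glyph, runs much longer than the estimated width run along a stroke, while runs close to the width cut across one. A change between the two along neighboring rows or columns is the kind of cue the abstract refers to as changes of the run-lengths.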

A Comparative Study on OCR using Super-Resolution for Small Fonts

  • Cho, Wooyeong;Kwon, Juwon;Kwon, Soonchu;Yoo, Jisang
    • International journal of advanced smart convergence / v.8 no.3 / pp.95-101 / 2019
  • Recently, there have been many issues related to text recognition using Tesseract. One of these issues is that the text recognition accuracy is significantly lower for smaller fonts. Tesseract extracts text by creating a directed outline in the image. It then searches the Tesseract database and uses template matching against characters with similar feature points to select the character with the lowest error. When text extraction is poor, the recognition accuracy is lowered. In this paper, we compared text recognition accuracy after applying various super-resolution methods to smaller text images and examined how the recognition accuracy varies with image size. To recognize small Korean text images, we used super-resolution algorithms based on deep learning models such as SRCNN, ESRCNN, DSRCNN, and DCSCN. The dataset for training and testing consisted of Korean-based scanned images. The images, set in a 12pt font, were resized by factors of 0.5 to 0.8. The experiment was performed on x0.5 resized images, and the experimental result showed that DCSCN super-resolution is the most efficient method, reducing the precision error rate by 7.8% and the recall error rate by 8.4%. The experimental results demonstrate that the accuracy of text recognition for smaller Korean fonts can be improved by adding super-resolution methods to the OCR preprocessing module.

Development of a Ship's Logbook Data Extraction Model Using OCR Program (OCR 프로그램을 활용한 선박 항해일지 데이터 추출 모델 개발)

  • Dain Lee;Sung-Cheol Kim;Ik-Hyun Youn
    • Journal of the Korean Society of Marine Environment & Safety / v.30 no.1 / pp.97-107 / 2024
  • Despite the rapid advancement in image recognition technology, achieving perfect digitization of tabular and handwritten documents remains a challenge. The purpose of this study is to improve the accuracy of digitizing a logbook by correcting errors using the associated rules that are considered during logbook entries. This is expected to enhance the accuracy and reliability of data extracted from logbooks through OCR programs. The model aims to improve the accuracy of digitizing the logbook of the training ship "Saenuri" at the Mokpo Maritime University by correcting errors identified after recognition by an Optical Character Recognition (OCR) program. The model identified and corrected errors by utilizing the associated rules considered during logbook entries. To evaluate the effect of the model, the data before and after correction were divided by features, and comparisons were made between the same sailing number and the same feature. Using this model, approximately 10.6% of errors out of the total estimated error rate of about 11.8% were identified, and 56 out of 123 errors were corrected. A limitation of this study is that it focuses only on the sections of the logbook from Dist.Run to Stand Course, which contain navigational information. Future research will aim to correct more information from the logbook, including weather information, to overcome this limitation.

Spam-Mail Filtering System Using Weighted Bayesian Classifier (가중치가 부여된 베이지안 분류자를 이용한 스팸 메일 필터링 시스템)

  • 김현준;정재은;조근식
    • Journal of KIISE:Software and Applications / v.31 no.8 / pp.1092-1100 / 2004
  • E-mail has been regarded as one of the most popular methods for exchanging information because of its ease of use and low cost. Meanwhile, the exponentially growing volume of unwanted mail in users' mailboxes has become a major problem. Recognizing this issue, the Korean government established a law to prevent e-mail abuse. In this paper we suggest a hybrid spam mail filtering system using a weighted Bayesian classifier, which extends the naive Bayesian classifier by adding the concepts of preprocessing and intelligent agents. This system can classify spam mails automatically by using training data, without manual definition of message rules. In particular, we improved filtering efficiency by imposing weights on certain characteristics obtained by feature extraction from spam mails. Finally, we compare the efficiency of five cases: naive Bayesian, weighting on the e-mail header, weighting on HTML tags, weighting on hyperlinks, and a combination of all of these. Compared with the naive Bayesian classifier, the precision of the proposed system decreased by 5.7%, while its recall and F-measure increased by 33.3% and 31.2%, respectively.
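
One way to picture the weighting idea in this abstract is a naive Bayes scorer in which each token's log-likelihood is multiplied by a weight that depends on where the token came from (header, HTML tag, hyperlink, or body). The sketch below is only an illustration under that assumption; the abstract does not give the paper's weighting formula, and the class name, field names, and toy data are all hypothetical.

```python
import math
from collections import defaultdict


class WeightedNaiveBayes:
    """Multinomial naive Bayes with a per-field weight on each token's log-likelihood."""

    LABELS = ("spam", "ham")

    def __init__(self, field_weights, alpha=1.0):
        self.field_weights = field_weights      # e.g. {"link": 3.0}; default weight is 1.0
        self.alpha = alpha                      # Laplace smoothing constant
        self.token_counts = {label: defaultdict(float) for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}
        self.vocab = set()

    def train(self, tokens, label):
        """tokens is a list of (field, token) pairs from one message."""
        self.doc_counts[label] += 1
        for _field, token in tokens:
            self.token_counts[label][token] += 1.0
            self.vocab.add(token)

    def log_score(self, tokens, label):
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.token_counts[label].values()) + self.alpha * len(self.vocab)
        for field, token in tokens:
            prob = (self.token_counts[label].get(token, 0.0) + self.alpha) / denom
            # Assumption: the field weight scales the token's contribution.
            score += self.field_weights.get(field, 1.0) * math.log(prob)
        return score

    def classify(self, tokens):
        return max(self.LABELS, key=lambda label: self.log_score(tokens, label))


training = [
    ([("link", "click"), ("body", "free"), ("body", "offer")], "spam"),
    ([("body", "meeting"), ("body", "agenda"), ("header", "team")], "ham"),
]
message = [("link", "click"), ("body", "agenda"), ("body", "meeting")]


def build(field_weights):
    classifier = WeightedNaiveBayes(field_weights)
    for tokens, label in training:
        classifier.train(tokens, label)
    return classifier


plain_label = build({}).classify(message)
weighted_label = build({"link": 3.0}).classify(message)
```

With no weights the toy message is scored as ham, and with a weight of 3.0 on hyperlink tokens it is scored as spam. That shift toward flagging more mail is at least in the same direction as the trade-off the abstract reports, where recall rises and precision falls.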

Classification of Handwritten and Machine-printed Korean Address Image based on Connected Component Analysis (연결요소 분석에 기반한 인쇄체 한글 주소와 필기체 한글 주소의 구분)

  • 장승익;정선화;임길택;남윤석
    • Journal of KIISE:Software and Applications / v.30 no.10 / pp.904-911 / 2003
  • In this paper, we propose an effective method for distinguishing between machine-printed and handwritten Korean address images. It is important to know whether an input image is handwritten or machine-printed, because the methods used for handwritten images are quite different from those used for machine-printed images in applications such as address reading, form processing, and FAX routing. Our method consists of three blocks: grouping of valid connected components, feature extraction, and classification. Features related to the width and position of groups of valid connected components are used for classification by a neural network. An experiment with live Korean address images demonstrated the superiority of the proposed method. The correct classification rate for 3,147 test images was about 98.85%.
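
The first two blocks of this pipeline, finding connected components and computing width and position features, might look roughly like the sketch below. It assumes a binary grid with 1 for ink and 4-connectivity, and it uses the spread of component widths and vertical centers as stand-in features; the actual grouping rules and feature definitions are not given in the abstract, so these choices and the function names are assumptions.

```python
from collections import deque
from statistics import pstdev


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 4-connected ink regions."""
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def width_position_features(boxes):
    """Assumed features: spread of component widths and of vertical centers."""
    if not boxes:
        return (0.0, 0.0)
    widths = [right - left + 1 for _, left, _, right in boxes]
    centers = [(top + bottom) / 2 for top, _, bottom, _ in boxes]
    return (pstdev(widths), pstdev(centers))


# Three evenly sized, evenly aligned blobs, as machine print might produce.
printed = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
boxes = connected_components(printed)
features = width_position_features(boxes)
```

The intuition behind these stand-in features is that machine print tends to give uniform widths and baselines, while handwriting varies more. A vector like this would then be passed to a classifier such as the neural network mentioned in the abstract.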

Comparisons of Recognition Rates for the Off-line Handwritten Hangul using Learning Codes based on Neural Network (신경망 학습 코드에 따른 오프라인 필기체 한글 인식률 비교)

  • Kim, Mi-Young;Cho, Yong-Beom
    • Journal of IKEEE / v.2 no.1 s.2 / pp.150-159 / 1998
  • This paper describes the recognition of off-line handwritten Hangul based on a neural network using a feature extraction method. Features of Hangul can be extracted by a $5 \times 5$ window method, which is a modification of the $3 \times 3$ mask method. These features are coded as binary patterns so that they can be used efficiently as neural network inputs. A Hangul character is recognized by its consonant, vertical vowel, and horizontal vowel separately. To verify the recognition rate, three different coding methods were used for the neural networks: the fixed-code method, the learned-code I method, and the learned-code II method. The results showed that the learned-code II method was the best of the three. The learned-code II method achieved a 100% recognition rate for the vertical vowel, 100% for the horizontal vowel, 98.33% for the learned consonants, and 93.75% for the new consonants.

Text Detection and Recognition in Outdoor Korean Signboards for Mobile System Applications (모바일 시스템 응용을 위한 실외 한국어 간판 영상에서 텍스트 검출 및 인식)

  • Park, J.H.;Lee, G.S.;Kim, S.H.;Lee, M.H.;Toan, N.D.
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.2 / pp.44-51 / 2009
  • Text understanding in natural images has become an active research field in the past few decades. In this paper, we present an automatic recognition system for Korean signboards with complex backgrounds. The proposed algorithm includes detection, binarization, and extraction of text for the recognition of shop names. First, we use an elaborate detection algorithm to detect possible text regions based on edge histograms in the vertical and horizontal directions. The detected text region is then segmented by a clustering method. Second, the text is divided into individual characters based on connected components whose centers of mass lie below the center line, and these characters are recognized using a minimum distance classifier. A shape-based statistical feature is adopted, which is adequate for Korean character recognition. The system has been implemented on a mobile phone and is demonstrated to show acceptable performance.
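
A minimum distance classifier, as named in this abstract, assigns a feature vector to the class whose prototype is closest. The sketch below assumes class means as prototypes and Euclidean distance; the abstract does not specify the distance measure or the shape-based feature, so the two-dimensional features and labels here are made-up placeholders.

```python
import math


def class_means(samples):
    """Map each label to the mean of its training feature vectors."""
    means = {}
    for label, vectors in samples.items():
        count = len(vectors)
        dims = len(vectors[0])
        means[label] = [sum(v[i] for v in vectors) / count for i in range(dims)]
    return means


def classify(feature, means):
    """Return the label whose mean is nearest to the feature (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(feature, means[label]))


# Placeholder 2-D features for two characters; real features would be shape statistics.
training = {
    "가": [[0.0, 1.0], [0.2, 0.8]],
    "나": [[1.0, 0.0], [0.8, 0.2]],
}
means = class_means(training)
first = classify([0.3, 0.7], means)
second = classify([0.9, 0.3], means)
```

This kind of classifier needs only one distance computation per class at recognition time, which may be one reason it suits a mobile phone implementation like the one described.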

Scanning Determination & Observation Features by Sex shown in the Process of Acquiring Visual Information - With the Object of Subway Station Hall Space - (시각정보획득과정에 나타난 주사판정과 성별 주시특성 - 지하철 홀 공간을 대상으로 -)

  • Kim, Jong-Ha;Choi, Gae-Young
    • Korean Institute of Interior Design Journal / v.23 no.6 / pp.115-124 / 2014
  • This study carried out scanning tests to identify the features of scanning search by the sex of space users, and used the results to estimate the validity of the data. Scanning patterns were set up to verify the typology of scanning paths, and then the reasons for determining scanning paths and the validity of the estimation method were reviewed. Since observation features depend on sex, analyzing the visual activities used to acquire information in a space reveals the intention and purpose of space users. The findings from analyzing the features of scanning patterns by sex, obtained while determining the scanning patterns, are as follows. First, to estimate the process of space-information search, the movement distance at each point of the continuous-observation data was extracted in terms of eye-movement angle, and on that basis the fixation and movement of the eye were defined to establish scanning-cut characteristics. Second, scanning times were estimated to extract the effective observation data used for comparative analysis, which showed that men had more data (3,398.2/64.4%) than women (2,998.2/55.6%). This suggests that the scanning cut of men was relatively smaller, which indicates that men acquire more information on a space than women when observing it. Third, men's number of scans (58.0 times/2.02 seconds) was lower than women's (71.9 times/1.39 seconds), while men's scanning time was longer; that is, men scan fewer times than women but take longer in scanning. Fourth, combining this result with the predominant scanning character by sex from a general viewpoint indicates that men use mixed scanning and spend a longer time acquiring space information, whereas women use concentrated scanning, focusing on a single point for a shorter time or staying at one location for a considerably long time.