• Title/Summary/Keyword: character recognition

A Development of Unicode-based Multi-lingual Namecard Recognizer (Unicode 기반 다국어 명함인식기 개발)

  • Jang, Dong-Hyeub;Lee, Jae-Hong
    • The KIPS Transactions: Part B / v.16B no.2 / pp.117-122 / 2009
  • We developed a multi-lingual namecard recognizer for building a global client management system. First, we created a Unicode-based character image database for multi-language character recognition and training, and applied a range of color image processing techniques to obtain cleaner data from namecard images acquired with various input devices. We then increased the recognition rate for multi-lingual namecards by applying a multi-layer perceptron neural network, individual character recognition tailored to each language type, and post-processing that uses keyword databases built for the individual languages.

Handwritten Korean Amounts Recognition in Bank Slips using Rule Information (규칙 정보를 이용한 은행 전표 상의 필기 한글 금액 인식)

  • Jee, Tae-Chang;Lee, Hyun-Jin;Kim, Eun-Jin;Lee, Yill-Byung
    • The Transactions of the Korea Information Processing Society / v.7 no.8 / pp.2400-2410 / 2000
  • Much research has been done on Korean character recognition, but the development of document recognition systems has seldom been attempted. In this paper, we design a recognizer of Korean courtesy amounts that improves error correction in the recognized character string. Korean character recognition faces an enormous scale of data from the very first step: there are 2,350 characters in Korean. Most previous studies tried to recognize about 1,000 frequently used characters, but their recognition rates were under 80%. Because such recognizers are inefficient for this task, we designed a statistical multiple recognizer that recognizes the 16 Korean characters used in courtesy amounts. Using a multiple recognizer prevents an increase in errors. The postprocessor exploits the properties of Korean character strings: the character strings of Korean courtesy amounts follow syntactic rules, and these rules can be used to correct errors. This kind of error correction is restricted to the Korean characters representing the units of the amounts. The first candidate of the Korean character recognizer shows a recognition rate of !!i.49%, and the candidates up to the fourth show 99.72%. For postprocessed Korean character strings, the recognizer of Korean courtesy amounts shows a reliability of 96.42%. In summary, we suggest a method to improve the reliability of Korean courtesy amount recognition by combining a Korean character recognizer that handles a limited number of characters with a postprocessor that corrects errors in Korean character strings.
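
The rule-based correction described in this abstract can be illustrated with a small sketch. It is not the authors' postprocessor: the character set, the grammar (units must strictly decrease, no two digits in a row, `원` only at the end), and the helper names `parse_amount` and `best_valid_amount` are all assumptions made for illustration. The idea is to take the ranked candidates the recognizer returns for each position and keep the highest-scoring combination that forms a syntactically valid amount.

```python
import math
from itertools import product

# Assumed alphabet for illustration; the paper's 16 characters are not listed in the abstract.
DIGITS = {"일": 1, "이": 2, "삼": 3, "사": 4, "오": 5, "육": 6, "칠": 7, "팔": 8, "구": 9}
SMALL_UNITS = {"천": 1000, "백": 100, "십": 10}   # units inside a four-digit group
LARGE_UNITS = {"억": 10**8, "만": 10**4}          # units that close a group


def parse_amount(text):
    """Return the integer value of an amount string, or None if it breaks the rules."""
    total, group, pending = 0, 0, None
    last_small, last_large = 10**4, 10**12  # units must strictly decrease
    for i, ch in enumerate(text):
        if ch in DIGITS:
            if pending is not None:
                return None  # two digits in a row
            pending = DIGITS[ch]
        elif ch in SMALL_UNITS:
            unit = SMALL_UNITS[ch]
            if unit >= last_small:
                return None  # e.g. 십 followed by 백 inside one group
            group += (pending or 1) * unit
            pending, last_small = None, unit
        elif ch in LARGE_UNITS:
            unit = LARGE_UNITS[ch]
            if unit >= last_large:
                return None  # e.g. 만 followed by another 만
            group += pending or 0
            total += (group or 1) * unit
            group, pending = 0, None
            last_small, last_large = 10**4, unit
        elif ch == "원":
            if i != len(text) - 1:
                return None  # the currency marker may only end the string
        else:
            return None  # character outside the assumed alphabet
    total += group + (pending or 0)
    return total or None


def best_valid_amount(candidates):
    """Pick the best-scoring valid string.

    candidates: one list per character position, each holding (char, score) pairs.
    Returns (text, score) or None when no combination is valid.
    """
    best = None
    for combo in product(*candidates):
        text = "".join(ch for ch, _ in combo)
        if parse_amount(text) is None:
            continue  # syntactically impossible amount, skip it
        score = math.prod(s for _, s in combo)
        if best is None or score > best[1]:
            best = (text, score)
    return best
```

For example, if the recognizer's top choice for the second character of `삼백만원` is `만`, the top-1 string `삼만만원` is rejected by the grammar and the second candidate `백` is chosen instead. The exhaustive search is only practical because amount strings are short and the candidate lists hold a few entries each.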

Developing an On-line Handwritten Word Recognition System Using Stroke Information and Post-processing Techniques (영문 대문자의 획 정보와 후처리를 이용한 온라인 필기 단어 인식기 구현)

  • 윤인구;김우생
    • Proceedings of the IEEK Conference / 2000.06c / pp.19-22 / 2000
  • This paper presents a new on-line handwriting recognition algorithm for continuously written uppercase alphabet characters. The algorithm is based on the idea that an uppercase character consists of at most 4 strokes. It selects, as the recognition result, the maximum output among four recognizers, each of which discriminates characters using the information of 1 to 4 strokes. The recognition module therefore has 4 neural-network-based recognizers, which recognize characters of 1 to 4 strokes respectively. We also use specialized post-processing techniques to improve recognition performance. Training on 440 input data and testing on 390 uppercase words, we reached a 92% recognition rate.
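
The selection step in this abstract (take the maximum output among four stroke-count-specific recognizers) can be sketched as follows. This is a minimal illustration, not the authors' system: the greedy left-to-right segmentation, the function name `recognize_word`, and the lookup-table recognizers standing in for the neural networks are all assumptions.

```python
def recognize_word(strokes, recognizers, max_strokes=4):
    """Segment a stroke sequence into characters.

    recognizers[n] is a callable that takes exactly n strokes and returns
    (label, score). At each position the 1- to 4-stroke recognizers are all
    tried and the one with the highest score decides the character.
    """
    word = []
    pos = 0
    while pos < len(strokes):
        best_label, best_score, best_n = None, float("-inf"), 0
        for n in range(1, min(max_strokes, len(strokes) - pos) + 1):
            label, score = recognizers[n](strokes[pos:pos + n])
            if score > best_score:
                best_label, best_score, best_n = label, score, n
        word.append(best_label)
        pos += best_n  # consume the strokes used by the winning recognizer
    return "".join(word)


# Hypothetical stand-ins for the trained networks: a template table with
# made-up confidence scores, keyed by a symbolic description of each stroke.
TEMPLATES = {
    ("(",): ("C", 0.9),
    ("|",): ("I", 0.6),
    ("|", "_"): ("L", 0.9),
    ("-", "|"): ("T", 0.9),
}


def make_toy_recognizer(n):
    def recognizer(strokes):
        key = tuple(strokes)
        if len(key) == n and key in TEMPLATES:
            return TEMPLATES[key]
        return ("?", 0.0)  # unknown pattern, lowest confidence
    return recognizer


toy_recognizers = {n: make_toy_recognizer(n) for n in range(1, 5)}
```

With these toy tables, `recognize_word(["(", "|", "_", "-", "|"], toy_recognizers)` yields `"CLT"`: the two-stroke recognizer outscores the one-stroke reading `I` for the strokes `|` and `_`. A greedy choice like this can mis-segment ambiguous input, which is one reason a post-processing stage is useful.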

Design and Implementation of Personal Information Identification and Masking System Based on Image Recognition (이미지 인식 기반 향상된 개인정보 식별 및 마스킹 시스템 설계 및 구현)

  • Park, Seok-Cheon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.5 / pp.1-8 / 2017
  • Recently, with the development of ICT such as cloud and mobile services, the use of images on social networks has been increasing rapidly. These images contain personal information, so personal information leaks may occur. As a result, studies are underway on recognizing and masking personal information in images. However, the accuracy of optical character recognition, which is used to recognize personal information in images, varies greatly with brightness, contrast, and distortion, and its recognition of Korean is insufficient. Therefore, in this paper, we design and implement a personal information identification and masking system based on image recognition, applying deep learning with a CNN on top of the optical character recognition method. We also compare the proposed system with optical character recognition on the recognition rate of personal information in the same images, and measure the face recognition rate of the proposed system. Test results show that the recognition rate of personal information in the proposed system is 32.7% higher than that of optical character recognition, and that the face recognition rate is 86.6%.

Recognition of Printed Hangeul Characters Based on the Stable Structure Information and Neural Networks (안정된 구조정보와 신경망을 기반으로 한 인쇄체 한글 문자 인식)

  • 장희돈;남궁재찬
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.11 / pp.2276-2290 / 1994
  • In this paper, we propose an algorithm for character recognition that uses the subdivided type and stable structure information. The subdivided type of a character is derived from the stable structure information extracted from the input character. First, the character is obtained from a scanner and classified into one of 6 types using a directional density vector. Then the stable structure information is extracted from each character, and the character is subdivided into one of 26 types. Finally, the classified character is either recognized by a neural network that takes as input the directional density vector of the corresponding JASO area, or recognized directly. In an experiment with the 2,350 printed Hangeul characters of KS C 5601, we obtained a recognition rate of 94%.

An Implementation Method of the Character Recognizer for the Sorting Rate Improvement of an Automatic Postal Envelope Sorting Machine (우편물 자동구분기의 구분율 향상을 위한 문자인식기의 구현 방법)

  • Lim, Kil-Taek;Jeong, Seon-Hwa;Jang, Seung-Ick;Kim, Ho-Yon
    • Journal of Korea Society of Industrial Information Systems / v.12 no.4 / pp.15-24 / 2007
  • The recognition of postal address images is indispensable for the automatic sorting of postal envelopes. Address image recognition consists of three steps: address image preprocessing, character recognition, and address interpretation. The character images extracted in the preprocessing step are forwarded to the character recognition step, in which multiple candidate characters with reliability scores are obtained for each extracted character image. Using those scored character candidates, the address interpretation step produces the final valid address for the input envelope image. The envelope sorting rate depends on the performance of all three steps, among which character recognition is particularly important. A good character recognizer is one that produces valid candidates with reliable scores, which makes the address interpretation step easier. In this paper, we propose a method of generating character candidates with reliable recognition scores. As the classifier for each image from the preprocessing step, we use the existing MLP (multilayer perceptron) neural network of the address recognition system in current automatic postal envelope sorters. The MLP is well known to be one of the best classifiers in terms of processing speed and recognition rate. False alarms, however, may occur in its recognition results, which makes address interpretation hard. To make address interpretation easier and improve the envelope sorting rate, we propose methods to re-estimate the recognition score (confidence) of the existing MLP classifier: a method that generates the statistical recognition properties of the classifier, and a method that combines the MLP with a subspace classifier acting as a re-estimator of the confidence. To verify the proposed method, we used character images of real postal envelopes from sorters in the post office. The experimental results show that the proposed method achieves high reliability in terms of error and rejection for individual characters and non-characters.

A Study on the Fractal Attractor Creation and Analysis of the Printed Korean Characters

  • Shon, Young-Woo
    • Journal of information and communication convergence engineering / v.1 no.1 / pp.53-57 / 2003
  • Chaos theory studies the irregular, unpredictable behavior of deterministic, non-linear dynamical systems. Chaos-based analysis lets us evaluate the characteristics of a system's state space from time series, and extracting and understanding those chaotic characteristics enables high-precision analysis. This paper therefore proposes a new method that uses chaos theory to extract character features and recognize characters. First, mesh, projection, and cross-distance features are obtained from the input character image and converted into time series data. Then, using the modified Henon system suggested in this paper, the final features of the character image are obtained by calculating the box-counting dimension, natural measure, information bit, and information dimension, which characterize the fractal dimension. Finally, character recognition is performed by statistically finding the information bit that shows the minimum difference from the normalized pattern database. Experimental results show a 99% character classification rate for 2,350 Korean characters (Hangul) using the proposed method.
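
One of the quantities named in this abstract, the box-counting dimension, can be estimated numerically as the slope of log N(ε) against log(1/ε), where N(ε) is the number of boxes of side ε that contain at least one point of the set. The sketch below is an illustration under stated assumptions: it uses the standard Henon map with the classical parameters a = 1.4 and b = 0.3, not the modified Henon system of the paper (which the abstract does not specify), and the function names are hypothetical.

```python
import math


def henon_orbit(n_points, a=1.4, b=0.3, discard=100):
    """Iterate the standard Henon map and return n_points (x, y) pairs.

    The first `discard` iterations are dropped so that the returned points
    lie close to the attractor.
    """
    x, y = 0.0, 0.0
    points = []
    for i in range(n_points + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            points.append((x, y))
    return points


def box_counting_dimension(points, exponents=range(3, 8)):
    """Estimate the box-counting dimension of a 2-D point set.

    For each k in `exponents` the plane is tiled with boxes of side 2**-k and
    the occupied boxes are counted. The estimate is the least-squares slope of
    log(count) versus log(1/side).
    """
    log_inv_eps, log_counts = [], []
    for k in exponents:
        scale = 2 ** k
        boxes = {(math.floor(x * scale), math.floor(y * scale)) for x, y in points}
        log_inv_eps.append(k * math.log(2))
        log_counts.append(math.log(len(boxes)))
    n = len(log_inv_eps)
    mean_u = sum(log_inv_eps) / n
    mean_v = sum(log_counts) / n
    num = sum((u - mean_u) * (v - mean_v) for u, v in zip(log_inv_eps, log_counts))
    den = sum((u - mean_u) ** 2 for u in log_inv_eps)
    return num / den


if __name__ == "__main__":
    estimate = box_counting_dimension(henon_orbit(20000))
    print(f"estimated box-counting dimension: {estimate:.2f}")
```

The estimate depends on the number of points and on the range of box sizes, so it should be read as a rough value. Points on a straight line give a slope of 1, and a fractal attractor such as this one gives a non-integer value between 1 and 2.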

Oversampling-Based Ensemble Learning Methods for Imbalanced Data (불균형 데이터 처리를 위한 과표본화 기반 앙상블 학습 기법)

  • Kim, Kyung-Min;Jang, Ha-Young;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices / v.20 no.10 / pp.549-554 / 2014
  • Handwritten character recognition data is usually imbalanced because it is collected from natural language sentences written by different writers. Imbalanced data can seriously degrade the performance of most machine learning algorithms. This problem is nevertheless typically ignored in handwritten character recognition, because most of the difficulty is attributed to the high variance of the data set and the similar shapes of different characters. We propose oversampling-based ensemble learning methods to solve the imbalanced data problem in handwritten character recognition and to improve recognition accuracy. We also show empirically that the proposed method improves the recognition accuracy of minority classes as well as the overall recognition accuracy.
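
The general idea of combining oversampling with an ensemble can be sketched as below. This is a minimal illustration that assumes plain random oversampling (duplicating minority-class samples until every class matches the largest one) and majority voting. The abstract does not say which oversampling scheme or base learner the authors use, so the function names and the toy nearest-mean learner are hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample(samples, labels, rng):
    """Randomly duplicate minority-class samples until all classes are equal in size."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def train_ensemble(samples, labels, train_fn, n_members=5, seed=0):
    """Train each member on its own randomly oversampled copy of the data."""
    rng = random.Random(seed)
    return [train_fn(*oversample(samples, labels, rng)) for _ in range(n_members)]


def predict(ensemble, x):
    """Majority vote over the ensemble members."""
    votes = Counter(model(x) for model in ensemble)
    return votes.most_common(1)[0][0]


def train_nearest_mean(xs, ys):
    """Toy base learner for 1-D features: classify by the closest class mean."""
    sums, counts = defaultdict(float), Counter()
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))
```

A usage example: with six samples of class `"a"` and one of class `"b"`, `oversample` returns six of each, and `train_ensemble(xs, ys, train_nearest_mean)` builds five members that vote in `predict`. With a base learner that is sensitive to class frequencies, the balanced copies keep the minority class from being swamped; the nearest-mean learner here is only a placeholder to keep the sketch self-contained.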

Arabic Words Extraction and Character Recognition from Picturesque Image Macros with Enhanced VGG-16 based Model Functionality Using Neural Networks

  • Ayed Ahmad Hamdan Al-Radaideh;Mohd Shafry bin Mohd Rahim;Wad Ghaban;Majdi Bsoul;Shahid Kamal;Naveed Abbas
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.7 / pp.1807-1822 / 2023
  • Innovation and rapidly increasing functionality in user-friendly smartphones have encouraged shutterbugs to capture picturesque image macros at work or while traveling. Formal signboards are placed with marketing objectives and are enriched with text to attract people. Extracting and recognizing text from natural images is an emerging research issue that needs consideration. Compared to conventional optical character recognition (OCR), the complex background, implicit noise, lighting, and orientation of these scene text photos make the problem more difficult. Arabic scene text extraction and recognition adds a number of further complications. The method described in this paper uses a two-phase methodology to extract Arabic text, with awareness of word boundaries, from scene images with varying text orientations. The first stage uses a convolutional autoencoder, and the second uses Arabic Character Segmentation (ACS), followed by traditional two-layer neural networks for recognition. This study presents how an Arabic training and synthetic dataset can be created to exemplify superimposed text in different scene images. For this purpose, a dataset of 10k cropped images in which Arabic text was found was created for the detection phase, and a dataset of 127k Arabic characters for the recognition phase. The phase-1 labels were generated from an Arabic corpus of 15k quotes and sentences. The Arabic Word Awareness Region Detection (AWARD) approach, which is highly flexible in identifying complex Arabic scene text such as arbitrarily oriented, curved, or deformed text, is used to detect these texts. Our experiments show that the system has a 91.8% word segmentation accuracy and a 94.2% character recognition accuracy. We believe that future researchers will excel in image processing of text images, improving image quality or reducing noise in scene images in any language by enhancing the functionality of VGG-16-based models using neural networks.