• Title/Summary/Keyword: Handwritten Text

Search Result 40, Processing Time 0.026 seconds

An Efficient Slant Correction for Handwritten Hangul Strings using Structural Properties (한글필기체의 구조적 특징을 이용한 효율적 기울기 보정)

  • 유대근;김경환
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.1_2
    • /
    • pp.93-102
    • /
    • 2003
  • A slant correction method for handwritten Korean strings based on analysis of stroke distribution, which effectively reflects structural properties of Korean characters, is presented in this paper. The method aims to deal with typical problems which have been frequently observed in slant correction of handwritten Korean strings with conventional approaches developed for English/European languages. Extracted strokes from a line of text image are classified into two clusters by applying the K-means clustering. Gaussian modeling is applied to each of the clusters and the slant angle is estimated from the model which represents the vertical strokes. Experimental results support the effectiveness of the proposed method. For the performance comparison 1,300 handwritten address string images were used, and the results show that the proposed method has more superior performance than other conventional approaches.

A Unicode based Deep Handwritten Character Recognition model for Telugu to English Language Translation

  • BV Subba Rao;J. Nageswara Rao;Bandi Vamsi;Venkata Nagaraju Thatha;Katta Subba Rao
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.2
    • /
    • pp.101-112
    • /
    • 2024
  • Telugu language is considered as fourth most used language in India especially in the regions of Andhra Pradesh, Telangana, Karnataka etc. In international recognized countries also, Telugu is widely growing spoken language. This language comprises of different dependent and independent vowels, consonants and digits. In this aspect, the enhancement of Telugu Handwritten Character Recognition (HCR) has not been propagated. HCR is a neural network technique of converting a documented image to edited text one which can be used for many other applications. This reduces time and effort without starting over from the beginning every time. In this work, a Unicode based Handwritten Character Recognition(U-HCR) is developed for translating the handwritten Telugu characters into English language. With the use of Centre of Gravity (CG) in our model we can easily divide a compound character into individual character with the help of Unicode values. For training this model, we have used both online and offline Telugu character datasets. To extract the features in the scanned image we used convolutional neural network along with Machine Learning classifiers like Random Forest and Support Vector Machine. Stochastic Gradient Descent (SGD), Root Mean Square Propagation (RMS-P) and Adaptative Moment Estimation (ADAM)optimizers are used in this work to enhance the performance of U-HCR and to reduce the loss function value. This loss value reduction can be possible with optimizers by using CNN. In both online and offline datasets, proposed model showed promising results by maintaining the accuracies with 90.28% for SGD, 96.97% for RMS-P and 93.57% for ADAM respectively.

Consonant-Vowel Classification Based Segmentation Technique for Handwritten Off-Line Hangul (자소 클래스 인식에 의한 off-line 필기체 한글 문자 분할)

  • Hwang, Sun-Ja;Kim, Mun-Hyeon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.3 no.4
    • /
    • pp.1002-1013
    • /
    • 1996
  • The segmentation of characters is an important step in the automatic recognition of handwritten text. This paper proposes the segmenting method of off-line handwritten Hangul. The suggested approach is based on the structural characteristics of Hangul. The first step extracts the local features. connected component and strokes from the imput word. In the second step we identify the class of strokes. The third segmenting step specifies WRC(White Run Column) before consonant or horizontal vowel. If the segment is longer than threshold, the system estimates segmenting columns using the consonant-vowel information and column features, and then finds a cornered boundary along the strokes within the estimated segmenting columns.

  • PDF

Vehicle License Plate Text Recognition Algorithm Using Object Detection and Handwritten Hangul Recognition Algorithm (객체 검출과 한글 손글씨 인식 알고리즘을 이용한 차량 번호판 문자 추출 알고리즘)

  • Na, Min Won;Choi, Ha Na;Park, Yun Young
    • Journal of Information Technology Services
    • /
    • v.20 no.6
    • /
    • pp.97-105
    • /
    • 2021
  • Recently, with the development of IT technology, unmanned systems are being introduced in many industrial fields, and one of the most important factors for introducing unmanned systems in the automobile field is vehicle licence plate recognition(VLPR). The existing VLPR algorithms are configured to use image processing for a specific type of license plate to divide individual areas of a character within the plate to recognize each character. However, as the number of Korean vehicle license plates increases, the law is amended, there are old-fashioned license plates, new license plates, and different types of plates are used for each type of vehicle. Therefore, it is necessary to update the VLPR system every time, which incurs costs. In this paper, we use an object detection algorithm to detect character regardless of the format of the vehicle license plate, and apply a handwritten Hangul recognition(HHR) algorithm to enhance the recognition accuracy of a single Hangul character, which is called a Hangul unit. Since Hangul unit is recognized by combining initial consonant, medial vowel and final consonant, so it is possible to use other Hangul units in addition to the 40 Hangul units used for the Korean vehicle license plate.

Handwritten Indic Digit Recognition using Deep Hybrid Capsule Network

  • Mohammad Reduanul Haque;Rubaiya Hafiz;Mohammad Zahidul Islam;Mohammad Shorif Uddin
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.2
    • /
    • pp.89-94
    • /
    • 2024
  • Indian subcontinent is a birthplace of multilingual people where documents such as job application form, passport, number plate identification, and so forth is composed of text contents written in different languages/scripts. These scripts may be in the form of different indic numerals in a single document page. Due to this reason, building a generic recognizer that is capable of recognizing handwritten indic digits written by diverse writers is needed. Also, a lot of work has been done for various non-Indic numerals particularly, in case of Roman, but, in case of Indic digits, the research is limited. Moreover, most of the research focuses with only on MNIST datasets or with only single datasets, either because of time restraints or because the model is tailored to a specific task. In this work, a hybrid model is proposed to recognize all available indic handwritten digit images using the existing benchmark datasets. The proposed method bridges the automatically learnt features of Capsule Network with hand crafted Bag of Feature (BoF) extraction method. Along the way, we analyze (1) the successes (2) explore whether this method will perform well on more difficult conditions i.e. noise, color, affine transformations, intra-class variation, natural scenes. Experimental results show that the hybrid method gives better accuracy in comparison with Capsule Network.

Matching Algorithm for Hangul Recognition Based on PDA

  • Kim Hyeong-Gyun;Choi Gwang-Mi
    • Journal of information and communication convergence engineering
    • /
    • v.2 no.3
    • /
    • pp.161-166
    • /
    • 2004
  • Electronic Ink is a stored data in the form of the handwritten text or the script without converting it into ASCII by handwritten recognition on the pen-based computers and Personal Digital Assistants(PDA) for supporting natural and convenient data input. One of the most important issue is to search the electronic ink in order to use it. We proposed and implemented a script matching algorithm for the electronic ink. Proposed matching algorithm separated the input stroke into a set of primitive stroke using the curvature of the stroke curve. After determining the type of separated strokes, it produced a stroke feature vector. And then it calculated the distance between the stroke feature vector of input strokes and one of strokes in the database using the dynamic programming technique.

A Framework for Digitalizing Handwritten Document using Digital Pen and Handwriting Recognition Technology (디지털펜과 필기체인식 기술을 이용한 수기문서 전자화 프레임워크)

  • Son, Bong-Ki;Kim, Hak-Joon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.3
    • /
    • pp.1417-1426
    • /
    • 2011
  • Business still relies heavily on pen and paper for legal reasons or convenience. The handwritten document is to be converted into digitalized document for IT system to manage and process in real time. Because the previous document digitalization systems convert the handwritten documents into digitalized documents by scanning and post-processing the documents, it is difficult to seamlessly proceed the work process. This paper proposes the LiveForm, a framework for digitalizing handwritten document using digital pen and handwriting recognition technology. To prove the applicability of the proposed LiveForm, we also implement a LiveForm based service in industrial gas distribution process and analyze effects of the system. The LiveForm generates the same digital image as the handwritten document by writing up the paper with absolute coordinates by digital pen and converts the handwriting data to digital text to insert the information into back-end system. The LiveForm based system eliminates scanning for document digitalization and data input with keyboard into back-end system in paper-based information gathering. Therefore, it is possible for the LiveForm to improve work process in various business areas.

Implementation of an efficient Pocket PC- based Hangul Matching System (Pocket PC기반의 효율적인 한글 정합 시스템 구현)

  • Park Jong-Min;Cho Beom-Joon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.7
    • /
    • pp.1546-1552
    • /
    • 2004
  • Electronic Ink is a stored data in the form of the handwritten text or the script without converting it into ASCII by handwritten recognition on the pen-based computers and Personal Digital Assistants(Pocket PC) for supporting natural and convenient data input. One of the most important issues is to search the electronic ink in order to use it. We proposed and implemented a script matching algorithm for the electronic ink. Proposed matching algorithm separated the input stroke into a set of primitive stroke using the curvature of the stroke curve. After determining the type of separated strokes, it produced a stroke feature vector. And then it calculated the distance between the stroke feature vector of input strokes and one of strokes in the database using the dynamic programming technique.

A Study on the Construction of a Document Input/Output system (문서 입출력 시스템의 구성에 관한 연구)

  • 함영국;도상윤;정홍규;김우성;박래홍;이창범;김상중
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.29B no.10
    • /
    • pp.100-112
    • /
    • 1992
  • In this paper, an integrated document input/output system is developed which constructs the graphic document from a text file, converts the document into encoded facsimile data, and also recognizes printed/handwritten alphanumerics and Korean characters in a facsimile or graphic document. For an output system, we develop the method which generates bit-map patterns from the document consisting of the KSC5601 and ASCII codes. The binary graphic image, if necessary, is encoded by the G3 coding scheme for facsimile transmission. For a user friendly input system for documents consisting of alphanumerics and Korean characters obtained from a facsimile or scanner, we propose a document recognition algirithm utilizing several special features(partial projection, cross point, and distance features) and the membership function of the fuzzy set theory. In summary, we develop an integrated document input/output system and its performance is demonstrated via computer simulation.

  • PDF

Text Area Segmentation and Layout Vectorization of Off-line Handwritten Forms (손으로 설계한 서식 문서의 문자 영역 분리 및 서식 벡터화)

  • Kim, Byeong-Yong;Gwon, O-Seok
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.10
    • /
    • pp.3086-3097
    • /
    • 2000
  • 본 논문에서는 손으로 자유스럽게 그린 서식 문서에서 문자 영역을 분리하고, 이 중 선 성분을 벡터화하는 방법을 제안한다. 제안된 방법은 우선 이진화 및 세선화 과정에서의 데이터 손실을 방지하기 위해 스캔한 영상에 DRC 알고리즘을 적용한다. 그리고 영상의 기울어짐을 교정하기 위해 세선화된 영상에 허프 변환을 적용하여 기울어짐을 추정하고 교정한 다음, 서식의 구조를 이루는 선 성분을 추출해 낸다. 그리고 문자 영역은 연결 요소 분석법에 의해 문자 영역을 나타내는 데이터로 변환되며, 추출된 선 성분을 정렬, 합병 및 교정처리를 통해 벡터화 된다. 제안된 방법의 실효성을 입증하기 위해 각각 25명의 다른 사람이 필기구에 제한을 두지 않고 하나는 자를 사용하여 작성하고 다른 하나는 자를 사용하지 않고 작성한 서식에 대해 실험한 결과 전체 750개의 벡터 집합 중에서 전처리를 하지 않은 경우에는 666개, 전처리를 한 경우에는 746개의 서식 벡터 검출에 성공하여 그 유효성을 확인할 수 있었다.

  • PDF