• Title/Summary/Keyword: Handwritten Text

Text Line Segmentation of Handwritten Documents by Area Mapping

  • Boragule, Abhijeet;Lee, GueeSang
    • Smart Media Journal / v.4 no.3 / pp.44-49 / 2015
  • Text line segmentation is a preprocessing step in OCR that can significantly influence the accuracy of document analysis applications. This paper proposes a novel methodology for the text line segmentation of handwritten documents. First, the average width of the connected components is used to form a 1-D Gaussian kernel, and a smoothing operation is then applied to the input binary image. Adaptive binarization of the smoothed image forms the final text lines. Segmentation proceeds in two stages: first, each large connected component is assigned a unique text-line label using text line area mapping; second, the segmentation is refined using the Euclidean distance between each text line and the small connected components. The uniquely labelled groups of text candidates achieve promising segmentation results. The proposed approach works well on Korean and English handwritten documents captured with a camera.
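
The smoothing step and the distance-based refinement above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: setting sigma equal to the average component width (truncated at 3 sigma), smoothing along rows, and measuring the distance from a small component's centroid to the nearest pixel of each labelled line are all assumptions. The adaptive binarization and area-mapping steps are omitted, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (row, col)


def gaussian_kernel_1d(avg_cc_width: float) -> List[float]:
    """Normalized 1-D Gaussian kernel.

    Assumption: sigma = average connected-component width, kernel truncated
    at 3 sigma. The paper's exact mapping may differ.
    """
    sigma = max(avg_cc_width, 1e-6)
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(image: Sequence[Sequence[int]], kernel: Sequence[float]) -> List[List[float]]:
    """Convolve every row of a binary image with a symmetric kernel (zero padding)."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        width = len(row)
        smoothed = []
        for c in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                cc = c + k - radius
                if 0 <= cc < width:  # pixels outside the image count as 0
                    acc += w * row[cc]
            smoothed.append(acc)
        out.append(smoothed)
    return out


def assign_small_component(centroid: Point, lines: Dict[int, List[Point]]) -> int:
    """Label of the text line whose nearest pixel is closest (Euclidean
    distance) to the centroid of a small connected component."""
    best_label, best_dist = -1, math.inf
    for label, pixels in lines.items():
        d = min(math.dist(centroid, p) for p in pixels)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

Smoothing along rows spreads ink horizontally, so characters on the same line tend to merge into one band while the gaps between lines stay comparatively empty.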

Arabic Handwritten Manuscripts Text Recognition: A Systematic Review

  • Alghamdi, Arwa;Alluhaybi, Dareen;Almehmadi, Doaa;Alameer, Khadijah;Siddeq, Sundos Bin;Alsubait, Tahani
    • International Journal of Computer Science & Network Security / v.22 no.11 / pp.319-323 / 2022
  • Handwritten text recognition is currently an active research area. Progress in this field varies by language; for example, progress in Arabic handwritten text recognition remains limited and needs more attention and effort. One of the most important subfields is Arabic handwritten manuscript text recognition, which focuses on extracting text from historical manuscripts. For centuries, people recorded nearly everything in manuscripts, and today millions of manuscripts exist around the world. Dealing with these manuscripts poses two main challenges. First, they are at risk of damage because they were written on primitive materials. Second, writing styles differ, so most people cannot read these manuscripts easily. Therefore, this study discusses papers related to this important research field.

Text Line Segmentation using AHTC and Watershed Algorithm for Handwritten Document Images

  • Oh, KangHan;Kim, SooHyung;Na, InSeop;Kim, GwangBok
    • International Journal of Contents / v.10 no.3 / pp.35-40 / 2014
  • Text line segmentation is a critical task in handwritten document recognition. In this paper, we propose a novel text line segmentation method using baseline estimation and the watershed algorithm. The baseline-detection algorithm estimates the baselines of the document using Adaptive Head-Tail Connection (AHTC). Then, the watershed method segments the line regions using the baseline-detection result. Finally, the text lines are separated according to the watershed result, and a post-processing algorithm refines the lines. The scheme segments text lines from the handwritten document images in the ICDAR database with 97% accuracy.
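
The watershed stage can be illustrated with a marker-controlled watershed implemented by priority flooding. In this sketch the AHTC baseline estimation is not reproduced: the seed pixels (for example, pixels on each estimated baseline) and the cost map are assumed inputs, and the paper's actual watershed variant and post-processing may differ. `watershed_from_markers` is a hypothetical name.

```python
import heapq
from typing import List, Sequence, Tuple


def watershed_from_markers(cost: Sequence[Sequence[float]],
                           markers: Sequence[Sequence[int]]) -> List[List[int]]:
    """Marker-controlled watershed by priority flooding (no watershed lines).

    cost[r][c]: flooding level of each pixel (assumed low inside a line band).
    markers[r][c]: positive seed label, e.g. pixels on an estimated baseline; 0 elsewhere.
    Returns a label map in which every pixel connected to a seed receives the
    label of the basin that reaches it first.
    """
    rows, cols = len(cost), len(cost[0])
    labels = [list(row) for row in markers]
    heap: List[Tuple[float, int, int, int]] = []
    order = 0  # insertion counter: equal levels are flooded first-in, first-out
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] > 0:
                heapq.heappush(heap, (cost[r][c], order, r, c))
                order += 1
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]  # claimed by the basin that reached it
                heapq.heappush(heap, (cost[nr][nc], order, nr, nc))
                order += 1
    return labels
```

With a cost that is low along each text line and high between lines, the basins grown from neighbouring baselines should tend to meet in the gaps between the lines.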

Machine Printed and Handwritten Text Discrimination in Korean Document Images

  • Trieu, Son Tung;Lee, Guee Sang
    • Smart Media Journal / v.5 no.3 / pp.30-34 / 2016
  • Many Korean documents contain text that must be identified as either machine-printed or handwritten. Early identification methods use structural features, which are simple and easy to apply to text in a specific font, but their performance depends on the font type and the characteristics of the text. More recently, the bag-of-words model, which can be invariant to changes in font size and to distortions or modifications of the text, has been used for this identification. The bag-of-words-based method includes three steps: word segmentation using connected component grouping, feature extraction, and classification using an SVM (Support Vector Machine). In this paper, a bag-of-words-based method using SURF (Speeded Up Robust Features) is proposed to identify machine-printed and handwritten text in Korean documents. Experiments show that the proposed method outperforms methods based on structural features.
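
The bag-of-words step can be sketched as follows, assuming the SURF descriptors of a word image have already been extracted and a codebook of visual words has already been learned (for example, by k-means clustering). Each descriptor is assigned to its nearest visual word, and the normalized histogram of word counts is the feature vector that would be passed to the SVM. Descriptor extraction, codebook learning, and SVM training are omitted; the function names are hypothetical, and small 2-D vectors stand in for real SURF descriptors (64-dimensional in the standard form).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the visual word closest to a descriptor (squared Euclidean distance)."""
    best_i, best_d = 0, math.inf
    for i, word in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def bow_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """L1-normalized bag-of-words histogram for one word image."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_codeword(desc, codebook)] += 1.0
    total = sum(hist)
    # an image with no descriptors keeps an all-zero histogram
    return [h / total for h in hist] if total else hist
```

Because the histogram counts visual words regardless of where they occur, it does not depend on the word's exact size or position, which is part of why this representation tolerates font-size changes.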

Text line separation in handwritten address image using partial projection technique (부분 투영기법을 이용한 필기체 주소 영상에서의 문자열 분리)

  • 정선화;남윤석
    • Proceedings of the IEEK Conference / 2003.11a / pp.31-34 / 2003
  • In this paper, we describe a method for separating text lines in handwritten Korean address images. The most notable feature of the proposed method is its use of a modified projection technique, named the partial projection technique. A projection-based method that projects the whole address image horizontally to find split points between text lines inevitably fails on images with a slight skew or with overlap between vertically neighboring text lines. To overcome this problem, we introduce a partial projection technique that splits an address image into a few partial address images of equal width and then projects each of them horizontally. Experiments on 989 handwritten Korean address images extracted from live mail show the superiority of the proposed method. The correct text-line separation rate for the test images was about 91.5%.
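
The idea behind partial projection can be shown with a small sketch: the binary image (1 = ink) is cut into equal-width vertical strips, each strip is projected horizontally, and split points are searched for in each strip's profile separately. The split-point rule used here (the midpoint of each blank gap between ink runs) and the function names are assumptions for illustration; the abstract does not describe the paper's criterion.

```python
from typing import List, Sequence


def partial_projections(image: Sequence[Sequence[int]], n_strips: int) -> List[List[int]]:
    """Split a binary image into n_strips vertical strips of (nearly) equal
    width and return the horizontal projection profile of each strip."""
    width = len(image[0])
    bounds = [i * width // n_strips for i in range(n_strips + 1)]
    profiles = []
    for s in range(n_strips):
        left, right = bounds[s], bounds[s + 1]
        profiles.append([sum(row[left:right]) for row in image])
    return profiles


def split_rows(profile: Sequence[int]) -> List[int]:
    """Middle row of every blank gap that lies between two ink runs
    (hypothetical split-point rule)."""
    splits, r, n = [], 0, len(profile)
    while r < n and profile[r] == 0:  # skip leading blank rows
        r += 1
    while r < n:
        while r < n and profile[r] > 0:  # walk through an ink run
            r += 1
        gap_start = r
        while r < n and profile[r] == 0:  # walk through the following gap
            r += 1
        if r < n and gap_start < r:  # the gap is followed by more ink
            splits.append((gap_start + r - 1) // 2)
    return splits
```

For example, in a 4-column image whose two text lines each drop by one row from the left half to the right half, the full-width profile (`n_strips=1`) has no blank row, whereas each 2-column strip has one, so a split point is found in every strip.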

Language Identification in Handwritten Words Using a Convolutional Neural Network

  • Tung, Trieu Son;Lee, Gueesang
    • International Journal of Contents / v.13 no.3 / pp.38-42 / 2017
  • Documents from the last few decades typically contain more than one language, so identifying the language of each word is essential, especially for English and Korean words in handwritten documents. Traditional methods mostly rely on conventional structural or stroke features, but they sometimes fail to capture many characteristics of words because of the complexity introduced by handwriting. As a result, traditional methods make the task considerably more complicated and can lead to poor results. In this study, a convolutional neural network (CNN) is used to classify English and Korean handwritten words in text documents. Experimental results show that the proposed method is more effective than previous methods.
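
As a rough illustration of what a CNN computes, the sketch below runs the forward pass of a toy one-layer network: each convolution kernel is slid over the word image (cross-correlation, as in common deep learning frameworks), followed by ReLU and global max pooling, and a logistic unit turns the pooled features into a probability. This is not the network from the paper: the architecture, kernels, weights, function names, and the reading of the output as the probability that a word is Korean are hypothetical, and training is omitted.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_valid(image: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]]) -> Matrix:
    """2-D cross-correlation without padding ('valid' output size)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def relu(m: Matrix) -> Matrix:
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in m]


def global_max_pool(m: Matrix) -> float:
    """Strongest response of one feature map anywhere in the image."""
    return max(max(row) for row in m)


def predict_probability(image: Sequence[Sequence[float]],
                        kernels: Sequence[Sequence[Sequence[float]]],
                        weights: Sequence[float], bias: float) -> float:
    """One conv layer + ReLU + global max pool + logistic output unit."""
    features = [global_max_pool(relu(conv2d_valid(image, k))) for k in kernels]
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

A practical classifier would stack several convolution and pooling layers and learn the kernels and weights from labelled word images instead of fixing them by hand.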

The Recognition of Vowels and Consonants in a Handwritten Hangul Text with Attributed Grammars (속성문법을 이용한 필기체 한글 문서 내의 자모인식)

  • Lyu, Sung-Pil;Kim, Tae-Kyun
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.3 / pp.85-94 / 1989
  • This paper proposes a method to recognize vowels and consonants in handwritten Hangul text in which the sizes of characters and the spaces between characters are not uniform. In this method, all characters in the thinned image of a handwritten Hangul text are transformed into strokes, and attributes representing the relations between strokes are extracted from these strokes. The vowels and consonants are then recognized by applying attributed grammars to the strokes and their attributes.

Adaptive Character Segmentation to Improve Text Recognition Accuracy on Mobile Phones (모바일 시스템에서 텍스트 인식 위한 적응적 문자 분할)

  • Kim, Jeong Sik;Yang, Hyung Jeong;Kim, Soo Hyung;Lee, Guee Sang;Do, Luu Ngoc;Kim, Sun Hee
    • Smart Media Journal / v.1 no.4 / pp.59-71 / 2012
  • As mobile phones have become common communication devices, their applications have grown increasingly important in people's lives. Using smartphone cameras to collect information about everyday environments is a goal of many applications, such as text recognition, object recognition, and context awareness. Studies have sought to provide important information by recognizing text that is artificially or naturally present in images and videos captured with mobile phones. This study proposes a character segmentation method that improves character-recognition accuracy in images obtained from mobile phone cameras. The method first classifies the text in a given image as printed or handwritten, since the two require different segmentation approaches. For printed letters, a rough segmentation is performed, and the segmented regions are then merged, deleted, and re-segmented. Handwritten letters are segmented after skew correction, and the characters are then classified by merging the segments. Experimental results show that the method achieves a performance of 95.9% for printed letters and 84.7% for handwritten letters.
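
The rough segmentation and integration steps for printed text might be sketched as follows, assuming a binary text-line image (1 = ink). Rough segments are runs of non-blank columns in the vertical projection, and a segment is merged with its left neighbour when either of them is narrower than `min_width`; this is meant, for example, to rejoin parts of a Hangul syllable that a blank column has split apart. The merge rule, `min_width`, and the function names are assumptions, and the deletion and re-segmentation steps are not shown.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open column range [start, end)


def column_profile(image: Sequence[Sequence[int]]) -> List[int]:
    """Vertical projection: number of ink pixels (1s) in each column."""
    return [sum(col) for col in zip(*image)]


def rough_segments(profile: Sequence[int]) -> List[Segment]:
    """Runs of consecutive non-blank columns, as (start, end) pairs."""
    segments: List[Segment] = []
    start = None
    for c, value in enumerate(profile):
        if value > 0 and start is None:
            start = c
        elif value == 0 and start is not None:
            segments.append((start, c))
            start = None
    if start is not None:  # a run that reaches the right edge
        segments.append((start, len(profile)))
    return segments


def merge_narrow(segments: Sequence[Segment], min_width: int) -> List[Segment]:
    """Merge a segment into the previous one when either is narrower than
    min_width (hypothetical integration rule)."""
    merged: List[Segment] = []
    for start, end in segments:
        if merged and (end - start < min_width or merged[-1][1] - merged[-1][0] < min_width):
            prev_start, _ = merged.pop()
            merged.append((prev_start, end))
        else:
            merged.append((start, end))
    return merged
```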

Fuzzy-Membership Based Writer Identification from Handwritten Devnagari Script

  • Kumar, Rajiv;Ravulakollu, Kiran Kumar;Bhat, Rajesh
    • Journal of Information Processing Systems / v.13 no.4 / pp.893-913 / 2017
  • Handwriting-based person identification systems use, as features, the structural properties of handwriting as perceived by their designers. In this paper, we present a system that uses as features the structural properties that graphologists and expert handwriting analysts use to determine a writer's personality traits and to make other assessments. The advantage of these features is that their definitions are based on sound historical knowledge (i.e., the knowledge discovered by graphologists, psychiatrists, forensic experts, and experts in other domains in analyzing the relationships between handwritten stroke characteristics and the phenomena that embed individuality in strokes). Hence, each stroke characteristic reflects a personality trait. We measured the effectiveness of these features on a subset of the handwritten Devnagari and Latin script datasets from the Center for Pattern Analysis and Recognition (CPAR-2012), written by 100 people, each of whom wrote three samples of the Devnagari and Latin texts that we designed for our experiments. The experiment yielded 100% correct identification on the training set. However, with 200 training samples and 100 test samples, the correct identification rates were 88% and 89% on handwritten Devnagari and Latin text, respectively. Introducing majority-voting-based rejection criteria increased the identification accuracy to 97% on both scripts.
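
The majority-voting-based rejection can be sketched as below. Where the votes come from (for example, several classifiers or several samples from the same writer), the minimum vote share, the tie rule, and the function name are assumptions, since the abstract does not specify the criterion. Rejection trades coverage for accuracy: uncertain cases are refused instead of being assigned to a possibly wrong writer.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote_with_rejection(votes: Sequence[str], min_share: float) -> Optional[str]:
    """Return the winning writer ID, or None (reject) when there are no votes,
    the top two candidates tie, or the winner's vote share is below min_share."""
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    winner, top = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top:
        return None  # tie for first place: reject
    if top / len(votes) < min_share:
        return None  # winner is not convincing enough: reject
    return winner
```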

An Implementation of Hangul Handwriting Correction Application Based on Deep Learning (딥러닝에 의한 한글 필기체 교정 어플 구현)

  • Jae-Hyeong Lee;Min-Young Cho;Jin-soo Kim
    • Journal of Korea Society of Industrial Information Systems / v.29 no.3 / pp.13-22 / 2024
  • With the proliferation of digital devices, handwritten text is gradually becoming less significant in daily life. As the use of keyboards and touch screens increases, a decline in Korean handwriting quality is being observed across a broad range of writers, from young students to adults. However, Korean handwriting remains necessary for many documents, as it retains each writer's unique features while ensuring readability. To this end, this paper implements an application designed to improve and correct the quality of handwritten Korean script. The application uses the CRAFT (Character-Region Awareness For Text Detection) model to detect handwriting areas and employs VGG-Feature-Extraction as a deep learning model to learn features of the handwritten script. The application also presents the reliability of the user's handwritten Korean script, syllable by syllable, as a recognition rate, and suggests the most similar fonts among a set of candidate fonts. Various experiments confirm that the proposed application achieves an excellent recognition rate, comparable to that of conventional commercial OCR systems.
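
Suggesting the most similar fonts could be done by comparing feature vectors, as in this sketch. It assumes a feature vector has already been extracted (for example, by a VGG-style network) from a handwritten syllable and from the same syllable rendered in each candidate font. The use of cosine similarity, the function names, and the font names are hypothetical and not confirmed by the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_similar_fonts(handwriting_feature: Sequence[float],
                       font_features: Dict[str, Sequence[float]],
                       top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate fonts by similarity to the handwritten syllable's features."""
    scores = [(name, cosine(handwriting_feature, feat)) for name, feat in font_features.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]
```

Cosine similarity compares only the direction of the feature vectors, so overall differences in feature magnitude between handwriting and rendered fonts do not dominate the ranking.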