• Title/Summary/Keyword: Handwritten Document

Text Line Segmentation using AHTC and Watershed Algorithm for Handwritten Document Images

  • Oh, KangHan;Kim, SooHyung;Na, InSeop;Kim, GwangBok
    • International Journal of Contents, v.10 no.3, pp.35-40, 2014
  • Text line segmentation is a critical task in handwritten document recognition. In this paper, we propose a novel text line segmentation method using baseline estimation and the watershed algorithm. The baseline-detection algorithm estimates the baseline using Adaptive Head-Tail Connection (AHTC) on the document. Then, the watershed method segments the line region using the baseline-detection result. Finally, the text lines are separated using the watershed result, and a post-processing algorithm refines the line boundaries. The scheme segments text lines with 97% accuracy on the handwritten document images in the ICDAR database.

Language Identification in Handwritten Words Using a Convolutional Neural Network

  • Tung, Trieu Son;Lee, Gueesang
    • International Journal of Contents, v.13 no.3, pp.38-42, 2017
  • Documents from the last few decades often contain more than one language, so identifying the language of each word is essential, especially for English and Korean in handwritten documents. Traditional methods mostly rely on structural or stroke features, but these often fail to capture the characteristics of words because of the complexity introduced by handwriting, which complicates the task and can lead to poor results. In this study, a convolutional neural network (CNN) is used to classify English and Korean handwritten words in text documents. Experimental results show that the proposed method works effectively compared to previous methods.

Machine Printed and Handwritten Text Discrimination in Korean Document Images

  • Trieu, Son Tung;Lee, Guee Sang
    • Smart Media Journal, v.5 no.3, pp.30-34, 2016
  • There are now many Korean documents in which text must be identified as either machine printed or handwritten. Early identification methods use structural features, which are simple and easy to apply to text of a specific font, but their performance depends on the font type and the characteristics of the text. Recently, the bag-of-words model has been used for identification; it can be invariant to changes in font size, distortions, or modifications to the text. The bag-of-words method includes three steps: word segmentation using connected component grouping, feature extraction, and classification using an SVM (Support Vector Machine). In this paper, a bag-of-words method using SURF (Speeded Up Robust Features) is proposed for discriminating machine printed and handwritten text in Korean documents. The experiment shows that the proposed method outperforms methods based on structural features.
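
The feature-extraction step of a bag-of-words pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D vectors below stand in for real SURF descriptors, and the codebook, the descriptor values, and the function names are all hypothetical. Each descriptor is assigned to its nearest codeword, and the normalized counts form the feature vector that a classifier such as an SVM would receive.

```python
import math

def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(descriptor, codebook[i]))

def bow_histogram(descriptors, codebook):
    """Normalized histogram of codeword assignments for one word image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Hypothetical 2-D stand-ins for descriptors; real SURF descriptors
# are much longer vectors extracted from image keypoints.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.1], [0.8, 0.9], [0.1, 0.9]]

histogram = bow_histogram(descriptors, codebook)
print(histogram)
```

Because the histogram is normalized by the number of descriptors, its length depends only on the codebook size and not on how many keypoints a word image produces.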

A Framework for Digitalizing Handwritten Document using Digital Pen and Handwriting Recognition Technology (디지털펜과 필기체인식 기술을 이용한 수기문서 전자화 프레임워크)

  • Son, Bong-Ki;Kim, Hak-Joon
    • Journal of the Korea Academia-Industrial cooperation Society, v.12 no.3, pp.1417-1426, 2011
  • Business still relies heavily on pen and paper for legal reasons or convenience. Handwritten documents must be converted into digital documents so that IT systems can manage and process them in real time. Because previous document digitalization systems convert handwritten documents by scanning and post-processing them, it is difficult to keep the work process seamless. This paper proposes LiveForm, a framework for digitalizing handwritten documents using a digital pen and handwriting recognition technology. To demonstrate the applicability of LiveForm, we also implement a LiveForm-based service for an industrial gas distribution process and analyze the effects of the system. LiveForm generates the same digital image as the handwritten document by recording the absolute coordinates of the digital pen as the paper is filled in, and converts the handwriting data to digital text for insertion into the back-end system. In paper-based information gathering, the LiveForm-based system eliminates both scanning for document digitalization and keyboard data entry into the back-end system. Therefore, LiveForm can improve work processes in various business areas.

A Handwritten Document Digitalization Framework based Defect Management System in Educational Facilities (수기문서 전자화 프레임워크 기반의 교육시설 하자관리 시스템)

  • Son, Bong-Ki
    • The Journal of Sustainable Design and Educational Environment Research, v.9 no.3, pp.1-11, 2010
  • In the construction industry, IT-based information systems have been applied in diverse ways to increase productivity. Although IT devices such as PDAs, RFID, barcodes, wireless networks, and web cameras have been introduced to gather information on construction sites, their effect is limited because they create additional work for engineers. In this paper, we propose a defect management system based on a handwritten document digitalization framework to show the applicability of a new IT device, the digital pen. With the proposed system, defect information can be gathered and entered into the defect management system effectively using a digital pen and paper in the conventional way. Applying the digital pen as a data gathering device in defect management can increase productivity by improving the work process and by building and utilizing a high-quality defect information database.

A Production Traceability Information Gathering System based on Handwritten Data Digitalization Technology in Agro-livestock Products (수기정보 전자화 기술 기반의 농축산물 생산이력정보 수집 시스템)

  • Son, Bong-Ki
    • Journal of the Korea Academia-Industrial cooperation Society, v.12 no.10, pp.4632-4641, 2011
  • Detailed production traceability information is a fundamental element in the successful introduction and revitalization of a traceability system. In this paper, we propose a production traceability information gathering system for agro-livestock products based on handwritten data digitalization technology. With the proposed system, detailed production traceability information can be gathered effectively simply by writing in a paper management ledger with a digital pen. The server of the system generates the same digital image as the ledger and converts the handwritten data into digital text to insert into the database. Because the system is superior to data gathering systems based on PCs, PDAs, and touch screens in mobility, usability, data input speed, and suitability for the agro-livestock environment, users can effectively gather high-quality traceability information even if they have limited information literacy and insufficient time to input data. We expect that handwritten data digitalization technology can be used to gather document-based information in the manufacturing, distribution, and marketing stages. In addition, this technology can be applied to implement advanced traceability systems together with RFID/USN-based systems.

Document Structure Understanding on Subjects Registration Table

  • Ito, Yuichi;Ohno, Masanaga;Tsuruoka, Shinji;Yoshikawa, Tomohiro;Tsuyoshi, Shinogi
    • Proceedings of the Korean Institute of Intelligent Systems Conference, 2003.09a, pp.571-574, 2003
  • This research aims to automate the generation of a database from paper-based table forms. The subject registration table has a complicated table structure, and in this research we used registration tables as an example for general table structure understanding. We propose a table structure understanding system for several table types, which works in several steps. In the first step, the paper document images are read with an image scanner. In the second step, each document image is segmented into tables. In the third step, the character strings are extracted using image processing technology and the properties of the character strings are determined. The structured database is then generated automatically. The proposed system consists of two subsystems. The "Master document generation system" is used for the table form definition, and it does not include handwritten characters. The "Structure analysis system for completed table" is used for the written form, and it analyzes the table form filled in with handwritten characters. We implemented the system using MS Visual C++ on Windows, and it achieves a correct extraction rate of 98% on 51 registration tables written by different students.

Text Line Segmentation of Handwritten Documents by Area Mapping

  • Boragule, Abhijeet;Lee, GueeSang
    • Smart Media Journal, v.4 no.3, pp.44-49, 2015
  • Text line segmentation is a preprocessing step in OCR that can significantly influence the accuracy of document analysis applications. This paper proposes a novel methodology for the text line segmentation of handwritten documents. First, the average width of the connected components is used to form a 1-D Gaussian kernel, and a smoothing operation is then applied to the input binary image. The adaptive binarization of the smoothed image forms the final text lines. The segmentation method involves two stages. First, the large connected components are labelled as unique text lines using text line area mapping. Second, the segmentation is refined using the Euclidean distance between the text lines and the small connected components. The group of uniquely labelled text candidates achieves promising segmentation results. The proposed approach works well on Korean and English handwritten documents captured using a camera.
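
The smoothing and binarization steps described in this abstract can be sketched as follows. This is a simplified illustration rather than the paper's method: setting sigma to half the average component width, zero padding at the borders, and a fixed global threshold (in place of the adaptive binarization the authors use) are all assumptions, as are the function names and the toy image.

```python
import math

def gaussian_kernel(width):
    """Normalized 1-D Gaussian kernel with radius `width`.

    Setting sigma to width / 2 is an assumption of this sketch.
    """
    sigma = max(width / 2.0, 1e-6)
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-width, width + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_rows(image, kernel):
    """Convolve every row with the kernel; pixels outside the image count as 0."""
    radius = len(kernel) // 2
    smoothed = []
    for row in image:
        new_row = []
        for x in range(len(row)):
            acc = 0.0
            for k, weight in enumerate(kernel):
                src = x + k - radius
                if 0 <= src < len(row):
                    acc += weight * row[src]
            new_row.append(acc)
        smoothed.append(new_row)
    return smoothed

def binarize(image, threshold):
    """Fixed global threshold, a simplification of adaptive binarization."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]

# Toy binary image: 1 = ink. Rows 0 and 2 each hold two short ink runs.
page = [
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
]
avg_component_width = 2  # both ink runs in each row are 2 pixels wide
kernel = gaussian_kernel(avg_component_width)
line_mask = binarize(smooth_rows(page, kernel), 0.33)
for row in line_mask:
    print(row)
```

Smoothing only along the horizontal direction bridges the gaps between neighboring components in the same row while leaving the empty row between the two lines untouched, so each text line becomes one connected region.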

A Study on the Construction of a Document Input/Output system (문서 입출력 시스템의 구성에 관한 연구)

  • 함영국;도상윤;정홍규;김우성;박래홍;이창범;김상중
    • Journal of the Korean Institute of Telematics and Electronics B, v.29B no.10, pp.100-112, 1992
  • In this paper, an integrated document input/output system is developed that constructs a graphic document from a text file, converts the document into encoded facsimile data, and recognizes printed and handwritten alphanumerics and Korean characters in a facsimile or graphic document. For the output system, we develop a method that generates bit-map patterns from a document consisting of KSC5601 and ASCII codes. The binary graphic image is, if necessary, encoded by the G3 coding scheme for facsimile transmission. For a user-friendly input system for documents consisting of alphanumerics and Korean characters obtained from a facsimile or scanner, we propose a document recognition algorithm utilizing several special features (partial projection, cross point, and distance features) and the membership function of fuzzy set theory. In summary, we develop an integrated document input/output system, and its performance is demonstrated via computer simulation.

Electronic Document Automation System Model for Improving Productivity in maintenance work - in Inspection Process of Construction Equipment Maintenance - (정비작업의 생산성 향상을 위한 전자문서자동화시스템 모형 - 건설장비 정비작업을 중심으로 -)

  • Kong, Myung-Dal
    • Journal of the Korea Safety Management & Science, v.19 no.3, pp.49-58, 2017
  • This paper proposes a model that could efficiently improve the interaction and interface between an MES (Manufacturing Execution System) server and a POP (Point of Production) terminal through an electronic document server, electronic pen, Bluetooth receiver, and form paper in disassembly and process inspection work. The proposed model shows that the new method, based on an electronic document automation system, reduces processing time for maintenance work more efficiently than the current approach based on handwritten processing. The effects of the proposed model are as follows: (a) the processing time per piece of equipment for maintenance was 300 minutes with the current method and 50 minutes with the new method; (b) the processing error ratio was 20% with the current method and 1% with the new method.
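
The figures reported in this abstract can be turned into relative improvements with a short calculation. The sketch below uses only the four numbers quoted above; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Fractional reduction when a quantity drops from `before` to `after`."""
    return (before - after) / before

# Figures quoted in the abstract.
time_before, time_after = 300, 50    # minutes per piece of equipment
error_before, error_after = 20, 1    # processing error ratio in percent

time_cut = relative_reduction(time_before, time_after)
error_cut = relative_reduction(error_before, error_after)
speedup = time_before / time_after

print(f"processing time reduced by {time_cut:.1%} ({speedup:.0f}x faster)")
print(f"error ratio reduced by {error_cut:.1%} in relative terms "
      f"({error_before - error_after} percentage points)")
```

Under these numbers the processing time falls by about 83% (a sixfold speedup), and the error ratio falls by 95% in relative terms, or 19 percentage points.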