Title/Summary/Keyword: optical character

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk; Kim, Taeyeon; Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.1-25 / 2020
  • In this paper, we propose an application system architecture that provides accurate, fast, and efficient automatic gasometer reading. The system captures a gasometer image with a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the device ID and gas usage amount by selective optical character recognition based on deep learning. In general, an image contains many types of characters, and conventional optical character recognition extracts all of them; some applications, however, need to ignore character types that are not of interest and focus only on specific ones. Automatic gasometer reading, for example, only needs the device ID and gas usage amount from a gasometer image in order to bill the user; strings such as the device type, manufacturer, manufacturing date, and specification carry no value for the application. The application therefore has to analyze only the regions of interest and the specific character types that carry valuable information. We adopted CNN (Convolutional Neural Network)-based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition that analyzes only the regions of interest. We built three neural networks for the system: the first is a convolutional neural network that detects the regions of interest containing the gas usage amount and device ID strings; the second is another convolutional neural network that transforms each region of interest into a sequence of spatial feature vectors; and the third is a bidirectional long short-term memory network that converts the feature-vector sequence into a character string by time-series analysis. In this research, the strings of interest are the device ID, a string of 12 Arabic numerals, and the gas usage amount, a string of 4-5 Arabic numerals. All system components are implemented on the Amazon Web Services cloud with Intel Xeon E5-2686 v4 CPUs and an NVIDIA Tesla V100 GPU. The architecture adopts a master-slave processing structure for the efficient, fast parallel processing needed to cope with about 700,000 requests per day. The mobile device captures a gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes each reading request into an input queue with FIFO (First In, First Out) structure. The slave process consists of the three deep neural networks that perform character recognition and runs on the NVIDIA GPU. The slave process continuously polls the input queue for recognition requests; when a request is present, it converts the queued image into a device ID string, a gas usage amount string, and the position information of those strings, pushes the result to an output queue, and returns to polling the input queue. The master process takes the final result from the output queue and delivers it to the mobile device. We used a total of 27,120 gasometer images for training, validation, and testing of the three networks: 22,985 images for training and validation and 4,135 for testing. The 22,985 images were randomly split 8:2 into training and validation sets at each training epoch. The 4,135 test images were categorized into five types (normal, noise, reflex, scale, and slant): normal denotes clean images; noise, images with noise signals; reflex, images with light reflection on the gasometer region; scale, images with small object size due to long-distance capture; and slant, images that are not horizontally level. The final string recognition accuracies on normal data are 0.960 for the device ID and 0.864 for the gas usage amount.
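A minimal sketch of the master-slave queue pattern this abstract describes, using only Python's standard library; the recognize() stub stands in for the three-network pipeline and is not the authors' code:

    import queue
    import threading

    input_q = queue.Queue()    # FIFO queue of pending reading requests
    output_q = queue.Queue()   # recognized results for the master to return

    def recognize(image):
        # Hypothetical stand-in for the three-network pipeline:
        # ROI detection -> CNN feature sequence -> BiLSTM decoding.
        return {"device_id": "000000000000", "usage": "00000"}

    def slave_worker():
        # Slave: poll the input queue, run recognition, push the result.
        while True:
            req_id, image = input_q.get()    # blocks until a request arrives
            output_q.put((req_id, recognize(image)))
            input_q.task_done()              # back to polling the queue

    def master_submit(req_id, image):
        # Master: accept a mobile upload and enqueue it for a slave.
        input_q.put((req_id, image))

    # One slave thread here; the paper scales this to ~700,000 requests/day.
    threading.Thread(target=slave_worker, daemon=True).start()
    master_submit(1, b"<jpeg bytes>")
    print(output_q.get())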

Simple Frame Marker: Implementation of In-Marker Image and Character Recognition and Tracking Method (심플 프레임 마커: 마커 내부 이미지 및 문자 패턴의 인식 및 추적 기법 구현)

  • Kim, Hye-Jin; Woo, Woon-Tack
    • Proceedings of the HCI Society of Korea Conference / 2009.02a / pp.558-561 / 2009
  • In this paper, we propose the Simple Frame Marker (SFMarker) to support recognition of characters and images embedded in an augmented-reality marker. If characters are inserted inside the marker and recognized with optical character recognition (OCR), no marker-learning step is needed before execution. Because characters are familiar to users, the marker is also less visually disturbing than a 2D barcode marker. The proposed SFMarker distinguishes the Square SFMarker, which embeds images, from the Rectangle SFMarker, which embeds characters, according to the marker's aspect ratio, and applies a different recognition algorithm to each. To reduce the preprocessing needed for character recognition, SFMarker inserts orientation information in the marker border and extracts it so that recognition runs quickly and correctly. Finally, since running character recognition on every frame slows tracking down, we speed up recognition by reusing the result from the previous frame when the frame difference is low.
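A minimal sketch of that frame-difference shortcut, with an assumed threshold and a dummy OCR stub rather than the authors' implementation:

    import numpy as np

    DIFF_THRESHOLD = 8.0   # assumed mean-absolute-difference cutoff

    _prev_frame, _prev_text = None, None

    def run_ocr(marker_region):
        # Stand-in for the actual OCR pass over the marker interior.
        return "RECOGNIZED"

    def recognize_marker(frame):
        # Reuse the previous result when the frame barely changed,
        # so OCR does not run on every frame and slow tracking down.
        global _prev_frame, _prev_text
        if _prev_frame is not None:
            diff = np.abs(frame.astype(np.int16) - _prev_frame.astype(np.int16)).mean()
            if diff < DIFF_THRESHOLD:
                return _prev_text
        _prev_frame, _prev_text = frame, run_ocr(frame)
        return _prev_text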

Design and Implementation of Personal Information Identification and Masking System Based on Image Recognition (이미지 인식 기반 향상된 개인정보 식별 및 마스킹 시스템 설계 및 구현)

  • Park, Seok-Cheon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.5 / pp.1-8 / 2017
  • Recently, with the development of ICT technologies such as cloud and mobile, image sharing through social networks has increased rapidly. These images often contain personal information, so leakage incidents can occur, and studies are under way on recognizing and masking personal information in images. However, optical character recognition, used to recognize personal information in images, varies greatly in accuracy with brightness, contrast, and distortion, and its Korean recognition is weak. In this paper, we therefore design and implement a personal information identification and masking system based on image recognition, applying deep learning with a CNN algorithm on top of optical character recognition. We also compare the personal-information recognition rate of the proposed system against plain optical character recognition on the same images, and measure the face recognition rate of the proposed system. Test results show that the personal-information recognition rate of the proposed system is 32.7% higher than that of optical character recognition, and its face recognition rate is 86.6%.
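A minimal sketch of the masking step, assuming detected personal-information boxes have already been produced by some identifier; the paper's CNN detector is replaced here by hand-supplied rectangles:

    import cv2
    import numpy as np

    def mask_regions(image, boxes):
        # Black out each detected personal-information rectangle (x, y, w, h).
        masked = image.copy()
        for x, y, w, h in boxes:
            cv2.rectangle(masked, (x, y), (x + w, y + h), (0, 0, 0), thickness=-1)
        return masked

    img = np.full((200, 400, 3), 255, dtype=np.uint8)   # blank stand-in image
    cv2.imwrite("masked.png", mask_regions(img, [(50, 80, 120, 30)]))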

Trends in Deep Learning-based Medical Optical Character Recognition (딥러닝 기반의 의료 OCR 기술 동향)

  • Sungyeon Yoon; Arin Choi; Chaewon Kim; Sumin Oh; Seoyoung Sohn; Jiyeon Kim; Hyunhee Lee; Myeongeun Han; Minseo Park
    • The Journal of the Convergence on Culture Technology / v.10 no.2 / pp.453-458 / 2024
  • Optical character recognition (OCR) is the technology that recognizes text in images and converts it into digital form. Deep learning-based OCR is used in many industries that handle large quantities of recorded data because of its high recognition performance, and the medical industry has actively introduced it to improve medical services. In this paper, we discuss trends in OCR engines and medical OCR and provide a roadmap for the development of medical OCR. Current medical OCR improves its recognition performance by applying natural language processing to the detected text, but recognition remains limited, especially for non-standard handwriting and modified text. Developing advanced medical OCR will require building databases of medical data, image pre-processing, and natural language processing.
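As a toy illustration of the natural-language post-processing the survey mentions, a dictionary-based correction that snaps noisy OCR tokens to a known lexicon; the lexicon and similarity cutoff are assumptions, not from the paper:

    import difflib

    MEDICAL_TERMS = ["acetaminophen", "metformin", "hypertension"]  # toy lexicon

    def correct_token(token, cutoff=0.8):
        # Snap an OCR token to the closest known medical term, if one is close enough.
        match = difflib.get_close_matches(token.lower(), MEDICAL_TERMS, n=1, cutoff=cutoff)
        return match[0] if match else token

    print(correct_token("metform1n"))   # -> metformin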

Recent Trends in Deep Learning-Based Optical Character Recognition (딥러닝 기반 광학 문자 인식 기술 동향)

  • Min, G.; Lee, A.; Kim, K.S.; Kim, J.E.; Kang, H.S.; Lee, G.H.
    • Electronics and Telecommunications Trends / v.37 no.5 / pp.22-32 / 2022
  • Optical character recognition is a core technology required in many fields, including the digitization of archival documents, industrial automation, automated driving, video analytics, medicine, and financial institutions. It originated in 1928 with pattern matching, but with the advent of artificial intelligence it has evolved into a high-performance character recognition technology. Recently, methods for detecting curved text and characters in complicated backgrounds have been studied, and deep learning models are being developed to recognize text in various orientations and resolutions, under perspective distortion, illumination reflection, and partial occlusion, as well as complex fonts, special characters, and artistic text. This report reviews recent deep learning-based text detection and recognition methods and their various applications.

An Implementation of a System for Video Translation on Window Platform Using OCR (윈도우 기반의 광학문자인식을 이용한 영상 번역 시스템 구현)

  • Hwang, Sun-Myung; Yeom, Hee-Gyun
    • Journal of Internet of Things and Convergence / v.5 no.2 / pp.15-20 / 2019
  • As machine learning research has advanced, translation and image-analysis fields such as optical character recognition have made great progress. However, video translation, which combines the two, has developed more slowly. In this paper, we develop an image translator that combines existing OCR technology with translation technology and verify its effectiveness. We first lay out what functions the system needs and how to implement them, and then test its performance. With the application developed in this paper, users can access translation more conveniently in any environment.
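A minimal sketch of such an OCR-plus-translation pipeline; pytesseract is used here as a stand-in OCR engine and translate() is a hypothetical placeholder for the translation backend, neither being the paper's actual components:

    from PIL import Image
    import pytesseract   # wraps the Tesseract OCR binary, which must be installed

    def translate(text, target="ko"):
        # Hypothetical placeholder for the translation API the system would call.
        return f"[{target}] {text}"

    def translate_frame(path):
        # OCR the captured frame, then hand the recognized text to translation.
        return translate(pytesseract.image_to_string(Image.open(path)))

    print(translate_frame("frame.png"))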

The Centering of the Invariant Feature for the Unfocused Input Character using a Spherical Domain System

  • Seo, Choon-Weon
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.29 no.9 / pp.14-22 / 2015
  • In this paper, a centering method for an unfocused input character using a spherical domain system is proposed, so that the centered character provides a shift-invariant feature for a recognition system. A recognition system was implemented using a centroid method based on coordinate average values, and an average differential ratio above 78.14% was obtained for the character features. A spherical transformation similar to that of the human eyeball makes it possible to extract a shift-invariant feature. The proposed method extracts features via a spherical coordinate transform and uses the transformed data to move the character to the center position of the input plane, mixing digital and optical techniques by applying a spherical coordinate similar to the three-dimensional human eyeball to the two-dimensional plane format. In short, this paper proposes centering a character feature in the spherical domain for character recognition and evaluates the recognizable character shapes along with the differential ratio of the centered character computed with the centroid method.
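A minimal sketch of the centroid-based centering step (coordinate averages of the foreground pixels), under the assumption of a binary character image; the spherical transform itself is not reproduced here:

    import numpy as np

    def center_character(img):
        # Shift the image so the centroid (coordinate average of foreground
        # pixels) lands on the center pixel of the input plane.
        ys, xs = np.nonzero(img)
        cy, cx = ys.mean(), xs.mean()
        h, w = img.shape
        dy, dx = round((h - 1) / 2 - cy), round((w - 1) / 2 - cx)
        return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

    img = np.zeros((9, 9), dtype=np.uint8)
    img[1:4, 1:4] = 1                                        # off-center character blob
    print(np.argwhere(center_character(img)).mean(axis=0))   # -> [4. 4.]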

Human Interface Software for Wireless and Mobile Devices (무선 이동 통신 기기용 휴먼인터페이스 소프트웨어)

  • Kim, Se-Ho; Lee, Chan-Gun
    • Journal of KIISE: Information Networking / v.37 no.1 / pp.57-65 / 2010
  • Recently, character recognition has become essential for camera-equipped mobile communication devices to gather input from users. In general, a CBOCR (Camera-Based Optical Character Recognizer) module is hard to reuse because of its dependency on a specific platform. In this paper, we propose a software architecture for a CBOCR module that adapts easily to various mobile communication platforms. The proposed architecture is composed of a platform dependency support layer, an interface layer, an engine support layer, and an engine layer. The engine layer adopts a plug-in structure to support various hardware endianness policies. We show the effectiveness of the proposed method by applying the architecture to a practical product.
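A minimal sketch of the layered plug-in idea in Python; the layer and method names are illustrative, not the paper's actual API:

    from abc import ABC, abstractmethod

    class OcrEngine(ABC):
        # Engine-layer plug-in interface; the upper layers depend only on this.
        @abstractmethod
        def recognize(self, image_bytes: bytes) -> str: ...

    _engines = {}   # plug-in registry: engines can be added at runtime

    def register(name: str, engine: OcrEngine) -> None:
        _engines[name] = engine

    def recognize(name: str, image_bytes: bytes) -> str:
        # Interface layer: route a request to the named engine plug-in.
        return _engines[name].recognize(image_bytes)

    class DummyEngine(OcrEngine):
        def recognize(self, image_bytes: bytes) -> str:
            return "hello"

    register("dummy", DummyEngine())
    print(recognize("dummy", b"\x00"))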

Training Data Sets Construction from Large Data Set for PCB Character Recognition

  • NDAYISHIMIYE, Fabrice; Gang, Sumyung; Lee, Joon Jae
    • Journal of Multimedia Information System / v.6 no.4 / pp.225-234 / 2019
  • Deep learning has become increasingly popular in both academia and industry, and domains such as pattern recognition and computer vision have witnessed the great power of deep neural networks. However, current studies on deep learning mainly focus on quality datasets with balanced class labels, while training on bad and imbalanced datasets poses great challenges for classification tasks. In this paper we propose a data-analysis-based data reduction method for selecting good and diverse samples from a large dataset for a deep learning model. Data sampling can shrink a large raw dataset while retaining its useful knowledge through representatives, so instead of dealing with the full raw data we can sample it without losing important information. We group PCB characters into classes and train ResNet56 v2 and SENet models on the reduced data to improve the classification performance of an optical character recognition (OCR) character classifier.
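A toy data-reduction step in the spirit of the paper: cluster the raw samples and keep a few representatives per cluster so the reduced set stays diverse. This is a stand-in, not the authors' exact method; the cluster count and samples-per-cluster are assumptions:

    import numpy as np

    def reduce_dataset(features, k=5, per_cluster=10, iters=10, seed=0):
        # Tiny k-means, then keep the first per_cluster samples of each cluster.
        rng = np.random.default_rng(seed)
        centers = features[rng.choice(len(features), k, replace=False)]
        for _ in range(iters):
            labels = ((features[:, None] - centers) ** 2).sum(-1).argmin(1)
            centers = np.array([features[labels == i].mean(0)
                                if (labels == i).any() else centers[i]
                                for i in range(k)])
        return np.array([i for c in range(k)
                         for i in np.flatnonzero(labels == c)[:per_cluster]])

    X = np.random.default_rng(1).normal(size=(500, 16))   # stand-in feature vectors
    print(reduce_dataset(X).shape)                        # indices of kept samples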