• Title/Summary/Keyword: OCR - Optical Character Recognition


Training Data Sets Construction from Large Data Set for PCB Character Recognition

  • NDAYISHIMIYE, Fabrice;Gang, Sumyung;Lee, Joon Jae
    • Journal of Multimedia Information System / v.6 no.4 / pp.225-234 / 2019
  • Deep learning has become increasingly popular in both academia and industry. Various domains, including pattern recognition and computer vision, have witnessed the power of deep neural networks. However, current studies on deep learning mainly focus on high-quality data sets with balanced class labels, while training on poor-quality and imbalanced data sets poses great challenges for classification tasks. In this paper, we propose a data-analysis-based data reduction method for selecting good and diverse samples from a large data set for a deep learning model. Data sampling techniques can reduce the size of the raw data by retaining representative samples that carry its useful knowledge. Therefore, instead of dealing with the full raw data, we can use data reduction techniques to sample data without losing important information. We group PCB characters into classes and train the ResNet56 v2 and SENet models in order to improve the classification performance of an optical character recognition (OCR) character classifier.

Performance Improvement of Optical Character Recognition for Parts Book Using Pre-processing of Modified VGG Model (변형 VGG 모델의 전처리를 이용한 부품도면 문자 인식 성능 개선)

  • Shin, Hee-Ran;Lee, Sang-Hyeop;Park, Jang-Sik;Song, Jong-Kwan
    • The Journal of the Korea institute of electronic communication sciences / v.14 no.2 / pp.433-438 / 2019
  • This paper proposes a method of improving deep-learning-based recognition of numbers and characters on parts drawings through image preprocessing. The proposed character recognition system consists of image preprocessing and a 7-layer deep learning model. Mathematical morphological filtering is used as preprocessing to remove the lines and shapes that cause false recognition of numbers and characters on parts drawings. Further, the deep learning model used is a 7-layer model instead of the VGG-16 model. With the proposed OCR method, the character recognition rate is 92.57% and the precision is 92.82%.
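
The morphological preprocessing described above can be illustrated with a minimal sketch. The abstract does not say which operators or structuring elements the authors use, so the choice here (an opening with a horizontal 1 x k line element, subtracted from the input) is an assumption, as are the function names and the toy image. It works on a binary image stored as nested lists, where 1 is ink.

```python
def open_horizontal(img, k):
    """Morphological opening of a binary image with a 1 x k horizontal line element."""
    h, w = len(img), len(img[0])
    opened = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w - k + 1):
            # Erosion: the element fits here only if all k pixels are foreground.
            if all(img[y][x + i] for i in range(k)):
                # Dilation: paint the whole element back wherever it fitted.
                for i in range(k):
                    opened[y][x + i] = 1
    return opened


def remove_horizontal_lines(img, k):
    """Clear every pixel that lies on a horizontal run of at least k pixels."""
    opened = open_horizontal(img, k)
    return [[p & (1 - o) for p, o in zip(row, orow)]
            for row, orow in zip(img, opened)]


# Hypothetical toy drawing: a 2-pixel-wide stroke above a full-width ruling line.
drawing = [
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
cleaned = remove_horizontal_lines(drawing, k=5)
```

Runs shorter than k survive, so k should exceed the widest character stroke. Pixels where a character crosses a removed line are cleared as well, which a real pipeline would need to repair.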

Vehicle License Plate Text Recognition Algorithm Using Object Detection and Handwritten Hangul Recognition Algorithm (객체 검출과 한글 손글씨 인식 알고리즘을 이용한 차량 번호판 문자 추출 알고리즘)

  • Na, Min Won;Choi, Ha Na;Park, Yun Young
    • Journal of Information Technology Services / v.20 no.6 / pp.97-105 / 2021
  • Recently, with the development of IT, unmanned systems are being introduced in many industrial fields, and one of the most important factors for introducing unmanned systems in the automobile field is vehicle license plate recognition (VLPR). Existing VLPR algorithms use image processing tailored to a specific type of license plate to segment the individual character regions within the plate and recognize each character. However, as the number of Korean vehicle license plates increases and the law is amended, old-style plates, new plates, and different plate types for each type of vehicle coexist. Therefore, the VLPR system must be updated every time, which incurs costs. In this paper, we use an object detection algorithm to detect characters regardless of the license plate format, and apply a handwritten Hangul recognition (HHR) algorithm to enhance the recognition accuracy of a single Hangul character, which is called a Hangul unit. Since a Hangul unit is recognized by combining an initial consonant, a medial vowel, and a final consonant, it is possible to use other Hangul units in addition to the 40 Hangul units used on Korean vehicle license plates.
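
The combination of initial consonant, medial vowel, and final consonant mentioned above can be sketched with the standard Unicode arithmetic for precomposed Hangul syllables. This is not the recognition algorithm from the paper. It only shows how three per-component predictions could be merged into one character, and the function names are hypothetical.

```python
# Jamo tables in Unicode order: 19 initials, 21 medials, 27 finals plus "no final".
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")
SYLLABLE_BASE = 0xAC00  # code point of the first precomposed syllable


def compose(initial, medial, final=""):
    """Combine three jamo into one precomposed Hangul syllable."""
    i = INITIALS.index(initial)
    m = MEDIALS.index(medial)
    f = FINALS.index(final)
    return chr(SYLLABLE_BASE + (i * len(MEDIALS) + m) * len(FINALS) + f)


def decompose(syllable):
    """Split a precomposed Hangul syllable back into its three jamo."""
    offset = ord(syllable) - SYLLABLE_BASE
    if not 0 <= offset < len(INITIALS) * len(MEDIALS) * len(FINALS):
        raise ValueError("not a precomposed Hangul syllable")
    i, rest = divmod(offset, len(MEDIALS) * len(FINALS))
    m, f = divmod(rest, len(FINALS))
    return INITIALS[i], MEDIALS[m], FINALS[f]


plate_char = compose("ㅎ", "ㅏ", "ㄴ")
```

Because any of the 19 x 21 x 28 combinations can be produced this way, a component-wise recognizer is not limited to a fixed list of plate characters.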

Design for Automation System for Pharmaceutical Prescription Using Arduino and Optical Character Recognition

  • Lim, Myung-Jae;Jung, Dong-Kun;Kim, Kyu-Dong;Kwon, Young-Man
    • International journal of advanced smart convergence / v.10 no.3 / pp.66-71 / 2021
  • Recent healthcare environments are characterized by an expanding scope of factors that affect healthcare, complexity resulting from the diversification of components, and an accelerating pace of change. Drugs are used for the prevention, mitigation, and treatment of diseases; while they have efficacy and effectiveness, which are key elements of health recovery, they can also inevitably cause harm. Therefore, many countries regulate approvals to ensure safe and effective medicines, designate essential drugs directly related to life as payment targets, and guarantee health insurance coverage. In particular, pharmacists who rely on manual labor to compound medicines are liable to make compounding errors through the combination of toxic medical substances or the use of other chemicals. In this paper, we focus on using a kiosk and optical character recognition (OCR) for an automated pharmacy to improve medical service and create a labor-friendly environment for pharmacists through maintenance of prescription data and an automated manufacturing solution. Presenting drug substances and precautions will lead to efficient drug prescription and prevent misuse of information, while the automated manufacturing system efficiently maintains the labor force and raises patient satisfaction by reducing waiting time.

Frame Rearrangement Method by Time Information Remarked on Recovered Image (복원된 영상에 표기된 시간 정보에 의한 프레임 재정렬 기법)

  • Kim, Yong Jin;Lee, Jung Hwan;Byun, Jun Seok;Park, Nam In
    • Journal of Korea Multimedia Society / v.24 no.12 / pp.1641-1652 / 2021
  • To analyze a crime scene, digital evidence such as CCTV and black box recordings is very important. Such digital evidence is often damaged due to device defects or intentional deletion. In this case, the deleted video can be restored by well-known techniques such as the frame-based recovery method. In particular, video data is generally fragmented when it is saved to storage that is almost full. If fragmented video is recovered in units of images, the sequence of the recovered images may not be continuous. In this paper, we propose a new video restoration method that restores the order of the recovered images. First, the images are recovered through a frame-based recovery technique. Then, the time information marked on the images is extracted and recognized via optical character recognition (OCR). Finally, the recovered images are rearranged based on the time information obtained by OCR. For performance evaluation, we measure the recovery rate of the proposed video restoration method. The results show that the recovery rate for fragmented video ranged from a minimum of about 47% to a maximum of 98%.
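
The final rearrangement step can be sketched as a sort on the parsed timestamps. The overlay format, the frame identifiers, and the policy of appending frames whose OCR text cannot be parsed are all assumptions made for illustration; the abstract does not specify them.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the on-screen timestamp


def parse_overlay(text):
    """Return the timestamp read by OCR, or None if the text is not a valid time."""
    try:
        return datetime.strptime(text.strip(), TIME_FORMAT)
    except ValueError:
        return None


def rearrange(frames):
    """Order (frame_id, ocr_text) pairs by timestamp; unreadable frames go last."""
    dated, undated = [], []
    for frame_id, text in frames:
        stamp = parse_overlay(text)
        if stamp is None:
            undated.append(frame_id)
        else:
            dated.append((stamp, frame_id))
    # sorted() is stable, so frames sharing a timestamp keep their recovered order.
    dated.sort(key=lambda pair: pair[0])
    return [frame_id for _, frame_id in dated] + undated


# Hypothetical recovered frames in the order they came off the disk.
recovered = [
    ("f3", "2021-05-01 12:00:02"),
    ("f1", "2021-05-01 12:00:00"),
    ("f9", "2O21-05-01 l2:00:0l"),  # OCR misread, cannot be parsed
    ("f2", "2021-05-01 12:00:01"),
]
ordered = rearrange(recovered)
```

Video overlays usually have one-second resolution, so many frames share a timestamp. The stable sort keeps their recovered order, which is only correct if frames within a fragment were recovered in sequence.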

Digitization of Old Korean Texts with Obsolete Korean Characters and Suggestion for Improvement of Information Sharing (옛한글 문서의 전자문서화와 정보공유 방법 제안)

  • Kim, Ha Young;Yoo, Woo Sik
    • Journal of Conservation Science / v.37 no.3 / pp.255-269 / 2021
  • A vast amount of material written in old Korean with old grammar and/or obsolete characters, such as prints, woodblock prints, manuscripts, old novels, and letters, is held in many institutions, including the Jangseogak at the Academy of Korean Studies. Digitization of these texts has required a prolonged manual input process. Individual researchers who majored in old Korean have read and typed the characters into electronic documents, a process that depends on individual skill, effort, and approach, and is particularly limiting because none of these can be significantly increased. To date, only a small proportion of the old Korean document collections currently kept in storage has been digitized and made available to the public. Even the electronic versions of the texts prove difficult to display correctly, due to the incompatibility between the old Korean characters and the character sets on today's electronic devices. To improve the techniques and efficiency of digitizing old Korean texts, it is necessary to develop optical character recognition (OCR) that can analyze images of old Korean documents, as well as input, display, and storage methods.

A Study on the Prediction for the OCR Technology Development Trajectory based on the Patent and Article Information (특허와 논문정보를 활용한 OCR 기술발전 동향예측에 관한 연구)

  • Won Jun, Kim;Sang Kon, Lee;Sung Kuk, Pyo
    • Journal of Information Technology Services / v.21 no.6 / pp.39-51 / 2022
  • As the Fourth Industrial Revolution has emerged as a key to improving national competitiveness, OCR technology, one of its major technologies, is in the spotlight. Since characters in various images contain a lot of information, OCR technology for recognizing these characters has evolved into a technology used in many industries. In this paper, trends in OCR technology were identified and predicted using thesis data published in 'RISS' and patent data classified by International Patent Classification (IPC) under the theme of optical character recognition (OCR). For the patent data, 20,000 patents related to OCR technology from 2002 to 2020 were used, and for the thesis data, 432 papers from 2012 to 2022 were used. Through time-series analysis of the patent data and the thesis data, we investigated when OCR technology began to develop, and keyword analyses were used to predict which technologies will be used in the future. Finally, the direction of future OCR technology development was presented through network association analysis of the patent data and thesis data.

Structure Recognition Method of Invoice Document Image for Document Processing Automation (문서 처리 자동화를 위한 인보이스 이미지의 구조 인식 방법)

  • Dong-seok Lee;Soon-kak Kwon
    • Journal of Korea Society of Industrial Information Systems / v.28 no.2 / pp.11-19 / 2023
  • In this paper, we propose methods for recognizing the structure of an invoice document and for generating a spreadsheet document from it. The text and location of each word block are recognized by a deep-learning-based optical character recognition engine. Word blocks in the same row and the same column are found through their coordinates. The document area is divided using the arrangement information of the word blocks. The character recognition results are entered into the spreadsheet based on the document structure. In simulation results, item placement with the proposed method shows an average accuracy of 92.30%.
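
Finding word blocks that share a row or column from their coordinates can be sketched as one-dimensional clustering with a pixel tolerance. The block representation (a dict with `text`, `x`, `y`), the tolerance value, and the function names are assumptions. The paper's actual grouping rule and its handling of the document areas are not given in the abstract.

```python
def cluster_1d(values, tol):
    """Map each coordinate to a cluster index; neighbours within tol share an index."""
    index, cluster, prev = {}, -1, None
    for v in sorted(set(values)):
        if prev is None or v - prev > tol:
            cluster += 1  # gap larger than tol starts a new row or column
        index[v] = cluster
        prev = v
    return index


def to_grid(blocks, tol=10):
    """Place OCR word blocks into a row/column grid suitable for a spreadsheet."""
    if not blocks:
        return []
    rows = cluster_1d([b["y"] for b in blocks], tol)
    cols = cluster_1d([b["x"] for b in blocks], tol)
    n_rows = max(rows.values()) + 1
    n_cols = max(cols.values()) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for b in blocks:
        grid[rows[b["y"]]][cols[b["x"]]] = b["text"]
    return grid


# Hypothetical OCR output: text plus the top-left corner of each word block.
blocks = [
    {"text": "Item", "x": 10, "y": 10},
    {"text": "Qty", "x": 110, "y": 12},
    {"text": "Bolt", "x": 12, "y": 40},
    {"text": "4", "x": 108, "y": 41},
]
grid = to_grid(blocks)
```

Each row of `grid` can then be written out as one spreadsheet row, for example with the `csv` module. If two blocks fall into the same cell, this sketch keeps only the last one.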

Using Naïve Bayes Classifier and Confusion Matrix Spelling Correction in OCR (나이브 베이즈 분류기와 혼동 행렬을 이용한 OCR에서의 철자 교정)

  • Noh, Kyung-Mok;Kim, Chang-Hyun;Cheon, Min-Ah;Kim, Jae-Hoon
    • 한국어정보학회:학술대회논문집 / 2016.10a / pp.310-312 / 2016
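  • To reduce optical character recognition (OCR) errors, this paper proposes a spelling correction system that uses a confusion matrix of correction word pairs and a naïve Bayes classifier. The system corrects only Hangul spelling errors. The corpora used in the experiments are a raw Korean corpus, an OCR output corpus, and an OCR ground-truth corpus. From the raw Korean corpus, we built a grapheme-level (jaso) language model and a prefix corpus for retrieving correction candidates. From the OCR output corpus and the OCR ground-truth corpus, we extracted correction word pairs, decomposed them into graphemes to build a confusion matrix, and used it to build an error model. Correction candidates are found with the prefix corpus, and the naïve Bayes classifier presents the n most probable candidates. A correction was counted as successful if the correct word (eojeol) was among the n candidates. As a result, for an OCR system with a recognition rate of about 97.73%, presenting three correction candidates raised the recognition rate by about 0.28 percentage points, to 98.01%. This result covers only Hangul errors; we expect better results if special characters, digits, and similar symbols are also handled in future work.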

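
The ranking step can be sketched as a naïve Bayes score: a word prior multiplied by per-character confusion probabilities, computed in log space. This is a simplified illustration, not the authors' system. It works on whole characters instead of decomposed graphemes, considers only same-length candidates instead of a prefix-corpus lookup, and uses add-one smoothing. The toy data and all names are hypothetical.

```python
import math
from collections import Counter


def build_error_model(pairs):
    """Count (true_char, observed_char) confusions from aligned (ocr, truth) pairs."""
    confusion, totals = Counter(), Counter()
    for observed, truth in pairs:
        if len(observed) != len(truth):
            continue  # this sketch handles substitution errors only
        for o, t in zip(observed, truth):
            confusion[(t, o)] += 1
            totals[t] += 1
    return confusion, totals


def rank(observed, vocab_counts, confusion, totals, alphabet_size, n=3):
    """Return the n candidates with the highest log P(word) + sum log P(obs | true)."""
    total_words = sum(vocab_counts.values())
    scored = []
    for word, count in vocab_counts.items():
        if len(word) != len(observed):
            continue
        score = math.log(count / total_words)  # prior from the raw corpus
        for o, t in zip(observed, word):
            # Add-one smoothing keeps unseen confusions from zeroing the score.
            score += math.log((confusion[(t, o)] + 1) / (totals[t] + alphabet_size))
        scored.append((score, word))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:n]]


# Hypothetical toy corpora (Latin text for readability).
pairs = [("c1ean", "clean"), ("clean", "clean"), ("1ine", "line")]
vocab = Counter({"clean": 5, "clear": 3, "ocean": 1})
confusion, totals = build_error_model(pairs)
candidates = rank("c1ean", vocab, confusion, totals, alphabet_size=27)
```

Under the evaluation rule described above, a correction counts as successful when the ground-truth word appears anywhere in `candidates`.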

Using Naïve Bayes Classifier and Confusion Matrix Spelling Correction in OCR (나이브 베이즈 분류기와 혼동 행렬을 이용한 OCR에서의 철자 교정)

  • Noh, Kyung-Mok;Kim, Chang-Hyun;Cheon, Min-Ah;Kim, Jae-Hoon
    • Annual Conference on Human and Language Technology / 2016.10a / pp.310-312 / 2016
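  • To reduce optical character recognition (OCR) errors, this paper proposes a spelling correction system that uses a confusion matrix of correction word pairs and a naïve Bayes classifier. The system corrects only Hangul spelling errors. The corpora used in the experiments are a raw Korean corpus, an OCR output corpus, and an OCR ground-truth corpus. From the raw Korean corpus, we built a grapheme-level (jaso) language model and a prefix corpus for retrieving correction candidates. From the OCR output corpus and the OCR ground-truth corpus, we extracted correction word pairs, decomposed them into graphemes to build a confusion matrix, and used it to build an error model. Correction candidates are found with the prefix corpus, and the naïve Bayes classifier presents the n most probable candidates. A correction was counted as successful if the correct word (eojeol) was among the n candidates. As a result, for an OCR system with a recognition rate of about 97.73%, presenting three correction candidates raised the recognition rate by about 0.28 percentage points, to 98.01%. This result covers only Hangul errors; we expect better results if special characters, digits, and similar symbols are also handled in future work.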
