http://dx.doi.org/10.5762/KAIS.2020.21.5.451

A Personal Information Security System using Form Recognition and Optical Character Recognition in Electronic Documents  

Baek, Jong-Kyung (Division of Computer, Graduate School of Soongsil University)
Jee, Yoon-Seok (Department of IT Policy Management, Graduate School of Soongsil University)
Park, Jae-Pyo (Graduate School of Information Science, Soongsil University)
Publication Information
Journal of the Korea Academia-Industrial cooperation Society / v.21, no.5, 2020, pp. 451-457
Abstract
Form recognition and OCR techniques are widely used to detect and protect personal information in electronic documents. However, because of the poor recognition rate of OCR engines, personal information often goes undetected or false positives occur. Analyzing a large volume of electronic documents also takes a long time. In this paper, we propose a method that improves on the existing approach in three respects: the speed of image analysis of electronic documents, the character recognition rate of the OCR engine, and the detection rate of personal information. Form recognition was used to increase the analysis speed, and image correction was used to improve both the analysis speed and the character recognition rate of the OCR engine. We also propose an algorithm for analyzing personal information in images to increase its detection rate. In the experiments, 1,755 form recognition image samples were analyzed in an average of 0.24 seconds, 0.5 seconds faster than the form recognition method of the conventional PAID system, and the image recognition rate was 99%. The proposed method can serve as a system for protecting personal information in electronic documents in fields such as the public sector, telecommunications, finance, tourism, and security.
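The abstract does not spell out the rules its detection algorithm applies to OCR output. As a minimal sketch, assuming that the recognized text is scanned with regular expressions and that Korean resident registration numbers (RRNs) are screened with a date check and the legacy RRN check digit to suppress false positives from misread digits, detection could look like the following. The patterns, the phone-number rule, and the names `detect_personal_info` and `rrn_checksum_ok` are hypothetical illustrations, not the authors' algorithm; note also that newer RRNs may not carry the legacy check digit.

```python
import re

# Hypothetical rule set: the paper's abstract does not list its patterns.
# Korean resident registration number (RRN): YYMMDD-GNNNNNN, hyphen optional.
RRN_PATTERN = re.compile(r"(?<!\d)(\d{6})[-\s]?([1-8]\d{6})(?!\d)")
# Korean mobile number such as 010-1234-5678 (illustrative assumption).
PHONE_PATTERN = re.compile(r"(?<!\d)01[016789][-\s]?\d{3,4}[-\s]?\d{4}(?!\d)")

# Weights of the legacy RRN check-digit formula (first 12 digits).
RRN_WEIGHTS = (2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5)


def rrn_checksum_ok(digits: str) -> bool:
    """Return True if a 13-digit RRN satisfies the legacy check digit."""
    total = sum(int(d) * w for d, w in zip(digits[:12], RRN_WEIGHTS))
    return (11 - total % 11) % 10 == int(digits[12])


def detect_personal_info(ocr_text: str) -> list[tuple[str, str]]:
    """Scan OCR output and return (kind, matched_text) pairs."""
    findings = []
    for m in RRN_PATTERN.finditer(ocr_text):
        front, back = m.group(1), m.group(2)
        month, day = int(front[2:4]), int(front[4:6])
        # Reject digit runs whose birth-date part cannot be a real date;
        # such candidates are typical of OCR misreads.
        if not (1 <= month <= 12 and 1 <= day <= 31):
            continue
        # The check digit filters most remaining false positives.
        if rrn_checksum_ok(front + back):
            findings.append(("RRN", m.group(0)))
    for m in PHONE_PATTERN.finditer(ocr_text):
        findings.append(("PHONE", m.group(0)))
    return findings


if __name__ == "__main__":
    sample = "RRN 900101-1234568, tel 010-1234-5678, ref 900101-1234567"
    print(detect_personal_info(sample))
    # → [('RRN', '900101-1234568'), ('PHONE', '010-1234-5678')]
```

In the sample, the second 13-digit run is rejected because its last digit fails the checksum. This is the kind of filtering that trades a few regex lines for fewer false positives when the OCR engine garbles digits.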
Keywords
Classification; OCR; Image Correction; Personal Information; Security
Citations & Related Records
Times Cited By KSCI: 4