Trends in Deep Learning-based Medical Optical Character Recognition

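  • 윤성연 (Department of Data Science, Seoul Women's University);
  • 최아린 (Department of Data Science, Seoul Women's University);
  • 김채원 (Department of Data Science, Seoul Women's University);
  • 오수민 (Department of Data Science, Seoul Women's University);
  • 손서영 (Department of Data Science, Seoul Women's University);
  • 김지연 (Department of Digital Media, Seoul Women's University);
  • 이현희 (Data Science Major, Seoul Women's University);
  • 한명은 (Data Science Major, Seoul Women's University);
  • 박민서 (Department of Data Science, Seoul Women's University)
  • Received: 2024.01.02
  • Reviewed: 2024.02.01
  • Published: 2024.03.31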

Abstract

Optical character recognition (OCR) is a technology that recognizes characters in images and converts them into text in a digital format. Because deep learning-based OCR achieves high recognition rates, it is used in many industries that hold large volumes of records. The medical industry in particular has actively adopted deep learning-based OCR to improve medical services. In this paper, we review trends in deep learning-based OCR engines and in OCR specialized for medical data, and we suggest directions for the development of medical OCR. Current medical OCR improves its recognition rate by applying natural language processing (NLP) to the detected text. However, recognition accuracy remains limited for non-standardized handwriting and deformed characters. More advanced medical OCR will require building databases of medical data, image pre-processing, and specialized natural language processing.
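As a rough illustration of what NLP post-processing of OCR output can look like, the sketch below corrects misrecognized tokens by fuzzy matching against a small vocabulary of medical terms. It is not the method of any system surveyed here: the lexicon, the function names `correct_token` and `correct_text`, and the similarity cutoff of 0.8 are all assumptions chosen for the example, and it uses only Python's standard `difflib` module.

```python
from __future__ import annotations

import difflib

# Hypothetical mini-lexicon for illustration only; a real system would
# load a curated medical vocabulary or database.
MEDICAL_LEXICON = ["glucose", "hemoglobin", "cholesterol", "creatinine", "platelet"]


def correct_token(token: str, lexicon: list[str], cutoff: float = 0.8) -> str:
    """Return the closest lexicon entry if it is similar enough, else the token unchanged."""
    matches = difflib.get_close_matches(token.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_text(ocr_text: str, lexicon: list[str] = MEDICAL_LEXICON) -> str:
    """Apply token-level correction to whitespace-separated OCR output."""
    return " ".join(correct_token(tok, lexicon) for tok in ocr_text.split())


if __name__ == "__main__":
    # "1" and "0" stand in for typical OCR confusions of "l" and "o".
    print(correct_text("hemog1obin 13.5 g/dL"))
    print(correct_text("ch0lesterol 190"))
```

Tokens that are not close to any lexicon entry, such as numeric values and units, pass through unchanged. The cutoff trades missed corrections against false replacements and would need tuning on real data.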
