YOLO, EAST: Comparison of Scene Text Detection Performance Using Neural Network Models

  • Received : 2021.06.29
  • Accepted : 2021.08.31
  • Published : 2022.03.31

Abstract

In this paper, the YOLO and EAST models are tested to analyze their performance in detecting text areas in real-world and ordinary text images. Earlier YOLO models, including YOLOv3, are known to underperform in detecting text areas in images, but the recently released YOLOv4 and YOLOv5 achieved promising performance in detecting text areas in a variety of images. The experimental results suggest that both YOLOv4 and YOLOv5 are likely to be widely used for text detection in the field of scene text recognition.

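In this paper, we apply the YOLO and EAST neural networks, which have recently been widely used in many fields, to the problem of detecting text strings in images, and we compare and analyze their performance. YOLO networks are generally known to perform poorly at detecting text regions in images. In our experiments, YOLOv3 was relatively weak at text detection, whereas the recently released YOLOv4 and YOLOv5 performed very well at detecting Korean and English text strings in many kinds of images. We therefore expect these YOLO-based text detection methods to be widely used in the field of text recognition.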
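As an illustration of how such a comparison of detectors can be scored, the following is a minimal sketch that matches predicted text boxes to ground-truth boxes by intersection over union (IoU) and reports precision, recall, and F1. The abstract does not state which metric or threshold was used, so the axis-aligned box format, the greedy matching, the 0.5 IoU threshold, and the function names `iou` and `score_detections` are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def score_detections(
    preds: List[Box], truths: List[Box], iou_threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Greedily match each prediction to its best unmatched ground-truth box.

    Returns (precision, recall, f1). The 0.5 threshold is an assumption,
    not a value taken from the paper.
    """
    matched = set()  # indices of ground-truth boxes already claimed
    true_positives = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            value = iou(p, t)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j >= 0 and best_iou >= iou_threshold:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1
```

Running `score_detections` on the outputs of each detector over the same annotated images would give one set of figures per model that can be compared directly. Rotated or quadrilateral text boxes, which EAST can produce, would need a polygon-based IoU instead of the axis-aligned version assumed here.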

Acknowledgement

This work was supported by the Startup Growth Technology Development Program of the Ministry of SMEs and Startups, 2019-2021 [S2833775].
