http://dx.doi.org/10.7471/ikeee.2020.24.4.1086

Scene Text Recognition Performance Improvement through an Add-on of an OCR based Classifier  

Chae, Ho-Yeol (Dept. of Computer and Communications Engineering, Kangwon National University)
Seok, Ho-Sik (Dept. of Computer Science and Engineering, Kangwon National University)
Publication Information
Journal of IKEEE / v.24, no.4, 2020, pp. 1086-1092
Abstract
An autonomous agent operating in the real world should be able to recognize text in scenes. With the advancement of deep learning, various DNN models have been used for transformation, feature extraction, and prediction. However, existing state-of-the-art STR (Scene Text Recognition) engines do not achieve the performance required for real-world applications. In this paper, we introduce a method that improves the performance of STR engines through an add-on composed of an OCR (Optical Character Recognition) engine and a classifier. On instances from the IC13 and IC15 datasets that an STR engine failed to recognize, our method recognizes 10.92% of the previously unrecognized characters.
Keywords
Scene text recognition (STR); Optical character recognition (OCR); Text detection; Deep learning; Machine learning;