http://dx.doi.org/10.7746/jkros.2021.16.3.276

Adversarial Shade Generation and Training Text Recognition Algorithm that is Robust to Text in Brightness  

Seo, Minseok (Department of Information and Communication Engineering, Hanbat National University)
Kim, Daehan (Department of Information and Communication Engineering, Hanbat National University)
Choi, Dong-Geol (Department of Information and Communication Engineering, Hanbat National University)
Publication Information
The Journal of Korea Robotics Society, v.16, no.3, 2021, pp. 276-282
Abstract
Systems for recognizing text in natural scenes have been applied in various industries. However, brightness changes that occur in natural settings, such as light reflection and shadow, significantly degrade text recognition performance. To solve this problem, we propose an adversarial shadow generation and training algorithm that is robust to shadow changes. The algorithm divides the entire image into 9 grid cells and adjusts the brightness of each cell with 4 trainable parameters. Training is then conducted in an adversarial relationship between the text recognition model and the shaded image generator. As training progresses, increasingly difficult combinations of shaded cells are generated. Training in this curriculum-learning manner not only improved performance by more than 3% on the public ICDAR2015 benchmark dataset, but also improved performance on our own Android application text recognition dataset.
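The grid-shading step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say what the 4 parameters per cell represent, so this sketch assumes they are brightness gains at the four corners of each cell, blended bilinearly; the function name `shade_image` and the `params` layout are hypothetical, and the adversarial update of the parameters against the recognizer's loss is left out.

```python
import numpy as np


def shade_image(image, params, grid=3):
    """Apply a per-cell brightness gain map to a grayscale image in [0, 1].

    Assumption (not from the paper): params has shape (grid, grid, 4) and holds
    gains at the top-left, top-right, bottom-left and bottom-right corners of
    each cell.
    """
    h, w = image.shape
    out = np.empty_like(image, dtype=np.float64)
    # Cell boundaries in pixels; rounding down keeps the cells contiguous.
    ys = np.linspace(0, h, grid + 1).astype(int)
    xs = np.linspace(0, w, grid + 1).astype(int)
    for i in range(grid):
        for j in range(grid):
            y0, y1, x0, x1 = ys[i], ys[i + 1], xs[j], xs[j + 1]
            tl, tr, bl, br = params[i, j]
            # Bilinear blend of the four corner gains across the cell.
            v = np.linspace(0.0, 1.0, y1 - y0)[:, None]
            u = np.linspace(0.0, 1.0, x1 - x0)[None, :]
            gain = (1 - v) * ((1 - u) * tl + u * tr) + v * ((1 - u) * bl + u * br)
            out[y0:y1, x0:x1] = image[y0:y1, x0:x1] * gain
    # Keep the shaded image inside the valid intensity range.
    return np.clip(out, 0.0, 1.0)


if __name__ == "__main__":
    image = np.full((30, 90), 0.5)   # stand-in for a cropped text image
    params = np.ones((3, 3, 4))      # all gains 1.0 leaves the image unchanged
    params[1, 1] = 0.5               # darken only the centre cell
    shaded = shade_image(image, params)
    print(shaded[15, 45], shaded[5, 5])
```

In an adversarial setup, a generator would adjust `params` to raise the recognizer's loss while the recognizer trains on the shaded output; that loop needs an autodiff framework and is not shown here.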
Keywords
Text Recognition; Deep Learning; Smart Phone Application;