Adversarial Shade Generation and Training for a Text Recognition Algorithm Robust to Brightness Changes

  • Seo, Minseok (Department of Information and Communication Engineering, Hanbat National University) ;
  • Kim, Daehan (Department of Information and Communication Engineering, Hanbat National University) ;
  • Choi, Dong-Geol (Department of Information and Communication Engineering, Hanbat National University)
  • Received : 2021.05.17
  • Accepted : 2021.07.06
  • Published : 2021.08.31

Abstract

Systems that recognize text in natural scenes have been applied in various industries. However, brightness variations that occur in nature, such as light reflections and shadows, significantly degrade text recognition performance. To solve this problem, we propose an adversarial shade generation and training algorithm that is robust to shadow changes. The algorithm divides the entire image into a 3×3 grid of nine cells and adjusts the brightness of each cell with four trainable parameters. Training then proceeds with the text recognition model and the shaded-image generator in an adversarial relationship, so that increasingly difficult shaded-cell combinations are produced as training progresses. Training in this curriculum-learning manner not only yielded a performance improvement of more than 3% on the ICDAR 2015 public benchmark dataset, but also improved performance when applied to our own Android-application text recognition dataset.
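The shade-generation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract states only that the image is split into nine grid cells with four trainable parameters each, so the interpretation of the four parameters as bilinearly interpolated corner brightness values is our assumption.

```python
import numpy as np

def generate_shade_mask(params, h, w):
    """Build a per-pixel brightness mask from a 3x3 grid of cells.

    `params` has shape (3, 3, 4): for each cell, four values
    (top-left, top-right, bottom-left, bottom-right) that are
    bilinearly interpolated across the cell. The 4-corner layout is
    a hypothetical reading of the paper's "4 trainable parameters
    per grid"; h and w are assumed divisible by 3 for simplicity.
    """
    mask = np.empty((h, w), dtype=np.float32)
    ch, cw = h // 3, w // 3
    for gy in range(3):
        for gx in range(3):
            tl, tr, bl, br = params[gy, gx]
            ys = np.linspace(0.0, 1.0, ch)[:, None]   # vertical blend weights
            xs = np.linspace(0.0, 1.0, cw)[None, :]   # horizontal blend weights
            # bilinear interpolation of the four corner brightness values
            cell = (tl * (1 - ys) * (1 - xs) + tr * (1 - ys) * xs
                    + bl * ys * (1 - xs) + br * ys * xs)
            mask[gy * ch:(gy + 1) * ch, gx * cw:(gx + 1) * cw] = cell
    return mask

# Applying the mask darkens/brightens each region of the input image:
# shaded = np.clip(image * generate_shade_mask(params, *image.shape[:2]), 0, 255)
```

In the adversarial setup, the nine sets of parameters would be updated to *increase* the recognizer's loss while the recognizer is updated to decrease it, which is what drives the progressively harder shade combinations.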

Acknowledgement

This research was supported by Korea Electric Power Corporation (Grant number: 202100240001).

References

  1. J. Baek, G. Kim, J. Lee, S. Park, D. Han, S. Yun, S. J. Oh, and H. Lee, "What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis," 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 2019, DOI: 10.1109/iccv.2019.00481.
  2. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998, DOI: 10.1109/5.726791.
  3. B. Shi, M. Yang, X. Wang, P. Lyu, C. Yao, and X. Bai, "ASTER: An Attentional Scene Text Recognizer with Flexible Rectification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 9, pp. 2035-2048, Sep., 2019, DOI: 10.1109/tpami.2018.2848939.
  4. F. Borisyuk, A. Gordo, and V. Sivakumar, "Rosetta: Large Scale System for Text Detection and Recognition in Images," 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, DOI: 10.1145/3219819.3219861.
  5. B. Shi, X. Bai, and C. Yao, "An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298-2304, 2016, DOI: 10.1109/TPAMI.2016.2646371.
  6. J.-H. Kim and J. Lim, "License Plate Detection and Recognition Algorithm using Deep Learning," Journal of IKEEE, vol. 23, no. 2, pp. 642-651, Jun., 2019, DOI: 10.7471/IKEEE.2019.23.2.642.
  7. M. Seo, S. Lee, and D.-G. Choi, "Spatial-temporal Ensemble Method for Action Recognition," Journal of Korea Robotics Society, vol. 15, no. 4, pp. 385-391, Dec., 2020, DOI: 10.7746/jkros.2020.15.4.385.
  8. D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny, "ICDAR 2015 competition on Robust Reading," 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 2015, DOI: 10.1109/icdar.2015.7333942.
  9. A. Gupta, A. Vedaldi, and A. Zisserman, "Synthetic Data for Text Localisation in Natural Images," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, DOI: 10.1109/cvpr.2016.254.
  10. M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, "Synthetic data and artificial neural networks for natural scene text recognition," NIPS DLW, 2014, [Online], https://arxiv.org/pdf/1406.2227.pdf.
  11. D. Hendrycks and T. Dietterich, "Benchmarking neural network robustness to common corruptions and perturbations," International Conference on Learning Representations (ICLR), 2019, [Online], https://arxiv.org/pdf/1903.12261.pdf.
  12. E. Rusak, L. Schott, R. S. Zimmermann, J. Bitterwolf, O. Bringmann, M. Bethge, and W. Brendel, "A Simple Way to Make Neural Networks Robust Against Diverse Image Corruptions," Lecture Notes in Computer Science, pp. 53-69, 2020, DOI: 10.1007/978-3-030-58580-8_4.
  13. K. Wang, B. Babenko, and S. Belongie, "End-to-end scene text recognition," 2011 International Conference on Computer Vision, Barcelona, Spain, 2011, DOI: 10.1109/iccv.2011.6126402.
  14. B. Shi, X. Wang, P. Lyu, C. Yao, and X. Bai, "Robust Scene Text Recognition with Automatic Rectification," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, DOI: 10.1109/cvpr.2016.452.
  15. M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, "Spatial transformer networks," NIPS, 2015, [Online], https://proceedings.neurips.cc/paper/2015/file/33ceb07bf4eeb3da587e268d663aba1a-Paper.pdf.
  16. W. Liu, C. Chen, K. Wong, Z. Su, and J. Han, "STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition," British Machine Vision Conference 2016, 2016, DOI: 10.5244/c.30.43.
  17. Y. Mou, L. Tan, H. Yang, J. Chen, L. Liu, P. Yan, and Y. Huang, "PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit," European Conference on Computer Vision, pp. 158-174, 2020, DOI: 10.1007/978-3-030-58555-6_10.
  18. J.-H. Kim, "Automatic Recognition of Bank Security Card Using Smart Phone," The Journal of the Korea Contents Association, vol. 16, no. 12, pp. 19-26, Dec. 2016, DOI: 10.5392/JKCA.2016.16.12.019.
  19. S. Lee and G. Park, "Proposal for License Plate Recognition Using Synthetic Data and Vehicle Type Recognition System," Journal of Broadcast Engineering, vol. 25, no. 5, pp. 776-788, Sep., 2020, DOI: 10.5909/JBE.2020.25.5.776.
  20. C.-Y. Lee and S. Osindero, "Recursive Recurrent Nets with Attention Modeling for OCR in the Wild," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 2231-2239, DOI: 10.1109/CVPR.2016.245.