http://dx.doi.org/10.14372/IEMEK.2020.15.2.51

Development of a Low-cost Industrial OCR System with an End-to-end Deep Learning Technology

Subedi, Bharat (Kumoh National Institute of Technology)
Yunusov, Jahongir (Kumoh National Institute of Technology)
Gaybulayev, Abdulaziz (Kumoh National Institute of Technology)
Kim, Tae-Hyong (Kumoh National Institute of Technology)

Publication Information

IEMEK Journal of Embedded Systems and Applications, v.15, no.2, 2020, pp. 51-60

Abstract

Optical character recognition (OCR) has been studied for decades because it is useful in a wide range of applications. Its performance has recently improved significantly thanks to advances in deep learning, and demand is growing for commercial-grade but affordable OCR systems. We have developed a low-cost, high-performance OCR system for industrial use on the cheapest embedded developer kit that supports GPU acceleration. To achieve the accuracy required for industrial use with limited computing resources, we chose as our baseline a state-of-the-art text recognition algorithm that uses an end-to-end deep learning network. We then improved the model by replacing its feature extraction network with the one best suited to our conditions. Among the candidate networks, EfficientNet-B3 performed best, combining excellent recognition accuracy with relatively low memory consumption. In addition, we optimized the model, originally written with TensorFlow's Python API, in two ways: with TensorFlow-TensorRT integration and with TensorFlow's C++ API.
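The backbone choice described in the abstract, trading recognition accuracy against memory consumption, can be illustrated with a minimal sketch. This is not the authors' procedure: the `Candidate` type, the `pick_backbone` helper, the candidate names, the memory budget, and all numbers are hypothetical placeholders, not measurements from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One feature-extraction network evaluated inside the recognition model."""
    name: str
    accuracy: float    # recognition accuracy on a validation set, in [0, 1]
    memory_mb: float   # peak memory use during inference, in MB


def pick_backbone(candidates: Sequence[Candidate],
                  memory_budget_mb: float) -> Optional[Candidate]:
    """Return the most accurate candidate that fits the memory budget.

    Ties on accuracy are broken in favor of lower memory use.
    Returns None when no candidate fits.
    """
    feasible = [c for c in candidates if c.memory_mb <= memory_budget_mb]
    if not feasible:
        return None
    return max(feasible, key=lambda c: (c.accuracy, -c.memory_mb))


# Placeholder values for illustration only; NOT results from the paper.
candidates = [
    Candidate("backbone_a", accuracy=0.90, memory_mb=900.0),
    Candidate("backbone_b", accuracy=0.95, memory_mb=1400.0),
    Candidate("backbone_c", accuracy=0.97, memory_mb=3200.0),
]

best = pick_backbone(candidates, memory_budget_mb=2000.0)
print(best.name if best else "no candidate fits")
```

With a budget of 2000 MB the sketch rejects the most accurate placeholder because it does not fit and returns the next best one. On a real device, the accuracy and memory figures would come from measurements on the target hardware.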

Keywords

Embedded systems; Optical character recognition; Deep learning; End-to-end approach; Low-cost implementation
