http://dx.doi.org/10.13088/jiis.2020.26.2.001

Automatic gasometer reading system using selective optical character recognition  

Lee, Kyohyuk (Management of Technology, Yonsei University)
Kim, Taeyeon (Computer Science, KAIST)
Kim, Wooju (Graduate School of Information and Industrial Engineering, Yonsei University)
Publication Information
Journal of Intelligence and Information Systems / v.26, no.2, 2020, pp. 1-25
Abstract
In this paper, we propose an application system architecture that provides accurate, fast, and efficient automatic gasometer reading. The system captures a gasometer image with a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the device ID and gas usage amount using selective optical character recognition based on deep learning. In general, an image contains many types of characters, and conventional optical character recognition extracts all of them. Some applications, however, need to ignore characters that are not of interest and focus only on specific types of characters. For example, an automatic gasometer reading system only needs to extract the device ID and gas usage amount from gasometer images in order to bill users; character strings such as the device type, manufacturer, manufacturing date, and specifications carry no value for the application. The application therefore has to analyze only the regions of interest and the specific types of characters they contain. We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition that analyzes only the regions of interest. We built three neural networks for the application system: the first is a convolutional neural network that detects the regions of interest containing the gas usage amount and device ID character strings, the second is another convolutional neural network that transforms the spatial information of each region of interest into a sequence of feature vectors, and the third is a bidirectional long short-term memory (LSTM) network that converts the feature vector sequence into a character string. In this research, the character strings of interest are the device ID and the gas usage amount: the device ID consists of 12 Arabic numerals and the gas usage amount consists of 4 to 5 Arabic numerals. All system components are implemented on Amazon Web Services (AWS) with Intel Xeon E5-2686 v4 CPUs and NVIDIA Tesla V100 GPUs. The system adopts a master-slave processing structure for efficient, fast parallel processing that copes with about 700,000 requests per day. The mobile device captures a gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes the reading request to an input queue with a FIFO (First In First Out) structure. The slave process consists of the three deep neural networks that perform character recognition and runs on the NVIDIA GPU. The slave process continuously polls the input queue for recognition requests; when a request arrives, it converts the image into the device ID string, the gas usage amount string, and the position information of both strings, returns this information to an output queue, and switches back to polling the input queue. The master process takes the final result from the output queue and delivers it to the mobile device.
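As a rough illustration of this master-slave queue flow, the following Python sketch uses an in-process FIFO queue and a placeholder recognize() function in place of the GPU slave's three-network pipeline; the threading setup and all names here are illustrative assumptions, not the system's actual implementation.

```python
# Minimal single-machine sketch of the master/slave FIFO queue flow described
# above. recognize() is a stand-in for the slave's detection + CRNN step;
# names and signatures are illustrative, not the paper's code.
import queue
import threading

input_queue = queue.Queue()    # FIFO: master pushes reading requests here
output_queue = queue.Queue()   # slave returns recognition results here

def recognize(image_bytes):
    # Placeholder for the slave's three-network recognition step.
    return {"device_id": "000000000000", "usage": "0000", "boxes": []}

def slave_worker():
    # The slave polls the input queue; on a request it runs recognition,
    # pushes the result to the output queue, and goes back to polling.
    while True:
        request = input_queue.get()          # blocks until a request arrives
        if request is None:                  # sentinel: shut down
            break
        req_id, image_bytes = request
        output_queue.put((req_id, recognize(image_bytes)))

def master_submit(req_id, image_bytes):
    # The master receives an image from the mobile device and enqueues it.
    input_queue.put((req_id, image_bytes))

if __name__ == "__main__":
    threading.Thread(target=slave_worker, daemon=True).start()
    master_submit(1, b"...jpeg bytes from the mobile device...")
    print(output_queue.get())                # master returns this to the device
    input_queue.put(None)                    # stop the slave worker
```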
We used a total of 27,120 gasometer images for training, validation, and testing of the three deep neural networks: 22,985 images were used for training and validation, and 4,135 images were used for testing. For each training epoch, the 22,985 images were randomly split into training and validation sets at an 8:2 ratio. The 4,135 test images were categorized into five types (normal, noise, reflex, scale, and slant): normal denotes clean images, noise denotes images with noise, reflex denotes images with light reflection in the gasometer region, scale denotes images with a small object size due to long-distance capture, and slant denotes images that are not horizontally level. The final character string recognition accuracies for the device ID and gas usage amount on normal data are 0.960 and 0.864, respectively.
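To make the three-network recognition step concrete, the following PyTorch sketch shows a minimal CRNN-style recognizer: a small CNN turns a cropped region of interest into a width-wise feature sequence, and a bidirectional LSTM maps that sequence to per-step digit scores. The layer sizes, input resolution, and class set (10 digits plus a CTC blank) are illustrative assumptions rather than the paper's exact architecture, and the crop is assumed to come from the separate detection network.

```python
# Minimal CRNN-style sketch: CNN feature extractor + bidirectional LSTM.
# Sizes are illustrative; the ROI crop is assumed to be provided by the
# first (detection) network.
import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    def __init__(self, num_classes=11):               # 10 digits + CTC blank
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(128 * 8, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, num_classes)

    def forward(self, x):                              # x: (N, 1, 32, W)
        f = self.cnn(x)                                # (N, 128, 8, W/4)
        n, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(n, w, c * h)   # width as time axis
        out, _ = self.rnn(seq)                         # (N, W/4, 512)
        return self.fc(out)                            # per-step digit scores

crop = torch.randn(1, 1, 32, 128)                      # hypothetical ROI crop
print(CRNNSketch()(crop).shape)                        # torch.Size([1, 32, 11])
```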
Keywords
Gasometer; automatic reading; selective optical character recognition; convolutional neural network; recurrent neural network; parallel processing