• Title/Summary/Keyword: long short-term memory recurrent network

Search Result 143, Processing Time 0.018 seconds

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk;Kim, Taeyeon;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.1-25
    • /
    • 2020
  • In this paper, we suggest an application system architecture which provides accurate, fast and efficient automatic gasometer reading function. The system captures gasometer image using mobile device camera, transmits the image to a cloud server on top of private LTE network, and analyzes the image to extract character information of device ID and gas usage amount by selective optical character recognition based on deep learning technology. In general, there are many types of character in an image and optical character recognition technology extracts all character information in an image. But some applications need to ignore non-of-interest types of character and only have to focus on some specific types of characters. For an example of the application, automatic gasometer reading system only need to extract device ID and gas usage amount character information from gasometer images to send bill to users. Non-of-interest character strings, such as device type, manufacturer, manufacturing date, specification and etc., are not valuable information to the application. Thus, the application have to analyze point of interest region and specific types of characters to extract valuable information only. We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition which only analyze point of interest region for selective character information extraction. We build up 3 neural networks for the application system. The first is a convolutional neural network which detects point of interest region of gas usage amount and device ID information character strings, the second is another convolutional neural network which transforms spatial information of point of interest region to spatial sequential feature vectors, and the third is bi-directional long short term memory network which converts spatial sequential information to character strings using time-series analysis mapping from feature vectors to character strings. In this research, point of interest character strings are device ID and gas usage amount. Device ID consists of 12 arabic character strings and gas usage amount consists of 4 ~ 5 arabic character strings. All system components are implemented in Amazon Web Service Cloud with Intel Zeon E5-2686 v4 CPU and NVidia TESLA V100 GPU. The system architecture adopts master-lave processing structure for efficient and fast parallel processing coping with about 700,000 requests per day. Mobile device captures gasometer image and transmits to master process in AWS cloud. Master process runs on Intel Zeon CPU and pushes reading request from mobile device to an input queue with FIFO (First In First Out) structure. Slave process consists of 3 types of deep neural networks which conduct character recognition process and runs on NVidia GPU module. Slave process is always polling the input queue to get recognition request. If there are some requests from master process in the input queue, slave process converts the image in the input queue to device ID character string, gas usage amount character string and position information of the strings, returns the information to output queue, and switch to idle mode to poll the input queue. Master process gets final information form the output queue and delivers the information to the mobile device. We used total 27,120 gasometer images for training, validation and testing of 3 types of deep neural network. 22,985 images were used for training and validation, 4,135 images were used for testing. We randomly splitted 22,985 images with 8:2 ratio for training and validation respectively for each training epoch. 4,135 test image were categorized into 5 types (Normal, noise, reflex, scale and slant). Normal data is clean image data, noise means image with noise signal, relfex means image with light reflection in gasometer region, scale means images with small object size due to long-distance capturing and slant means images which is not horizontally flat. Final character string recognition accuracies for device ID and gas usage amount of normal data are 0.960 and 0.864 respectively.

Automated Vehicle Research by Recognizing Maneuvering Modes using LSTM Model (LSTM 모델 기반 주행 모드 인식을 통한 자율 주행에 관한 연구)

  • Kim, Eunhui;Oh, Alice
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.16 no.4
    • /
    • pp.153-163
    • /
    • 2017
  • This research is based on the previous research that personally preferred safe distance, rotating angle and speed are differentiated. Thus, we use machine learning model for recognizing maneuvering modes trained per personal or per similar driving pattern groups, and we evaluate automatic driving according to maneuvering modes. By utilizing driving knowledge, we subdivided 8 kinds of longitudinal modes and 4 kinds of lateral modes, and by combining the longitudinal and lateral modes, we build 21 kinds of maneuvering modes. we train the labeled data set per time stamp through RNN, LSTM and Bi-LSTM models by the trips of drivers, which are supervised deep learning models, and evaluate the maneuvering modes of automatic driving for the test data set. The evaluation dataset is aggregated of living trips of 3,000 populations by VTTI in USA for 3 years and we use 1500 trips of 22 people and training, validation and test dataset ratio is 80%, 10% and 10%, respectively. For recognizing longitudinal 8 kinds of maneuvering modes, RNN achieves better accuracy compared to LSTM, Bi-LSTM. However, Bi-LSTM improves the accuracy in recognizing 21 kinds of longitudinal and lateral maneuvering modes in comparison with RNN and LSTM as 1.54% and 0.47%, respectively.

Comparative analysis of activation functions of artificial neural network for prediction of optimal groundwater level in the middle mountainous area of Pyoseon watershed in Jeju Island (제주도 표선유역 중산간지역의 최적 지하수위 예측을 위한 인공신경망의 활성화함수 비교분석)

  • Shin, Mun-Ju;Kim, Jin-Woo;Moon, Duk-Chul;Lee, Jeong-Han;Kang, Kyung Goo
    • Journal of Korea Water Resources Association
    • /
    • v.54 no.spc1
    • /
    • pp.1143-1154
    • /
    • 2021
  • The selection of activation function has a great influence on the groundwater level prediction performance of artificial neural network (ANN) model. In this study, five activation functions were applied to ANN model for two groundwater level observation wells in the middle mountainous area of the Pyoseon watershed in Jeju Island. The results of the prediction of the groundwater level were compared and analyzed, and the optimal activation function was derived. In addition, the results of LSTM model, which is a widely used recurrent neural network model, were compared and analyzed with the results of the ANN models with each activation function. As a result, ELU and Leaky ReLU functions were derived as the optimal activation functions for the prediction of the groundwater level for observation well with relatively large fluctuations in groundwater level and for observation well with relatively small fluctuations, respectively. On the other hand, sigmoid function had the lowest predictive performance among the five activation functions for training period, and produced inappropriate results in peak and lowest groundwater level prediction. The ANN-ELU and ANN-Leaky ReLU models showed groundwater level prediction performance comparable to that of the LSTM model, and thus had sufficient potential for application. The methods and results of this study can be usefully used in other studies.