• Title/Summary/Keyword: Feature point extraction

Search Results: 265

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk;Kim, Taeyeon;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.1-25 / 2020
  • In this paper, we propose an application system architecture that provides accurate, fast, and efficient automatic gasometer reading. The system captures a gasometer image with a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the device ID and gas usage amount by selective optical character recognition based on deep learning. In general, an image contains many types of characters, and conventional optical character recognition extracts all of them, but some applications need to ignore not-of-interest character types and focus only on specific ones. For example, an automatic gasometer reading system only needs to extract the device ID and gas usage amount from gasometer images in order to bill users; not-of-interest character strings such as device type, manufacturer, manufacturing date, and specification carry no value for the application. The application therefore has to analyze only the region of interest and specific character types to extract valuable information. We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition, which analyzes only the region of interest for character extraction. We built three neural networks for the application system: the first is a convolutional neural network that detects the regions of interest containing the gas usage amount and device ID character strings, the second is another convolutional neural network that transforms the spatial information of each region of interest into sequential feature vectors, and the third is a bi-directional long short-term memory network that converts the sequential features into character strings through time-series mapping from feature vectors to characters. In this research, the character strings of interest are the device ID and the gas usage amount: the device ID consists of 12 Arabic numerals and the gas usage amount of 4-5 Arabic numerals. All system components are implemented on the Amazon Web Services cloud with Intel Xeon E5-2686 v4 CPUs and NVIDIA Tesla V100 GPUs. The system architecture adopts a master-slave processing structure for efficient, fast parallel processing that copes with about 700,000 requests per day. The mobile device captures a gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes each reading request from a mobile device onto an input queue with a FIFO (First In First Out) structure. The slave process consists of the three deep neural networks that perform character recognition and runs on the NVIDIA GPU module. The slave process continuously polls the input queue for recognition requests; when a request arrives, it converts the queued image into the device ID string, the gas usage amount string, and the position information of the strings, returns this information to the output queue, and switches back to idle mode to poll the input queue. The master process takes the final information from the output queue and delivers it to the mobile device. We used a total of 27,120 gasometer images for training, validation, and testing of the three deep neural networks: 22,985 images were used for training and validation, and 4,135 images for testing. The 22,985 images were randomly split at an 8:2 ratio into training and validation sets for each training epoch. The 4,135 test images were categorized into five types (normal, noise, reflex, scale, and slant): normal denotes clean images, noise denotes images with noise, reflex denotes images with light reflection in the gasometer region, scale denotes images with a small object size due to long-distance capturing, and slant denotes images that are not horizontally level. The final character string recognition accuracies for the device ID and gas usage amount on normal data are 0.960 and 0.864, respectively.
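The master-slave queue structure described in the abstract can be illustrated with a minimal sketch. The stub functions below (detect_regions, recognize_text) are hypothetical placeholders for the paper's CNN region detector and CNN + bi-directional LSTM recognizer; only the FIFO queue flow between the master and the slave worker is shown.

```python
import queue
import threading

# Minimal sketch of the master-slave queue structure described in the abstract.
# The recognition models are represented by hypothetical stubs; in the paper
# they are a CNN region detector, a CNN feature extractor, and a BiLSTM decoder.

input_queue = queue.Queue()   # FIFO: master pushes reading requests here
output_queue = queue.Queue()  # slave returns recognized strings here

def detect_regions(image):
    # Hypothetical stand-in for the CNN that finds device-ID and usage regions.
    return {"device_id": image, "usage": image}

def recognize_text(region):
    # Hypothetical stand-in for the CNN + BiLSTM (CRNN) sequence recognizer.
    return "000000000000"

def slave_worker():
    # Slave process: poll the input queue, run the recognition pipeline,
    # push results to the output queue, then go back to polling.
    while True:
        request_id, image = input_queue.get()
        regions = detect_regions(image)
        result = {name: recognize_text(r) for name, r in regions.items()}
        output_queue.put((request_id, result))
        input_queue.task_done()

threading.Thread(target=slave_worker, daemon=True).start()

# Master process: enqueue a captured image and wait for the recognized strings.
input_queue.put(("req-001", b"raw-jpeg-bytes"))
print(output_queue.get())
```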

Implementation of Constructor-Oriented Visualization System for Occluded Construction via Mobile Augmented-Reality (모바일 증강현실을 이용한 작업자 중심의 폐색된 건축물 시각화 시스템 개발)

  • Kim, Tae-Ho;Kim, Kyung-Ho;Han, Yunsang;Lee, Seok-Han;Choi, Jong-Soo
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.2 / pp.55-68 / 2014
  • Infrastructure these days is often constructed underground so that it does not interfere with pedestrian foot traffic, which makes it difficult to visually confirm the exact location of the site where facilities are buried. These difficulties magnify the problems that can arise from over-reliance on the worker's experience or a mere blueprint, such as exposure to flooding and collapse. This paper proposes a constructor-oriented visualization system, running on mobile devices, for general construction sites with occluded structures. The proposed system consists of three stages. First, the "Stage of detecting manhole and extracting features" detects the unoccluded manhole that serves as the reference point for the occluded structures and extracts its features. Next, the "Stage of tracking features" tracks the features extracted in the previous stage. Finally, the "Stage of visualizing occluded constructions" analyzes and synthesizes the GPS data and 3D objects obtained from the mobile device in the previous stages. We validated the approach through parallel analysis of manhole detection, feature extraction, and tracking techniques in an indoor environment, and confirmed its feasibility by augmenting an occluded water pipe in a real environment. The system also offers a practical constructor-oriented working environment derived from the augmented 3D results of occluded water pipes.
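A minimal sketch of the detect-and-track idea in the first two stages, assuming ORB keypoints and pyramidal Lucas-Kanade optical flow as stand-ins; the paper does not specify these particular detectors, so the function names and region-of-interest handling below are illustrative.

```python
import numpy as np
import cv2

def extract_manhole_features(frame_gray, manhole_roi):
    # Detect keypoints only inside the (hypothetical) manhole bounding box.
    x, y, w, h = manhole_roi
    mask = np.zeros_like(frame_gray)
    mask[y:y + h, x:x + w] = 255
    orb = cv2.ORB_create(nfeatures=200)
    keypoints = orb.detect(frame_gray, mask)
    return np.float32([kp.pt for kp in keypoints]).reshape(-1, 1, 2)

def track_features(prev_gray, next_gray, prev_points):
    # Track the previously extracted points into the next frame.
    next_points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray,
                                                      prev_points, None)
    good = status.reshape(-1) == 1
    return prev_points[good], next_points[good]
```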

Design of ASM-based Face Recognition System Using (2D)2 Hybrid Preprocessing Algorithm (ASM기반 (2D)2 하이브리드 전처리 알고리즘을 이용한 얼굴인식 시스템 설계)

  • Kim, Hyun-Ki;Jin, Yong-Tak;Oh, Sung-Kwun
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.2 / pp.173-178 / 2014
  • In this study, we introduce an ASM-based face recognition classifier and its design methodology aided by a two-dimensional, two-directional hybrid preprocessing algorithm. Since face images are easily affected by external environments, ASM (active shape model) is used as the image preprocessing algorithm to resolve this problem; in particular, ASM is widely used for feature extraction from human faces. After extracting the face region with ASM, the dimensionality of the extracted face image data is reduced using the $(2D)^2$ hybrid preprocessing algorithm based on LDA and PCA. The preprocessed face image data is then used as the input for the design of the proposed polynomial-based radial basis function neural network. Unlike existing neural networks, the proposed pattern classifier is robust and is superior both in predictive ability and in its capacity to handle high-dimensional data. The essential design parameters of the classifier (the numbers of row eigenvectors, column eigenvectors, and clusters, and the fuzzification coefficient) are optimized by means of the ABC (artificial bee colony) algorithm. The performance of the proposed classifier is quantified on the Yale and AT&T datasets, which are widely used in face recognition.
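A minimal sketch of two-directional 2D PCA, one ingredient of the $(2D)^2$ hybrid preprocessing described above; the LDA part and the downstream RBF neural network are omitted, and the shapes and parameter names are illustrative assumptions.

```python
import numpy as np

def two_directional_2dpca(images, n_row, n_col):
    """images: array of shape (N, H, W); returns projectors and reduced images."""
    mean_image = images.mean(axis=0)
    centered = images - mean_image

    # Column-direction scatter (acts on the right): mean of A^T A over samples.
    g_col = np.einsum('nhw,nhv->wv', centered, centered) / len(images)
    # Row-direction scatter (acts on the left): mean of A A^T over samples.
    g_row = np.einsum('nhw,nvw->hv', centered, centered) / len(images)

    # Keep the leading eigenvectors of each scatter matrix.
    _, vec_col = np.linalg.eigh(g_col)
    _, vec_row = np.linalg.eigh(g_row)
    w = vec_col[:, -n_col:]   # column projector W: (W, n_col)
    z = vec_row[:, -n_row:]   # row projector Z: (H, n_row)

    # Reduced representation: Z^T A W for each centered image A.
    reduced = np.einsum('hr,nhw,wc->nrc', z, centered, w)
    return z, w, reduced

# Example: 10 random 32x32 "face crops" reduced to 8x8 feature matrices.
faces = np.random.rand(10, 32, 32)
_, _, features = two_directional_2dpca(faces, n_row=8, n_col=8)
print(features.shape)  # (10, 8, 8)
```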

Reproducing Summarized Video Contents based on Camera Framing and Focus

  • Hyung Lee;E-Jung Choi
    • Journal of the Korea Society of Computer and Information / v.28 no.10 / pp.85-92 / 2023
  • In this paper, we propose a method for automatically generating story-based abbreviated summaries of long-form dramas and movies. The basic premise is that, from the shooting stage, frames are composed with an illusion of depth following the golden ratio, and the object of interest is kept in focus so that the viewer's attention is directed for content delivery. To extract frames appropriate for this purpose, we drew on elemental techniques from previous work on scene and shot detection as well as on identifying focus-related blur. After converting videos shared on YouTube into individual frames, we divided each frame into the entire frame and three partial regions for feature extraction, applied the Laplacian operator and the FFT to each region, and chose the FFT measure for its relative consistency and robustness. By comparing the value calculated for the entire frame with the values calculated for the three regions, target frames were selected on the condition that relatively sharp regions could be identified. Based on the selected results, the final frames were extracted by combining them with the results of an offline change-point detection method to ensure the continuity of frames within a shot, and an edit decision list was constructed to produce an abbreviated summary amounting to 62.77% of the footage with an F1-score of 75.9%.
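A minimal sketch of an FFT-based sharpness score computed on the whole frame and on partial regions, in the spirit of the frame-selection step described above; the region layout and the selection condition are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
import cv2

def fft_sharpness(gray, cutoff=30):
    # High-frequency energy after suppressing a low-frequency block around DC.
    f = np.fft.fftshift(np.fft.fft2(gray))
    h, w = gray.shape
    f[h // 2 - cutoff:h // 2 + cutoff, w // 2 - cutoff:w // 2 + cutoff] = 0
    magnitude = 20 * np.log(np.abs(np.fft.ifft2(np.fft.ifftshift(f))) + 1e-8)
    return magnitude.mean()

def frame_is_candidate(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    whole = fft_sharpness(gray)
    # Three hypothetical partial regions: left, center, and right vertical thirds.
    regions = [gray[:, :w // 3], gray[:, w // 3:2 * w // 3], gray[:, 2 * w // 3:]]
    region_scores = [fft_sharpness(r) for r in regions]
    # Keep the frame if at least one region is clearly sharper than the whole.
    return max(region_scores) > whole
```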

Quantitative Evaluation of Super-resolution Drone Images Generated Using Deep Learning (딥러닝을 이용하여 생성한 초해상화 드론 영상의 정량적 평가)

  • Seo, Hong-Deok;So, Hyeong-Yoon;Kim, Eui-Myoung
    • Journal of Cadastre & Land InformatiX / v.53 no.2 / pp.5-18 / 2023
  • As the development of drones and sensors accelerates, new services and value are created by fusing data acquired from the various sensors mounted on drones. However, spatial information built through data fusion depends mainly on imagery, and data quality is determined by the specifications and performance of the hardware. In addition, it is difficult to apply in the field because expensive equipment is required to construct high-quality spatial information. In this study, super-resolution was performed by applying deep learning to low-resolution images acquired through RGB and THM cameras mounted on a drone, and quantitative evaluation and feature point extraction were performed on the generated high-resolution images. The experiments showed that the high-resolution images generated by super-resolution maintained the characteristics of the original images and that, as the resolution improved, more feature points could be extracted than from the original images. Therefore, generating high-resolution images by feeding low-resolution images into a super-resolution deep learning model is judged to be a new way to construct high-quality spatial information without being restricted by hardware.
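A minimal sketch of the kind of quantitative comparison described above: counting feature points detected on an original low-resolution image and on its super-resolved counterpart. The file names are placeholders and SIFT is an assumed detector, since the paper does not state which feature extractor was used.

```python
import cv2

def count_keypoints(path):
    # Load the image in grayscale and count SIFT keypoints detected on it.
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray, None)
    return len(keypoints)

low_res_count = count_keypoints("drone_rgb_lowres.png")      # placeholder file
super_res_count = count_keypoints("drone_rgb_superres.png")  # placeholder file
print(f"original: {low_res_count} keypoints, super-resolved: {super_res_count}")
```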