• Title/Summary/Keyword: Queue Recognition


Multiple Camera-Based Real-Time Long Queue Vision Algorithm for Public Safety and Efficiency

  • Tae-hoon Kim;Ji-young Na;Ji-won Yoon;Se-Hun Lee;Jun-ho Ahn
    • Journal of the Korea Society of Computer and Information / v.29 no.10 / pp.47-57 / 2024
  • This paper proposes a system to efficiently manage delays caused by unmanaged, congested queues in crowded environments. Such queues not only cause inconvenience but also pose safety risks. Existing systems, which rely on single-camera feeds, are inadequate for complex scenarios that require multiple cameras. To address this, we developed a multi-vision long-queue detection system that integrates multiple vision algorithms to accurately detect various types of queues. The algorithm processes real-time video from multiple cameras, stitching overlapping segments into a single panoramic image (a front-end sketch follows below). By combining object detection, tracking, and position-variation analysis, the system recognizes long queues in crowded environments. The algorithm was validated with 96% accuracy and a 92% F1-score across diverse settings.
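A minimal front-end sketch of the pipeline this abstract describes, using OpenCV's stock stitcher and HOG person detector as stand-ins; the paper's own detection, tracking, and position-variation models are not reproduced here, and all names are illustrative.

```python
# Stitch frames from overlapping cameras into one panorama, then detect
# people in it; detections with low position variation over time would
# indicate a waiting queue.
import cv2

def detect_people_in_panorama(frames):
    """frames: list of BGR images from cameras with overlapping views."""
    stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
    status, pano = stitcher.stitch(frames)
    if status != cv2.Stitcher_OK:
        raise RuntimeError(f"stitching failed with status {status}")
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    boxes, _ = hog.detectMultiScale(pano, winStride=(8, 8))
    return pano, boxes  # feed boxes to a tracker for position-variation analysis
```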

Determining Method of Factors for Effective Real Time Background Modeling (효과적인 실시간 배경 모델링을 위한 환경 변수 결정 방법)

  • Lee, Jun-Cheol;Ryu, Sang-Ryul;Kang, Sung-Hwan;Kim, Sung-Ho
    • Journal of KIISE: Software and Applications / v.34 no.1 / pp.59-69 / 2007
  • In video captured in varying environments, background modeling is important for extracting and recognizing moving objects, and many background modeling methods have been proposed as a preprocessing step for object recognition. Among them, Kumar's method is a representative queue-based background modeling approach, but because it examines frames for updating at a fixed period, its use in diverse systems is limited. This paper uses queue-based background modeling and proposes a method that determines the major parameters adaptively from the background model: the queue size of the sliding window, the grouping size based on image brightness, and the period of the frame update examination (a minimal sketch follows below). To determine these factors, the Ratio of Correct Objects (RCO), Ratio of Error Objects (REO), and Update Ratio (UR) are used as evaluation criteria at every step. The proposed method improves on existing background modeling techniques that are unsuitable for real-time processing and recognizes objects more efficiently.
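As a rough illustration of queue-based background modeling (not Kumar's algorithm or the paper's adaptive parameter selection), the sketch below keeps a sliding-window queue of sampled frames and takes the per-pixel median as the background; QUEUE_SIZE and UPDATE_PERIOD are the kinds of parameters the paper proposes to set adaptively.

```python
from collections import deque

import cv2
import numpy as np

QUEUE_SIZE = 30     # sliding-window queue size (set adaptively in the paper)
UPDATE_PERIOD = 5   # frames between update examinations (also adaptive)

frame_queue = deque(maxlen=QUEUE_SIZE)

def foreground_mask(frame_bgr, frame_idx):
    """Return a binary foreground mask for one frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    if frame_idx % UPDATE_PERIOD == 0:
        frame_queue.append(gray)            # enqueue; the oldest frame drops out
    if len(frame_queue) < 2:
        return np.zeros_like(gray)
    background = np.median(np.stack(frame_queue), axis=0).astype(np.uint8)
    diff = cv2.absdiff(gray, background)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    return mask
```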

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk;Kim, Taeyeon;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.1-25 / 2020
  • In this paper, we suggest an application system architecture that provides accurate, fast, and efficient automatic gasometer reading. The system captures a gasometer image with a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the device ID and gas usage amount by selective optical character recognition based on deep learning. In general, an image contains many types of characters, and optical character recognition extracts all of them; some applications, however, need to ignore character types that are not of interest and focus only on specific ones. For example, an automatic gasometer reading system needs to extract only the device ID and gas usage amount from gasometer images in order to bill users; strings that are not of interest, such as the device type, manufacturer, manufacturing date, and specification, carry no value for the application. The application therefore has to analyze the region of interest and specific character types so that only valuable information is extracted. We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition, which analyzes only the region of interest. We built three neural networks for the application system: the first is a convolutional network that detects the regions of interest containing the gas usage amount and device ID strings; the second is another convolutional network that transforms the spatial information of a region of interest into sequential feature vectors; and the third is a bidirectional long short-term memory network that converts the sequential information into character strings by time-series analysis, mapping feature vectors to characters. In this research, the character strings of interest are the device ID, consisting of 12 Arabic numeral characters, and the gas usage amount, consisting of 4-5 Arabic numeral characters. All system components are implemented in the Amazon Web Services cloud with Intel Xeon E5-2686 v4 CPUs and an NVIDIA Tesla V100 GPU. The architecture adopts a master-slave processing structure for efficient, fast parallel processing that copes with about 700,000 requests per day (a sketch of this queueing pattern follows below). The mobile device captures a gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes each reading request into an input queue with a FIFO (First In, First Out) structure. The slave process consists of the three deep neural networks that conduct character recognition and runs on the NVIDIA GPU module. The slave process continuously polls the input queue for recognition requests; when a request arrives, it converts the image into the device ID string, the gas usage amount string, and the positions of both strings, returns this information to an output queue, and switches back to idle mode to poll the input queue. The master process gets the final information from the output queue and delivers it to the mobile device. We used a total of 27,120 gasometer images for training, validation, and testing of the three deep neural networks: 22,985 images for training and validation and 4,135 images for testing. For each training epoch, we randomly split the 22,985 images into training and validation sets at an 8:2 ratio. The 4,135 test images were categorized into five types: normal (clean images), noise (images with noise), reflex (images with light reflection in the gasometer region), scale (images with small objects due to long-distance capture), and slant (images that are not horizontally level). The final character string recognition accuracies for the device ID and gas usage amount on normal data are 0.960 and 0.864, respectively.
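A single-machine sketch of the master/slave FIFO queueing pattern described above; recognize() merely stands in for the three-network OCR pipeline, and every name here is illustrative rather than the paper's code.

```python
import queue
import threading

input_queue = queue.Queue()    # FIFO: the master pushes reading requests here
output_queue = queue.Queue()   # the slave returns recognition results here

def recognize(image_bytes):
    # Placeholder for: ROI detection (CNN) -> sequential features (CNN)
    # -> character decoding (bidirectional LSTM).
    return {"device_id": "000000000000", "usage": "0000"}

def slave_worker():
    while True:
        image = input_queue.get()           # poll the input queue (blocks when idle)
        output_queue.put(recognize(image))  # push the result to the output queue
        input_queue.task_done()

threading.Thread(target=slave_worker, daemon=True).start()

# Master: enqueue a request, then deliver the result back to the mobile device.
input_queue.put(b"...jpeg bytes...")
print(output_queue.get())
```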

Training Sample of Artificial Neural Networks for Predicting Signalized Intersection Queue Length (신호교차로 대기행렬 예측을 위한 인공신경망의 학습자료 구성분석)

  • 한종학;김성호;최병국
    • Journal of Korean Society of Transportation / v.18 no.4 / pp.75-85 / 2000
  • The purpose of this study is to analyze whether the composition of the training sample is related to the predictive ability and learning results of artificial neural networks (ANNs) for predicting the queue length (in vehicles) one cycle ahead at a signalized intersection. Two compositions of the ANN training sample are considered. The first uses time-series data of queue length (per cycle) detected by a single detector (loop or video). The second uses time-space correlated data (such as upstream feed-in flow, link travel time, maximum stationary queue length on the approach, and departure volume) detected by an integrated vehicle detection system (loop detectors, video detectors, RFIDs) installed between the upstream and downstream nodes (intersections). The major finding is that at Daechi Intersection (Gangnam-gu, Seoul), when the ANN training sample is constructed from time-space correlated data between the upstream and downstream nodes, the pattern recognition ability for interrupted traffic flow is better (an illustrative sketch follows below).
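An illustrative sketch, on synthetic data, of the second training-sample composition: a small feedforward network mapping the time-space correlated features named above to next-cycle queue length. This is not the paper's network or data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Feature columns: upstream feed-in flow, link travel time,
# approach maximum stationary queue length, departure volume.
X = rng.uniform(0.0, 1.0, size=(500, 4))
# Synthetic next-cycle queue length (veh.) with noise.
y = 30 * X[:, 0] + 10 * X[:, 2] - 15 * X[:, 3] + rng.normal(0.0, 1.0, 500)

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X[:400], y[:400])
print("held-out R^2:", model.score(X[400:], y[400:]))
```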

Speech Recognition based Message Transmission System for the Hearing Impaired Persons (청각장애인을 위한 음성인식 기반 메시지 전송 시스템)

  • Kim, Sung-jin;Cho, Kyoung-woo;Oh, Chang-heon
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.12 / pp.1604-1610 / 2018
  • Speech recognition services serve as an ancillary means of communication for hearing-impaired persons by converting the speaker's voice into text and visualizing it. However, in open environments such as classrooms and conference rooms, it is difficult to provide a speech recognition service to many hearing-impaired persons at once, so a method is needed to provide it efficiently according to the surrounding environment. In this paper, we propose a system that recognizes the speaker's voice and transmits the converted text as messages to many hearing-impaired persons. The proposed system uses the MQTT protocol to deliver messages to many users at the same time (a publish sketch follows below). The end-to-end delay was measured to verify the service delay of the proposed system according to the QoS level of the MQTT protocol. The measurements show that the difference in delay between the most reliable QoS level, 2, and level 0 is 111 ms, confirming that the QoS setting does not significantly affect conversation recognition.
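A minimal publish-side sketch of the MQTT fan-out the abstract describes, using the paho-mqtt client with assumed broker and topic names; every client subscribed to the topic receives the recognized text.

```python
import paho.mqtt.client as mqtt

BROKER = "broker.example.org"   # hypothetical broker address
TOPIC = "speech/text"           # hypothetical topic the clients subscribe to

client = mqtt.Client()          # paho-mqtt 1.x constructor; 2.x also takes a CallbackAPIVersion
client.connect(BROKER, 1883)
client.loop_start()

# qos=2 is the most reliable (exactly-once) level; qos=0 is fire-and-forget.
# The paper measured only a 111 ms end-to-end gap between the two.
info = client.publish(TOPIC, payload="Hello, everyone.", qos=2)
info.wait_for_publish()
client.loop_stop()
client.disconnect()
```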

Improving Multi-DNN Computational Performance of Embedded Multicore Processors through a Global Queue (글로벌 큐를 통한 임베디드 멀티코어 프로세서의 멀티 DNN 연산 성능 향상)

  • Cho, Ho-jin;Kim, Myung-sun
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.6 / pp.714-721 / 2020
  • DNNs are increasingly used in embedded systems such as robots and autonomous vehicles. To achieve high recognition accuracy, their computational complexity has grown greatly, and multiple DNNs often run aperiodically, so the ability to process multiple DNNs in embedded environments is a crucial issue; multicore-based platforms are being released accordingly. However, most DNN models operate as batch processes, and when multiple DNNs run together on a multicore processor, the variation in execution time between DNNs can be large and the end-to-end execution time of the whole set can be long, depending on how they are allocated to the cores. In this paper, we solve these problems with a framework that decomposes each DNN into individual layers and distributes the layers to cores through a global queue (see the sketch below). In experiments, the total DNN execution time was reduced by 31%, and when running multiple identical DNNs, the deviation in execution time was reduced by up to 95.1%.
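A conceptual sketch of the global-queue idea: layers of several DNNs become tasks that any core's worker pulls from one shared queue, instead of binding each whole DNN to one core. The tasks are stand-ins, and a real scheduler must also respect layer ordering within each DNN, which this sketch omits.

```python
import queue
import threading

global_queue = queue.Queue()

# Enqueue (dnn id, layer index) tasks for three "DNNs" of four layers each.
for dnn in ("dnn_a", "dnn_b", "dnn_c"):
    for layer in range(4):
        global_queue.put((dnn, layer))

def worker(core_id):
    while True:
        try:
            dnn, layer = global_queue.get_nowait()
        except queue.Empty:
            return
        print(f"core {core_id}: running {dnn} layer {layer}")  # stand-in compute
        global_queue.task_done()

threads = [threading.Thread(target=worker, args=(c,)) for c in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```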

Recognition Model of the Vehicle Type using Clustering Methods (클러스터링 방법을 이용한 차종인식 모형)

  • Jo, Hyeong-Gi;Min, Jun-Yeong;Choe, Jong-Uk
    • The Transactions of the Korea Information Processing Society / v.3 no.2 / pp.369-380 / 1996
  • Inductive loop detectors (ILDs) have been commonly used to collect traffic data such as occupancy and non-occupancy times, from which the traffic volume and the type of each passing vehicle are calculated. To provide reliable data for traffic control and planning, type recognition must be accurate; it can be used to determine traffic signal splits and to provide queue-length forecasts for over-saturation control. In this research, a new recognition model is suggested for recognizing the vehicle type from data collected through ILD systems (a clustering sketch follows below). Two clustering methods based on statistical algorithms and one neural-network clustering method were employed to test their reliability and accuracy. A series of experiments found that the new model can greatly enhance the reliability and accuracy of the type recognition rate, well beyond conventional approaches. The model modifies the neural-network clustering method and enhances recognition accuracy by applying the algorithm iteratively until no unclustered data remain.
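An illustrative clustering sketch on synthetic ILD-style features (occupancy time and following gap per vehicle pass); k-means stands in here for the paper's statistical and neural clustering methods.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Synthetic (occupancy_s, gap_s) pairs for three vehicle classes.
cars = rng.normal([0.15, 2.0], 0.03, size=(100, 2))
vans = rng.normal([0.25, 2.0], 0.04, size=(100, 2))
trucks = rng.normal([0.45, 2.5], 0.06, size=(100, 2))
X = np.vstack([cars, vans, trucks])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster centers (occupancy, gap):\n", km.cluster_centers_)
```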

Design and Implementation of Finger Keyboard with Video Camera (비디오 카메라를 이용한 핑거 키보드의 설계 및 구현)

  • Hwang, Kitae
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.16 no.5 / pp.157-163 / 2016
  • This paper presents Finger Keyboard, which uses a video camera to detect the user's keystrokes on a keyboard drawn on paper. The Finger Keyboard software was written in standard C/C++ and is therefore easy to port to other computing environments. We installed a popular USB web camera on a Windows PC and implemented Finger Keyboard as a Windows application that detects key typing and injects the key code into the message queue of the Windows operating system (a key-injection sketch follows below). We also implemented Finger Keyboard on a Raspberry Pi 2 embedded computer with a dedicated camera and connected it to an Android device as an external keyboard over Bluetooth. Experiments showed an average recognition success rate of around 80% at a typing speed of 120 characters per minute.
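The paper's implementation is C/C++ against the Win32 message queue; purely as an illustration, the Windows-only sketch below injects an already-detected key code into the system input queue from Python via ctypes, assuming the camera side has recognized the keystroke.

```python
import ctypes

KEYEVENTF_KEYUP = 0x0002

def inject_key(virtual_key_code):
    """Post a synthetic key press into the Windows input queue."""
    user32 = ctypes.windll.user32  # available on Windows only
    user32.keybd_event(virtual_key_code, 0, 0, 0)                # key down
    user32.keybd_event(virtual_key_code, 0, KEYEVENTF_KEYUP, 0)  # key up

inject_key(0x41)  # virtual-key code for 'A'
```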

Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID (계층적 군집화 기반 Re-ID를 활용한 객체별 행동 및 표정 검출용 영상 분석 시스템)

  • Lee, Sang-Hyun;Yang, Seong-Hun;Oh, Seung-Jin;Kang, Jinbeom
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.89-106 / 2022
  • Recently, the amount of video data collected from smartphones, CCTVs, black boxes, and high-definition cameras has increased rapidly, and with it the demand for analysis and utilization. Because many industries lack the skilled manpower to analyze video, machine learning and artificial intelligence are actively used to assist. In this situation, demand for computer vision technologies such as object detection and tracking, action detection, emotion detection, and re-identification (Re-ID) has also grown rapidly. However, object detection and tracking suffer from many performance-degrading difficulties, such as an object re-appearing after leaving the recording location, and occlusion. Accordingly, action and emotion detection models built on top of detection and tracking also have difficulty extracting data for each object. In addition, deep learning architectures composed of multiple models suffer from performance degradation due to bottlenecks and a lack of optimization. In this study, we propose a video analysis system consisting of a YOLOv5-based DeepSORT object tracking model, a SlowFast-based action recognition model, a Torchreid-based Re-ID model, and AWS Rekognition as an emotion recognition service. The proposed system uses single-linkage hierarchical clustering-based Re-ID and processing methods that maximize hardware throughput. It achieves higher accuracy than a re-identification model using simple metrics, delivers near-real-time processing, and prevents tracking failures caused by object departure and re-appearance, occlusion, and similar events. By continuously linking each object's action and facial emotion detection results to the same identity, videos can be analyzed efficiently. The re-identification model extracts a feature vector from the bounding box of each object detected by the tracking model in each frame and applies single-linkage hierarchical clustering against vectors from past frames to identify objects the tracker lost (a clustering sketch follows below). Through this process, an object lost because of re-appearance or occlusion can be re-tracked, so the action and facial emotion results of a newly recognized object can be linked to those of the object that appeared in the past. To improve processing performance, we introduce a per-object bounding box queue and a feature queue, which reduce RAM requirements while maximizing GPU memory throughput, and the IoF (Intersection over Face) algorithm, which links facial emotions recognized through AWS Rekognition to object tracking information. The academic significance of this study is that a two-stage re-identification model can achieve real-time performance through processing techniques, even in the high-cost setting of simultaneous action and facial emotion detection, rather than by sacrificing accuracy with simple metrics. The practical implication is that the many industrial fields that require action and facial emotion detection but struggle with tracking failures can analyze videos effectively with the proposed model. With its high re-tracking accuracy and processing performance, the model can be used in fields such as intelligent monitoring, observation services, and behavioral or psychological analysis services, where integrating tracking information with extracted metadata creates great industrial and business value. In the future, to measure object tracking performance more precisely, experiments on the MOT Challenge dataset, which is used by many international conferences, are needed. We will investigate the problems the IoF algorithm cannot solve in order to develop a complementary algorithm, and we plan additional research applying this model to datasets from various fields related to intelligent video analysis.
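A toy sketch of the single-linkage Re-ID step: feature vectors of lost tracks and of a new detection are clustered together, and landing in the same cluster is read as "same identity". The 128-dimensional vectors and the distance threshold are stand-ins, not Torchreid outputs or the paper's tuning.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(2)
past = rng.normal(0.0, 1.0, size=(5, 128))            # feature vectors of lost tracks
new = past[2] + rng.normal(0.0, 0.05, size=(1, 128))  # a re-appearing object

X = np.vstack([past, new])
Z = linkage(X, method="single", metric="cosine")  # single-linkage clustering
labels = fcluster(Z, t=0.2, criterion="distance")

# The new detection (last label) shares a cluster with past track 2,
# so its action/emotion results can be linked to that identity.
print(labels)
```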