Object Recognition Technology


Semantic Image Segmentation for Efficiently Adding Recognition Objects

  • Lu, Chengnan; Park, Jinho
    • Journal of Information Processing Systems / v.18 no.5 / pp.701-710 / 2022
  • With the development of artificial intelligence technology, various machine learning methods have been developed for recognizing objects in images. Among these, image segmentation is the most effective for recognizing objects within an image. Conventionally, image datasets of all classes are trained simultaneously, so whenever additional classes require segmentation, the entire dataset must be retrained. Such repeated training is inefficient because most of the classes have already been trained. In addition, class imbalance affects training: some classes appear in the datasets far less frequently than others, so their training errors are not properly reflected when all classes are trained simultaneously. Therefore, a new method that separates some classes from the dataset is proposed to improve training efficiency, and the accuracies of the conventional and proposed methods are compared.
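As a rough illustration of the class-separation idea in this abstract, the sketch below groups a toy annotation list by class so that a newly added, under-represented class can be trained on its own image subset instead of retraining everything; the dataset layout and class names are invented for the example.

```python
# Minimal sketch of the class-separation idea: instead of retraining on the
# full multi-class dataset whenever a new class is added, filter the dataset
# down to the images that contain the under-represented class and train on
# that subset only. Annotation layout and class names are illustrative.
from collections import defaultdict

def split_by_class(annotations):
    """Group image ids by the classes they contain.

    `annotations` is an iterable of (image_id, class_name) pairs,
    e.g. flattened from a COCO-style annotation file.
    """
    images_per_class = defaultdict(set)
    for image_id, class_name in annotations:
        images_per_class[class_name].add(image_id)
    return images_per_class

annotations = [
    (1, "person"), (1, "car"), (2, "car"), (3, "traffic_light"), (4, "person"),
]
per_class = split_by_class(annotations)

# Train only on images containing the newly added class ("traffic_light"),
# leaving the already-trained classes untouched.
new_class_images = per_class["traffic_light"]
print(sorted(new_class_images))  # -> [3]
```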

A Basic Study on Instance Segmentation with Surveillance Cameras at Construction Sites Using Deep Learning-Based Computer Vision (건설 현장 CCTV 영상에서 딥러닝을 이용한 사물 인식 기초 연구)

  • Kang, Kyung-Su; Cho, Young-Woon; Ryu, Han-Guk
    • Proceedings of the Korean Institute of Building Construction Conference / 2020.11a / pp.55-56 / 2020
  • The construction industry has the highest occupational fatality and injury rates of any industry. Accordingly, safety managers monitor surveillance cameras installed at construction sites to prevent accidents in real time. However, owing to the limits of human cognitive ability, it is impossible to monitor many videos simultaneously, and the fatigue of the person monitoring the cameras is very high. Thus, to help safety managers monitor work and reduce the occupational accident rate, we conducted a study on object recognition at construction sites using surveillance cameras. In this study, we applied instance segmentation to identify the class and location of objects and to extract their size and shape at construction sites. This research considers how deep learning-based computer vision technology can be applied to safety management on a construction site.
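The abstract does not name a specific model, so the following minimal sketch uses torchvision's pre-trained Mask R-CNN as a stand-in to show what instance segmentation of a CCTV frame yields (per-object class, box, and pixel mask); the file name is hypothetical.

```python
# A minimal sketch of off-the-shelf instance segmentation on a CCTV frame,
# using torchvision's pre-trained Mask R-CNN (the paper's exact model and
# training data are not specified here; this only illustrates the pipeline).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = Image.open("site_cctv_frame.jpg").convert("RGB")  # hypothetical file
with torch.no_grad():
    pred = model([to_tensor(frame)])[0]

# Each detection carries a class label, a box (location), and a pixel mask
# (pred["masks"]), from which object size and shape on site can be derived.
for label, score, box in zip(pred["labels"], pred["scores"], pred["boxes"]):
    if score > 0.7:
        print(int(label), [round(float(v), 1) for v in box])
```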


Healing Service Tailored to the User's Emotion Using an Ensemble Learning Algorithm and AI Facial Expression Recognition (앙상블 학습 알고리즘과 인공지능 표정 인식 기술을 활용한 사용자 감정 맞춤 힐링 서비스)

  • Yang, Seong-yeon; Hong, Dahye; Moon, Jaehyun
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.818-820 / 2022
  • The keyword 'healing' is essential in the competitive society and culture of Korea. In addition, as time spent at home has increased due to COVID-19, demand for indoor healing services has grown. Therefore, this paper analyzes the user's facial expression so that people can receive various 'customized' healing services indoors and, based on this analysis, provides lighting, ASMR, and video recommendation services as well as a facial-expression recording service. The user's expression was analyzed by extracting only the face via object detection from an image taken by the user and then applying an ensemble algorithm to the expression predictions of several CNN models.
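A minimal sketch of the ensemble step described above: soft-voting over the expression probabilities of several CNNs on a detected face crop. The seven-emotion label set and the stand-in "models" are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of the ensemble step: average the softmax expression
# predictions of several CNNs over a detected face crop. The 7-class
# emotion set and stand-in models are assumptions.
import numpy as np

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def ensemble_predict(face_crop, models):
    """Soft-voting ensemble: mean of each model's class probabilities."""
    probs = np.mean([m(face_crop) for m in models], axis=0)
    return EMOTIONS[int(np.argmax(probs))], probs

# Stand-in "models": callables returning a probability vector per face crop.
fake_models = [
    lambda x: np.array([0.05, 0.0, 0.05, 0.7, 0.1, 0.05, 0.05]),
    lambda x: np.array([0.1, 0.0, 0.1, 0.5, 0.1, 0.1, 0.1]),
]
label, probs = ensemble_predict(None, fake_models)
print(label)  # -> "happy"; the service would then pick lighting/ASMR for it
```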

Proposal of 3D Camera-Based Digital Coordinate Recognition Technology (3D 카메라 기반 디지털 좌표 인식 기술 제안)

  • Koh, Jun-Young; Lee, Kang-Hee
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.229-230 / 2022
  • In this paper, we propose a 3D camera-based digital coordinate recognition technology combined with CNN object detection. The technology uses Intel's Realsense D455, a 3D depth camera, to detect and classify targets and determine their positions. Unlike the built-in distance measurement of conventional depth cameras, it recognizes coordinates, so it can also compute the distance between coordinates. It also employs multithreading, which shares memory with the Tensorflow SSD structure to reduce wasted system resources and increase speed. By computing distances between coordinates, this technology can be used in various settings such as sports, psychology, play, and industry.
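A minimal sketch of the coordinate step using the pyrealsense2 SDK that ships with the paper's hardware: deproject two pixel coordinates (e.g., centers of two SSD detections, chosen arbitrarily here) into 3D points and measure the metric distance between them. Requires a connected D455; the detection side is omitted.

```python
# Sketch: read one depth frame from a Realsense camera, turn two pixel
# coordinates into 3D points, and compute the distance between them.
import pyrealsense2 as rs
import numpy as np

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)

frames = pipeline.wait_for_frames()
depth = frames.get_depth_frame()
intrin = depth.profile.as_video_stream_profile().get_intrinsics()

def pixel_to_point(px, py):
    d = depth.get_distance(px, py)  # depth at the pixel, in meters
    return np.array(rs.rs2_deproject_pixel_to_point(intrin, [px, py], d))

# Two pixel coordinates, e.g. centers of two detection boxes (hypothetical).
p1 = pixel_to_point(320, 200)
p2 = pixel_to_point(420, 260)
print("distance between objects:", np.linalg.norm(p1 - p2), "m")

pipeline.stop()
```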


Estimation of fruit number of apple tree based on YOLOv5 and regression model (YOLOv5 및 다항 회귀 모델을 활용한 사과나무의 착과량 예측 방법)

  • Hee-Jin Gwak; Yunju Jeong; Ik-Jo Chun; Cheol-Hee Lee
    • Journal of IKEEE / v.28 no.2 / pp.150-157 / 2024
  • In this paper, we propose a novel algorithm for predicting the number of apples on an apple tree using a deep learning-based object detection model and a polynomial regression model. The measured number of apples on a tree can be used to predict apple yield and to assess losses when determining agricultural disaster insurance payouts. To measure the apple fruit load, we photographed the front and back sides of apple trees and manually labeled the apples in the captured images to construct a dataset, which was then used to train a one-stage object detection CNN model. However, when apples on a tree are obscured by leaves, branches, or other parts of the tree, they may not be captured in images, making it difficult for image recognition-based deep learning models to detect or infer their presence. To address this issue, we propose a two-stage inference process. In the first stage, an image-based deep learning model counts the apples in photos taken from both sides of the tree. In the second stage, we conduct a polynomial regression analysis with the total apple count from the deep learning model as the independent variable and the actual number of apples counted manually during an on-site orchard visit as the dependent variable. In a performance evaluation, the proposed two-stage inference system counted the apples on each tree with an average accuracy of 90.98%. The proposed method can therefore significantly reduce the time and cost of manually counting apples, and has the potential to be widely adopted as a foundational technology for fruit load estimation using deep learning.
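A condensed sketch of the two-stage inference: a YOLOv5-style per-tree detection count (assumed computed elsewhere) is mapped to the true fruit count with a fitted polynomial. The numbers and the degree-2 choice are placeholders, not the paper's data.

```python
# Sketch of the two-stage estimate: (1) count YOLOv5 apple detections in
# the front/back photos, (2) map that count to the true fruit load with a
# polynomial regression fitted on manually counted trees. All numbers
# below are made-up placeholders.
import numpy as np

# Stage 1 (assumed done elsewhere): detected apple counts per tree, e.g.
# len(model(img).xyxy[0]) summed over both sides with a torch.hub YOLOv5.
detected = np.array([80, 95, 110, 130, 150])   # YOLOv5 counts (both sides)
actual = np.array([120, 140, 170, 205, 240])   # manual on-site counts

# Stage 2: fit a polynomial regression (degree 2 is an assumption).
coeffs = np.polyfit(detected, actual, deg=2)
predict = np.poly1d(coeffs)

print(round(float(predict(100))))  # estimated apples for a new tree's count
```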

Development of a Voice-activated Map Information Retrieval System based on MFC (MFC 기반 음성구동 수치지도정보 검색시스템의 구현)

  • Kim, Nag-Cheol; Kim, Tae-Soo; Jo, Myung-Hee; Chung, Hyun-Yeol
    • Journal of the Korean Association of Geographic Information Studies / v.3 no.1 / pp.69-77 / 2000
  • In retrieving and analyzing digital map information using a mouse or keystrokes, designating the range of a study area requires many repeated mouse operations. In this study, we propose a voice-activated map information retrieval system that eliminates such repetition, and we implemented the system on a personal computer. The system was constructed in two ways for controlling the window display: the traditional OLE (object linking and embedding) method and the MFC (Microsoft Foundation Class) method. For the performance evaluation, the retrieval vocabulary consisted of 68 words uttered by three male speakers, including attribute words and control words for the Susung-gu area of Taegu city on a 1:5,000 map. As a result, we obtained an average recognition rate of 98.02% in on-line tests in an office environment, with operating speeds of 5.39 seconds for OLE and 10.38 seconds for MFC. These results show the practical feasibility of a speech-recognition-based retrieval system for digital maps.


Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk; Kim, Taeyeon; Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.1-25 / 2020
  • In this paper, we suggest an application system architecture that provides an accurate, fast, and efficient automatic gasometer reading function. The system captures a gasometer image using a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the device ID and gas usage amount via selective optical character recognition based on deep learning. In general, an image contains many types of characters, and optical character recognition extracts all of them, but some applications need to ignore character types that are not of interest and focus only on specific ones. For example, an automatic gasometer reading system only needs to extract the device ID and gas usage amount from gasometer images in order to bill users; strings that are not of interest, such as the device type, manufacturer, manufacturing date, and specifications, are not valuable to the application. Thus, the application has to analyze the region of interest and specific character types to extract valuable information only. We adopted CNN (convolutional neural network)-based object detection and CRNN (convolutional recurrent neural network) technology for selective optical character recognition, which analyzes only the region of interest. We built three neural networks for the application system: the first is a convolutional neural network that detects the regions of interest containing the gas usage amount and device ID strings; the second is another convolutional neural network that transforms the spatial information of a region of interest into sequential feature vectors; and the third is a bidirectional long short-term memory network that converts the sequential information into character strings through a time-series mapping from feature vectors to characters. In this research, the strings of interest are the device ID, which consists of 12 Arabic numerals, and the gas usage amount, which consists of 4-5 Arabic numerals. All system components were implemented on Amazon Web Services with Intel Xeon E5-2686 v4 CPUs and NVIDIA Tesla V100 GPUs. The architecture adopts a master-slave processing structure for efficient, fast parallel processing that copes with about 700,000 requests per day. A mobile device captures a gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes each reading request from a mobile device into a FIFO (first in, first out) input queue. The slave process, which consists of the three deep neural networks performing character recognition, runs on the NVIDIA GPU module and continually polls the input queue for recognition requests. When a request arrives, the slave process converts the queued image into the device ID string, the gas usage amount string, and the position information of the strings, returns this information to the output queue, and switches back to polling the input queue. The master process takes the final information from the output queue and delivers it to the mobile device. We used a total of 27,120 gasometer images for training, validation, and testing of the three deep neural networks: 22,985 images for training and validation and 4,135 images for testing. For each training epoch, the 22,985 images were randomly split 8:2 into training and validation sets. The 4,135 test images were categorized into five types (normal, noise, reflex, scale, and slant): normal means clean image data, noise means images with noise signals, reflex means images with light reflection in the gasometer region, scale means images with small object size due to long-distance capture, and slant means images that are not horizontally level. The final character recognition accuracies for the device ID and gas usage amount on normal data were 0.960 and 0.864, respectively.
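A CPU-only toy version of the master-slave queue structure described in this abstract: the master enqueues reading requests into a FIFO input queue, slave workers poll it, run a stubbed recognition step, and post results to an output queue. The thread count and stubbed outputs are illustrative, not the paper's configuration.

```python
# Minimal sketch of the master-slave structure: master pushes requests into
# a FIFO input queue; slaves poll it, run the (stubbed) recognition step,
# and post results to an output queue. The real slaves run DNNs on GPUs.
import queue
import threading

input_q: "queue.Queue" = queue.Queue()    # FIFO, as in the paper
output_q: "queue.Queue" = queue.Queue()

def recognize(image_path):
    """Stub for detection + CRNN reading of device ID and usage amount."""
    return image_path, "123456789012", "0042"

def slave():
    while True:
        img = input_q.get()        # poll the input queue
        if img is None:            # sentinel: shut down
            break
        output_q.put(recognize(img))
        input_q.task_done()

workers = [threading.Thread(target=slave) for _ in range(3)]
for w in workers:
    w.start()

for i in range(5):                 # master: enqueue incoming requests
    input_q.put(f"gasometer_{i}.jpg")
input_q.join()
for _ in workers:
    input_q.put(None)

while not output_q.empty():
    print(output_q.get())          # master returns results to the device
```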

Recent Technologies for the Acquisition and Processing of 3D Images Based on Deep Learning (딥러닝기반 입체 영상의 획득 및 처리 기술 동향)

  • Yoon, M.S.
    • Electronics and Telecommunications Trends / v.35 no.5 / pp.112-122 / 2020
  • In 3D computer graphics, a depth map is an image that provides information on the distance from the viewpoint to the subject's surface. Stereo sensors, depth cameras, and imaging systems with an active illumination source and a time-resolved detector can perform accurate depth measurements using their own light sources. The 3D information obtained through depth maps is useful in 3D modeling, autonomous vehicle navigation, object recognition and remote gesture detection, resolution-enhanced medical imaging, aviation and defense technology, and robotics. In addition, depth map information is important for extracting and restoring multi-view images and for extracting the phase information required for digital hologram synthesis. This study reviews recent research trends in deep learning-based 3D data analysis and in depth map extraction using convolutional neural networks, focusing on 3D image processing related to digital holograms and multi-view image extraction/reconstruction, which are becoming more popular as hardware computing power rapidly increases.

Object Recognition Utilizing a Complementary Feature-Point-Based Descriptor Containing Color Information (컬러 정보를 포함하는 보완적 특징점 기반 기술자를 활용한 객체인식)

  • Jang, Young-Kyoon; Kim, Ju-Whan; Moon, Seung-Geon; Nam, Tek-Jin; Kwon, Dong-Soo; Woo, Woon-Tack
    • Proceedings of the Korean Information Science Society Conference / 2012.06c / pp.341-343 / 2012
  • In this paper, as an extension of conventional feature-point-based object recognition methods, we propose an object recognition method that uses a descriptor containing color information based on complementary feature points. The proposed method generates complementary feature points even on objects with little texture by sampling edge locations. From the detected complementary feature points, it generates a descriptor that holds both gray-value gradient orientation information and color information. A codebook that groups the generated descriptors by object is then learned, producing a robust histogram that can distinguish each object. By using the learned codebook, the proposed method recognizes objects more robustly than conventional methods under changes in object scale, environment, and 3D rotation. Experimental results show that the proposed method achieves a recognition rate of 75.8%. This method can serve as the first core step for presenting information in augmented reality applications.
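To make the codebook idea concrete, the sketch below substitutes off-the-shelf ORB features for the paper's complementary color descriptors and learns a k-means codebook whose per-object histograms can be compared for recognition; the file names and the 32-word vocabulary are assumptions.

```python
# Hedged sketch of the codebook idea with off-the-shelf parts: ORB features
# stand in for the paper's complementary color descriptors, and a k-means
# codebook turns each object's descriptors into a histogram for matching.
import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create(nfeatures=500)

def descriptors(path):
    # Assumes each image has enough texture/edges to yield descriptors.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = orb.detectAndCompute(img, None)
    return des.astype(np.float32)

train_des = np.vstack([descriptors(p) for p in ["obj_a.jpg", "obj_b.jpg"]])
codebook = KMeans(n_clusters=32, n_init=10).fit(train_des)  # learn codebook

def histogram(path):
    words = codebook.predict(descriptors(path))
    h = np.bincount(words, minlength=32).astype(np.float32)
    return h / h.sum()  # normalized per-object histogram

# Recognition = nearest histogram; robust to scale/rotation to the extent
# the underlying descriptors are.
print(np.linalg.norm(histogram("query.jpg") - histogram("obj_a.jpg")))
```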

Hazy Particle Map-based Automated Fog Removal Method with Haziness Degree Evaluator Applied (Haziness Degree Evaluator를 적용한 Hazy Particle Map 기반 자동화 안개 제거 방법)

  • Sim, Hwi Bo; Kang, Bong Soon
    • Journal of Korea Multimedia Society / v.25 no.9 / pp.1266-1272 / 2022
  • With the recent development of computer vision technology, image processing-based mechanical devices are being developed to realize autonomous driving. In foggy conditions, the images captured by such machines' cameras become unclear owing to the scattering and absorption of light, which lowers the object recognition rate and causes malfunctions. Because malfunctions in autonomous driving can lead to human casualties, the safety of the technology is paramount, and an efficient haze removal algorithm must be applied to the camera to increase its stability. Conventional haze removal methods operate regardless of the haze concentration of the input image, so they remove haze excessively and degrade the quality of the resulting image. In this paper, we propose an automatic haze removal method that removes haze according to the haze density of the input image by applying Ngo's Haziness Degree Evaluator (HDE) to Kim's haze removal algorithm based on the Hazy Particle Map. The proposed method removes haze according to the haze concentration of the input image, preventing quality degradation of images that do not require haze removal and solving the problem of excessive removal. The superiority of the proposed method is verified through qualitative and quantitative evaluation.
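A sketch of the gating idea only: estimate haze density first and dehaze only above a threshold, scaling removal with density. The dark-channel mean below is a crude stand-in for Ngo's HDE, and the removal step is a stub, not Kim's hazy-particle-map algorithm.

```python
# Sketch of density-gated dehazing: estimate haze density, skip removal on
# clear images, and scale removal strength with density otherwise. The
# dark-channel mean is a crude stand-in for the HDE; dehazing is stubbed.
import cv2
import numpy as np

def dark_channel_mean(img, patch=15):
    dark = cv2.erode(img.min(axis=2), np.ones((patch, patch), np.uint8))
    return float(dark.mean()) / 255.0  # ~0 for clear, ~1 for dense haze

def remove_haze(img, strength):
    # Placeholder: a real system would use a hazy-particle-map style method.
    return img

def dehaze_if_needed(img, threshold=0.4):
    density = dark_channel_mean(img)
    if density < threshold:
        return img              # clear enough: skip removal, keep quality
    return remove_haze(img, strength=density)  # scale with estimated density
```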