• Title/Summary/Keyword: YOLOv5


Lunar Crater Detection using Deep-Learning (딥러닝을 이용한 달 크레이터 탐지)

  • Seo, Haingja;Kim, Dongyoung;Park, Sang-Min;Choi, Myungjin
    • Journal of Space Technology and Applications / v.1 no.1 / pp.49-63 / 2021
  • The solar system is explored through a variety of payloads, and many research results are emerging from them. We attempted to apply deep learning as a method of studying solar system bodies. Unlike Earth-observation satellite data, solar system data vary greatly across target bodies, probes, and each probe's payloads, so it may be difficult to apply a single deep-learning model to such diverse data; nevertheless, we expect it to reduce human error and to compensate for missing observations. We implemented a model that detects craters on the lunar surface: it was trained on Lunar Reconnaissance Orbiter Camera (LROC) images with the provided crater shapefile as ground truth, and applied to lunar surface images. Although the results were not yet satisfactory, after image pre-processing and model modification the model will be applied to images of the permanently shadowed regions of the Moon acquired by ShadowCam. By further applying it to Ceres and Mercury, whose surfaces resemble the Moon's, we aim to show that deep learning is another viable method for studying the solar system. (A sketch of converting crater annotations into detector training labels follows below.)
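
The abstract above describes training a crater detector from LROC images and a crater shapefile. As a rough illustration only, the sketch below converts crater annotations that have already been reduced to pixel-space centers and radii into YOLO-style label lines; the function, the label format, and the example values are assumptions, not the authors' pipeline.

    # Minimal sketch: convert crater annotations (pixel-space centers and radii)
    # into YOLO-format label lines (class x_center y_center width height, normalized).
    # Assumes the annotations were already extracted from the shapefile and projected
    # into the pixel coordinates of the corresponding LROC image tile.

    def craters_to_yolo_labels(craters, img_w, img_h):
        """craters: iterable of (cx, cy, radius) in pixels."""
        lines = []
        for cx, cy, r in craters:
            w = h = 2 * r
            # Normalize all coordinates to [0, 1] as YOLO expects.
            lines.append(f"0 {cx / img_w:.6f} {cy / img_h:.6f} {w / img_w:.6f} {h / img_h:.6f}")
        return "\n".join(lines)

    if __name__ == "__main__":
        example = [(512.0, 400.0, 35.0), (120.5, 980.2, 12.3)]  # hypothetical craters
        print(craters_to_yolo_labels(example, img_w=1024, img_h=1024))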

A Study on the Dataset Construction and Model Application for Detecting Surgical Gauze in C-Arm Imaging Using Artificial Intelligence (인공지능을 활용한 C-Arm에서 수술용 거즈 검출을 위한 데이터셋 구축 및 검출모델 적용에 관한 연구)

  • Kim, Jin Yeop;Hwang, Ho Seong;Lee, Joo Byung;Choi, Yong Jin;Lee, Kang Seok;Kim, Ho Chul
    • Journal of Biomedical Engineering Research / v.43 no.4 / pp.290-297 / 2022
  • Surgical instruments are sometimes left behind in the body during surgery. Most of these retained items are surgical gauze, so radio-opaque (X-ray-detectable) gauze, which comes in wire and pad types, is used to prevent such accidents. When retained gauze is suspected, it must be confirmed by a radiologist reading an image taken with a mobile X-ray device. However, most operating rooms are not equipped with a mobile X-ray device but with a C-Arm, whose image quality is poorer than that of mobile X-ray equipment, and the reading itself takes time. In this study, we acquired gauze images with C-Arm equipment, built a dataset, and selected an artificial-intelligence detection model to compensate for the relatively low image quality and to assist the reading of radiology specialists. mAP@50 and detection time were used as performance indicators. The two-class gauze dataset was more accurate, and the YOLOv5 model achieved an mAP@50 of 93.4% with a detection time of 11.7 ms. (A sketch of measuring per-image detection time is shown below.)
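
Detection time per image is one of the two indicators reported above. A minimal sketch of how such a timing could be taken with a YOLOv5 model loaded through torch.hub is given below; the weights file gauze_best.pt and the sample image path are hypothetical stand-ins, not artifacts from the study.

    # Minimal sketch: time a single YOLOv5 inference, as done when reporting
    # per-image detection time alongside mAP@50. Paths are hypothetical.
    import time
    import torch

    model = torch.hub.load("ultralytics/yolov5", "custom", path="gauze_best.pt")
    model.conf = 0.25  # confidence threshold

    img = "c_arm_sample.jpg"  # hypothetical C-Arm image
    model(img)  # warm-up run so initialization is not counted in the timing

    start = time.perf_counter()
    results = model(img)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"detections: {len(results.xyxy[0])}, time: {elapsed_ms:.1f} ms")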

A Study on the Bleeding Detection Using Artificial Intelligence in Surgery Video (수술 동영상에서의 인공지능을 사용한 출혈 검출 연구)

  • Si Yeon Jeong;Young Jae Kim;Kwang Gi Kim
    • Journal of Biomedical Engineering Research / v.44 no.3 / pp.211-217 / 2023
  • Recently, many studies have introduced artificial-intelligence systems into the surgical process to reduce the incidence of complications and mortality in patients. Bleeding is a major cause of operative mortality and complications, yet few studies have addressed bleeding detection in surgical videos. To advance deep-learning models for detecting intraoperative hemorrhage, three models were trained and compared: YOLOv5, RetinaNet50, and RetinaNet101. We collected 1,016 bleeding images extracted from five surgical videos; the ground truth was labeled based on the agreement of two specialists. The dataset was divided into training, validation, and test sets: 812 images (80%) for training, 102 images (10%) for validation, and the remaining 102 images (10%) for testing. The three main metrics used to evaluate performance were precision, recall, and false positives per image (FPPI). Based on these metrics, RetinaNet101 achieved the best detection results of the three models (precision 0.99±0.01, recall 0.93±0.02, FPPI 0.01±0.01). Information on bleeding detected in surgical videos can be transmitted quickly to the operating room, improving patient outcomes. (A sketch of computing these metrics is shown below.)
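
Precision, recall, and false positives per image (FPPI) are the metrics named above. The sketch below shows one conventional way to compute them from per-image detections using greedy IoU matching; the matching rule and the 0.5 IoU threshold are illustrative assumptions rather than the paper's exact protocol.

    # Minimal sketch of precision, recall, and FPPI from per-image detections,
    # using a simple greedy IoU match between predicted and ground-truth boxes.

    def iou(a, b):
        """Boxes as (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    def evaluate(preds_per_image, gts_per_image, iou_thr=0.5):
        tp = fp = fn = 0
        n_images = len(preds_per_image)
        for preds, gts in zip(preds_per_image, gts_per_image):
            matched = set()
            for p in preds:
                best = max(range(len(gts)), key=lambda i: iou(p, gts[i]), default=None)
                if best is not None and best not in matched and iou(p, gts[best]) >= iou_thr:
                    tp += 1
                    matched.add(best)
                else:
                    fp += 1
            fn += len(gts) - len(matched)
        precision = tp / (tp + fp + 1e-9)
        recall = tp / (tp + fn + 1e-9)
        fppi = fp / n_images
        return precision, recall, fppi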

Object detection within the region of interest based on gaze estimation (응시점 추정 기반 관심 영역 내 객체 탐지)

  • Seok-Ho Han;Hoon-Seok Jang
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.3 / pp.117-122 / 2023
  • Gaze estimation, which automatically recognizes where a user is currently looking, combined with object detection around the estimated gaze point, can be a more accurate and efficient way to understand human visual behavior. In this paper, we propose a method to detect objects within a region of interest around the gaze point. Specifically, after estimating the 3D gaze point, a region of interest centered on the estimated point is created so that object detection runs only inside that region. In our experiments, we compared general full-frame object detection with the proposed region-of-interest detection and measured per-frame processing times of 1.4 ms and 1.1 ms, respectively, indicating that the proposed method is faster. (A sketch of ROI-restricted detection is shown below.)
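
A minimal sketch of running detection only inside a gaze-centred region of interest is given below, assuming a YOLOv5 model from torch.hub, an RGB frame as a NumPy array, and a fixed ROI size; the authors' actual detector and ROI construction may differ.

    # Minimal sketch: crop a fixed-size ROI around the gaze point, detect only
    # inside the crop, and map the boxes back to full-frame coordinates.
    import torch

    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

    def detect_in_roi(frame, gaze_xy, roi_size=320):
        """frame: HxWx3 RGB ndarray; gaze_xy: (x, y) gaze point in pixels."""
        h, w = frame.shape[:2]
        gx, gy = gaze_xy
        x1, y1 = max(0, int(gx - roi_size / 2)), max(0, int(gy - roi_size / 2))
        x2, y2 = min(w, x1 + roi_size), min(h, y1 + roi_size)
        roi = frame[y1:y2, x1:x2]
        results = model(roi)           # detection runs only on the cropped ROI
        dets = results.xyxy[0].clone()
        dets[:, [0, 2]] += x1          # shift boxes back to full-frame coordinates
        dets[:, [1, 3]] += y1
        return dets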

Automatic identification and analysis of multi-object cattle rumination based on computer vision

  • Yueming Wang;Tiantian Chen;Baoshan Li;Qi Li
    • Journal of Animal Science and Technology / v.65 no.3 / pp.519-534 / 2023
  • Rumination in cattle is closely related to their health, which makes automatic monitoring of rumination an important part of smart pasture operations. However, manual monitoring of cattle rumination is laborious, and wearable sensors are often harmful to the animals. We therefore propose a computer-vision method to automatically identify rumination in multiple cattle and to calculate the rumination time and number of chews for each cow. The heads of the cattle in the video are first tracked with a multi-object tracking algorithm that combines the You Only Look Once (YOLO) detector with the kernelized correlation filter (KCF). Head images of each cow are saved at a fixed size and numbered. A rumination recognition algorithm is then constructed from parameters obtained with the frame-difference method, and the rumination time and number of chews are calculated; the algorithm analyzes the head-image sequence of each cow to detect rumination automatically for multiple animals. To verify the feasibility of the method, the algorithm was tested on multi-object cattle rumination videos and the results were compared with human observation. The experiments showed an average error of 5.902% in rumination time and 8.126% in the number of chews. Rumination identification and the calculation of rumination information are performed entirely by the computer with no manual intervention, providing a new contactless rumination identification method for multiple cattle and technical support for smart pastures. (A sketch of frame-difference chew counting is shown below.)
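
A minimal sketch of the frame-difference idea used for chew counting is shown below: the mean absolute difference between consecutive grayscale head crops rises and falls with jaw motion, and peaks above a threshold are counted as chews. The threshold and minimum peak spacing are illustrative assumptions, not the paper's parameters.

    # Minimal sketch: count chews from a sequence of head crops of one cow
    # using the frame-difference method.
    import cv2
    import numpy as np

    def count_chews(head_crops, diff_thresh=8.0, min_gap=3):
        """head_crops: list of equally sized BGR head images of one cow."""
        chews, last_peak = 0, -min_gap
        prev = cv2.cvtColor(head_crops[0], cv2.COLOR_BGR2GRAY)
        for i, crop in enumerate(head_crops[1:], start=1):
            gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
            motion = np.mean(cv2.absdiff(gray, prev))  # mean inter-frame difference
            if motion > diff_thresh and i - last_peak >= min_gap:
                chews += 1
                last_peak = i
            prev = gray
        return chews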

Estimating vegetation index for outdoor free-range pig production using YOLO

  • Sang-Hyon Oh;Hee-Mun Park;Jin-Hyun Park
    • Journal of Animal Science and Technology / v.65 no.3 / pp.638-651 / 2023
  • The objective of this study was to quantitatively estimate grazing-area damage in outdoor free-range pig production using an unmanned aerial vehicle (UAV) with an RGB image sensor. Ten corn-field images were captured by a UAV over approximately two weeks, during which gestating sows were allowed to graze freely on a 100 × 50 m corn field. The images were corrected to a bird's-eye view, divided into 32 segments, and fed sequentially into the YOLOv4 detector to detect the corn according to its condition. Forty-three raw training images, selected randomly out of the 320 segmented images, were flipped to create 86 images, and these were further augmented by rotating them in 5-degree increments, producing 6,192 images in total. Applying three random color transformations to each of these yielded a dataset of 24,768 images. The occupancy rate of corn in the field was then estimated efficiently using You Only Look Once (YOLO). Relative to the first day of observation (day 2), almost all of the corn had disappeared by day 9. When grazing 20 sows in a 50 × 100 m cornfield (250 m² per sow), the animals should therefore be rotated to other grazing areas after about five days to protect the cover crop. In agricultural technology, most machine- and deep-learning research concerns the detection of fruits and pests, and research in other application areas is needed. Applying deep learning also requires large-scale image data collected by field experts as training data; when such data are insufficient, extensive data augmentation is required. (A sketch of the augmentation scheme is shown below.)
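
A minimal sketch of the augmentation scheme described above (horizontal flip, rotations in 5-degree increments, and three additional random colour transforms per rotated image) is given below using Pillow; the jitter ranges are assumptions, not the paper's settings.

    # Minimal sketch: expand one segmented field image into many augmented variants.
    import random
    from PIL import Image, ImageEnhance, ImageOps

    def augment(img: Image.Image):
        out = []
        for v in (img, ImageOps.mirror(img)):            # original + horizontal flip
            for angle in range(0, 360, 5):               # 5-degree rotation increments
                rotated = v.rotate(angle)
                out.append(rotated)
                for _ in range(3):                       # three extra random colour transforms
                    jittered = ImageEnhance.Brightness(rotated).enhance(random.uniform(0.7, 1.3))
                    jittered = ImageEnhance.Color(jittered).enhance(random.uniform(0.7, 1.3))
                    out.append(jittered)
        return out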

Dr. Vegetable: an AI-based Mobile Application for Diagnosis of Plant Diseases and Insect Pests (농작물 병해충 진단을 위한 인공지능 앱, Dr. Vegetable)

  • Soohwan Kim;DaeKy Jeong;SeungJun Lee;SungYeob Jung;DongJae Yang;GeunyEong Jeong;Suk-Hyung Hwang;Sewoong Hwang
    • Proceedings of the Korean Society of Computer Information Conference / 2023.01a / pp.457-460 / 2023
  • This study proposes Dr. Vegetable, an AI service app that applies a deep-learning model to diagnose diseases and insect pests in protected (greenhouse) crops. In the field, an experienced farmer can judge crop diseases and pests at a glance, but an inexperienced farmer finds it very difficult to identify the type and the remedy even after noticing the damage, and even experienced farmers cannot easily detect diseases and pests early by visual inspection alone. For protected crops in particular, damage can cascade through the crop, so early detection and control are crucial. In other words, diagnosis based on a farmer's experience cannot guarantee accuracy and carries high risk in terms of cost and time. This paper proposes an AI service that uses YOLOv5 to diagnose diseases and pests of crops such as lettuce, pepper, and tomato. The deep-learning model was trained on open-field crop disease and pest diagnosis images provided by AI Hub, the integrated AI platform operated by the National Information Society Agency (NIA). Applying the diagnosis service at an actual greenhouse farm through the mobile application developed in this study yielded an accuracy of about 86%, an F1 score of 0.84, and an mAP of 0.98. If the deep-learning diagnosis model developed here is improved to operate robustly under various illumination conditions, we expect it to be widely used in agricultural practice. (A sketch of a minimal inference endpoint for such an app is shown below.)
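
As a rough illustration of how a mobile app could call such a model, the sketch below exposes a YOLOv5 model behind a small Flask endpoint; the weights file dr_vegetable.pt, the /diagnose route, and the response format are hypothetical and not taken from the paper.

    # Minimal sketch: serve a trained YOLOv5 model so a mobile app can upload a
    # crop photo and receive detected disease/pest classes with confidences.
    import io
    import torch
    from flask import Flask, request, jsonify
    from PIL import Image

    app = Flask(__name__)
    model = torch.hub.load("ultralytics/yolov5", "custom", path="dr_vegetable.pt")  # hypothetical weights

    @app.route("/diagnose", methods=["POST"])
    def diagnose():
        img = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
        results = model(img)
        dets = results.pandas().xyxy[0]  # DataFrame with columns incl. confidence, name
        return jsonify(dets[["name", "confidence"]].to_dict(orient="records"))

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)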


Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID (계층적 군집화 기반 Re-ID를 활용한 객체별 행동 및 표정 검출용 영상 분석 시스템)

  • Lee, Sang-Hyun;Yang, Seong-Hun;Oh, Seung-Jin;Kang, Jinbeom
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.89-106 / 2022
  • Recently, the amount of video data collected from smartphones, CCTVs, black boxes, and high-definition cameras has increased rapidly, and with it the requirements for analysis and utilization. Because many industries lack skilled manpower to analyze video, machine learning and artificial intelligence are actively used to assist human analysts, and demand for computer-vision technologies such as object detection and tracking, action detection, emotion detection, and re-identification (Re-ID) has grown rapidly as well. However, object detection and tracking suffer from conditions that degrade performance, such as an object re-appearing after leaving the recording location, and occlusion, so action and emotion detection models built on top of them also have difficulty extracting data for each object. In addition, deep-learning architectures composed of multiple models suffer from performance degradation due to bottlenecks and a lack of optimization. In this study, we propose a video analysis system consisting of a YOLOv5-based DeepSORT object tracking model, a SlowFast-based action recognition model, a Torchreid-based Re-ID model, and AWS Rekognition as an emotion recognition service. The proposed system uses single-linkage hierarchical-clustering-based Re-ID together with processing methods that maximize hardware throughput. It achieves higher accuracy than a re-identification model based on simple metrics, offers near-real-time processing, and prevents tracking failures caused by an object leaving and re-entering the scene or by occlusion. By continuously linking the action and facial-emotion detection results of each object to the same identity, videos can be analyzed efficiently. The re-identification model extracts a feature vector from the bounding box of each object detected by the tracking model in every frame and applies single-linkage hierarchical clustering against the feature vectors from past frames to identify objects whose tracks were lost; through this process, an object that re-appears after leaving the scene or being occluded can be re-associated with its previous track, so the action and facial-emotion results of the newly detected object can be linked to those of the object that appeared earlier. To improve processing performance, we introduce a per-object Bounding Box Queue and a Feature Queue that reduce RAM requirements while maximizing GPU throughput, and we introduce the IoF (Intersection over Face) algorithm, which links the facial emotions recognized by AWS Rekognition to the object tracking information. The academic significance of this study is that, with the proposed processing techniques, a two-stage re-identification model can run in real time even in the costly setting of simultaneous action and facial-emotion detection, without sacrificing accuracy by falling back on simple metrics. The practical implication is that industrial applications which require action and facial-emotion detection but struggle with tracking failures can analyze video effectively with the proposed system.
The proposed system, with its high re-association accuracy and processing performance, can be used in fields such as intelligent monitoring, observation services, and behavioral or psychological analysis services, where integrating tracking information with extracted metadata creates great industrial and business value. In future work, we plan to measure tracking performance more precisely on the MOT Challenge dataset, which is widely used in international benchmarks, to investigate the cases the IoF algorithm cannot handle and develop a complementary algorithm, and to apply the system to datasets from various fields related to intelligent video analysis. (A sketch of single-linkage Re-ID matching is shown below.)
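
A minimal sketch of the single-linkage matching idea behind the re-identification step is given below: a new detection's appearance embedding is assigned to the existing identity whose closest stored embedding is nearest, or to a new identity if no distance falls under a threshold. The Euclidean distance and the threshold value are illustrative assumptions, not the system's actual configuration.

    # Minimal sketch: single-linkage re-identification of a new detection
    # against a gallery of embeddings accumulated from past frames.
    import numpy as np

    def assign_identity(embedding, gallery, threshold=0.4):
        """gallery: dict id -> list of past embeddings (1-D arrays, L2-normalised)."""
        best_id, best_dist = None, float("inf")
        for identity, feats in gallery.items():
            # Single linkage: distance to the closest member of the cluster.
            d = min(np.linalg.norm(embedding - f) for f in feats)
            if d < best_dist:
                best_id, best_dist = identity, d
        if best_id is not None and best_dist < threshold:
            gallery[best_id].append(embedding)   # re-associated with an existing track
            return best_id
        new_id = max(gallery, default=-1) + 1    # otherwise start a new identity
        gallery[new_id] = [embedding]
        return new_id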

Developing an Occupants Count Methodology in Buildings Using Virtual Lines of Interest in a Multi-Camera Network (다중 카메라 네트워크 가상의 관심선(Line of Interest)을 활용한 건물 내 재실자 인원 계수 방법론 개발)

  • Chun, Hwikyung;Park, Chanhyuk;Chi, Seokho;Roh, Myungil;Susilawati, Connie
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.5 / pp.667-674 / 2023
  • In the event of a disaster within a building, the prompt and efficient evacuation and rescue of occupants is the foremost priority for minimizing casualties, and for such rescue operations it is essential to know how people are distributed within the building. In practice, however, responders rely mainly on accounts from people such as building owners or security staff, together with basic data such as floor dimensions and maximum capacity. Accurately determining the number of occupants therefore plays a key role in reducing uncertainty at the site and enabling effective rescue activities during the golden hour. This research introduces a methodology that uses computer-vision algorithms to count occupants at distinct locations within a building from images captured by multiple installed CCTV cameras. The counting methodology consists of three stages: (1) establishing virtual Lines of Interest (LOI) for each camera to construct a multi-camera network environment, (2) detecting and tracking people within the monitoring area using deep learning, and (3) aggregating counts across the multi-camera network. The methodology was validated through experiments in a five-story building, achieving an average accuracy of 89.9%, an average MAE of 0.178, and an RMSE of 0.339, and the advantages of using multiple cameras for occupant counting are explained. The results show the potential of the proposed methodology to support more effective and timely disaster management with common surveillance systems by providing prompt occupancy information. (A sketch of LOI crossing-based counting is shown below.)
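
A minimal sketch of the counting logic behind stage (1) is shown below: when a tracked person's reference point crosses the directed virtual line of interest (LOI) between consecutive frames, the occupant count for the zone behind the line is incremented or decremented. The direction convention and the choice of reference point are illustrative assumptions.

    # Minimal sketch: update an occupant count when a tracked point crosses a
    # directed line of interest between two consecutive frames.

    def side_of_line(p, a, b):
        """Signed side of point p relative to the directed line a -> b."""
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

    def update_count(count, prev_pos, curr_pos, loi_a, loi_b):
        s_prev = side_of_line(prev_pos, loi_a, loi_b)
        s_curr = side_of_line(curr_pos, loi_a, loi_b)
        if s_prev < 0 <= s_curr:      # crossed in the "entering" direction
            count += 1
        elif s_prev >= 0 > s_curr:    # crossed in the "exiting" direction
            count -= 1
        return count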