• Title/Summary/Keyword: instance segmentation

Search Result 73, Processing Time 0.028 seconds

A Study on Automatic Vehicle Extraction within Drone Image Bounding Box Using Unsupervised SVM Classification Technique (무감독 SVM 분류 기법을 통한 드론 영상 경계 박스 내 차량 자동 추출 연구)

  • Junho Yeom
    • Land and Housing Review
    • /
    • v.14 no.4
    • /
    • pp.95-102
    • /
    • 2023
  • Numerous investigations have explored the integration of machine leaning algorithms with high-resolution drone image for object detection in urban settings. However, a prevalent limitation in vehicle extraction studies involves the reliance on bounding boxes rather than instance segmentation. This limitation hinders the precise determination of vehicle direction and exact boundaries. Instance segmentation, while providing detailed object boundaries, necessitates labour intensive labelling for individual objects, prompting the need for research on automating unsupervised instance segmentation in vehicle extraction. In this study, a novel approach was proposed for vehicle extraction utilizing unsupervised SVM classification applied to vehicle bounding boxes in drone images. The method aims to address the challenges associated with bounding box-based approaches and provide a more accurate representation of vehicle boundaries. The study showed promising results, demonstrating an 89% accuracy in vehicle extraction. Notably, the proposed technique proved effective even when dealing with significant variations in spectral characteristics within the vehicles. This research contributes to advancing the field by offering a viable solution for automatic and unsupervised instance segmentation in the context of vehicle extraction from image.

Keypoint-based Deep Learning Approach for Building Footprint Extraction Using Aerial Images

  • Jeong, Doyoung;Kim, Yongil
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.1
    • /
    • pp.111-122
    • /
    • 2021
  • Building footprint extraction is an active topic in the domain of remote sensing, since buildings are a fundamental unit of urban areas. Deep convolutional neural networks successfully perform footprint extraction from optical satellite images. However, semantic segmentation produces coarse results in the output, such as blurred and rounded boundaries, which are caused by the use of convolutional layers with large receptive fields and pooling layers. The objective of this study is to generate visually enhanced building objects by directly extracting the vertices of individual buildings by combining instance segmentation and keypoint detection. The target keypoints in building extraction are defined as points of interest based on the local image gradient direction, that is, the vertices of a building polygon. The proposed framework follows a two-stage, top-down approach that is divided into object detection and keypoint estimation. Keypoints between instances are distinguished by merging the rough segmentation masks and the local features of regions of interest. A building polygon is created by grouping the predicted keypoints through a simple geometric method. Our model achieved an F1-score of 0.650 with an mIoU of 62.6 for building footprint extraction using the OpenCitesAI dataset. The results demonstrated that the proposed framework using keypoint estimation exhibited better segmentation performance when compared with Mask R-CNN in terms of both qualitative and quantitative results.

A Comparative Study on Performance of Deep Learning Models for Vision-based Concrete Crack Detection according to Model Types (영상기반 콘크리트 균열 탐지 딥러닝 모델의 유형별 성능 비교)

  • Kim, Byunghyun;Kim, Geonsoon;Jin, Soomin;Cho, Soojin
    • Journal of the Korean Society of Safety
    • /
    • v.34 no.6
    • /
    • pp.50-57
    • /
    • 2019
  • In this study, various types of deep learning models that have been proposed recently are classified according to data input / output types and analyzed to find the deep learning model suitable for constructing a crack detection model. First the deep learning models are classified into image classification model, object segmentation model, object detection model, and instance segmentation model. ResNet-101, DeepLab V2, Faster R-CNN, and Mask R-CNN were selected as representative deep learning model of each type. For the comparison, ResNet-101 was implemented for all the types of deep learning model as a backbone network which serves as a main feature extractor. The four types of deep learning models were trained with 500 crack images taken from real concrete structures and collected from the Internet. The four types of deep learning models showed high accuracy above 94% during the training. Comparative evaluation was conducted using 40 images taken from real concrete structures. The performance of each type of deep learning model was measured using precision and recall. In the experimental result, Mask R-CNN, an instance segmentation deep learning model showed the highest precision and recall on crack detection. Qualitative analysis also shows that Mask R-CNN could detect crack shapes most similarly to the real crack shapes.

Semiconductor Process Inspection Using Mask R-CNN (Mask R-CNN을 활용한 반도체 공정 검사)

  • Han, Jung Hee;Hong, Sung Soo
    • Journal of the Semiconductor & Display Technology
    • /
    • v.19 no.3
    • /
    • pp.12-18
    • /
    • 2020
  • In semiconductor manufacturing, defect detection is critical to maintain high yield. Currently, computer vision systems used in semiconductor photo lithography still have adopt to digital image processing algorithm, which often occur inspection faults due to sensitivity to external environment. Thus, we intend to handle this problem by means of using Mask R-CNN instead of digital image processing algorithm. Additionally, Mask R-CNN can be trained with image dataset pre-processed by means of the specific designed digital image filter to extract the enhanced feature map of Convolutional Neural Network (CNN). Our approach converged advantage of digital image processing and instance segmentation with deep learning yields more efficient semiconductor photo lithography inspection system than conventional system.

A Study on Automatic Classification of Characterized Ground Regions on Slopes by a Deep Learning based Image Segmentation (딥러닝 영상처리를 통한 비탈면의 지반 특성화 영역 자동 분류에 관한 연구)

  • Lee, Kyu Beom;Shin, Hyu-Soung;Kim, Seung Hyeon;Ha, Dae Mok;Choi, Isu
    • Tunnel and Underground Space
    • /
    • v.29 no.6
    • /
    • pp.508-522
    • /
    • 2019
  • Because of the slope failure, not only property damage but also human damage can occur, slope stability analysis should be conducted to predict and reinforce of the slope. This paper, defines the ground areas that can be characterized in terms of slope failure such as Rockmass jointset, Rockmass fault, Soil, Leakage water and Crush zone in sloped images. As a result, it was shown that the deep learning instance segmentation network can be used to recognize and automatically segment the precise shape of the ground region with different characteristics shown in the image. It showed the possibility of supporting the slope mapping work and automatically calculating the ground characteristics information of slopes necessary for decision making such as slope reinforcement.

A Suggestion for Worker Feature Extraction and Multiple-Object Tracking Method in Apartment Construction Sites (아파트 건설 현장 작업자 특징 추출 및 다중 객체 추적 방법 제안)

  • Kang, Kyung-Su;Cho, Young-Woon;Ryu, Han-Guk
    • Proceedings of the Korean Institute of Building Construction Conference
    • /
    • 2021.05a
    • /
    • pp.40-41
    • /
    • 2021
  • The construction industry has the highest occupational accidents/injuries among all industries. Korean government installed surveillance camera systems at construction sites to reduce occupational accident rates. Construction safety managers are monitoring potential hazards at the sites through surveillance system; however, the human capability of monitoring surveillance system with their own eyes has critical issues. Therefore, this study proposed to build a deep learning-based safety monitoring system that can obtain information on the recognition, location, identification of workers and heavy equipment in the construction sites by applying multiple-object tracking with instance segmentation. To evaluate the system's performance, we utilized the MS COCO and MOT challenge metrics. These results present that it is optimal for efficiently automating monitoring surveillance system task at construction sites.

  • PDF

Improvement of Mask-RCNN Performance Using Deep-Learning-Based Arbitrary-Scale Super-Resolution Module (딥러닝 기반 임의적 스케일 초해상도 모듈을 이용한 Mask-RCNN 성능 향상)

  • Ahn, Young-Pill;Park, Hyun-Jun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.3
    • /
    • pp.381-388
    • /
    • 2022
  • In instance segmentation, Mask-RCNN is mostly used as a base model. Increasing the performance of Mask-RCNN is meaningful because it affects the performance of the derived model. Mask-RCNN has a transform module for unifying size of input images. In this paper, to improve the Mask-RCNN, we apply deep-learning-based ASSR to the resizing part in the transform module and inject calculated scale information into the model using IM(Integration Module). The proposed IM improves instance segmentation performance by 2.5 AP higher than Mask-RCNN in the COCO dataset, and in the periment for optimizing the IM location, the best performance was shown when it was located in the 'Top' before FPN and backbone were combined. Therefore, the proposed method can improve the performance of models using Mask-RCNN as a base model.

A Dataset of Ground Vehicle Targets from Satellite SAR Images and Its Application to Detection and Instance Segmentation (위성 SAR 영상의 지상차량 표적 데이터 셋 및 탐지와 객체분할로의 적용)

  • Park, Ji-Hoon;Choi, Yeo-Reum;Chae, Dae-Young;Lim, Ho;Yoo, Ji Hee
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.25 no.1
    • /
    • pp.30-44
    • /
    • 2022
  • The advent of deep learning-based algorithms has facilitated researches on target detection from synthetic aperture radar(SAR) imagery. While most of them concentrate on detection tasks for ships with open SAR ship datasets and for aircraft from SAR scenes of airports, there is relatively scarce researches on the detection of SAR ground vehicle targets where several adverse factors such as high false alarm rates, low signal-to-clutter ratios, and multiple targets in close proximity are predicted to degrade the performances. In this paper, a dataset of ground vehicle targets acquired from TerraSAR-X(TSX) satellite SAR images is presented. Then, both detection and instance segmentation are simultaneously carried out on this dataset based on the deep learning-based Mask R-CNN. Finally, this paper shows the future research directions to further improve the performances of detecting the SAR ground vehicle targets.

Data Augmentation for Tomato Detection and Pose Estimation (토마토 위치 및 자세 추정을 위한 데이터 증대기법)

  • Jang, Minho;Hwang, Youngbae
    • Journal of Broadcast Engineering
    • /
    • v.27 no.1
    • /
    • pp.44-55
    • /
    • 2022
  • In order to automatically provide information on fruits in agricultural related broadcasting contents, instance image segmentation of target fruits is required. In addition, the information on the 3D pose of the corresponding fruit may be meaningfully used. This paper represents research that provides information about tomatoes in video content. A large amount of data is required to learn the instance segmentation, but it is difficult to obtain sufficient training data. Therefore, the training data is generated through a data augmentation technique based on a small amount of real images. Compared to the result using only the real images, it is shown that the detection performance is improved as a result of learning through the synthesized image created by separating the foreground and background. As a result of learning augmented images using images created using conventional image pre-processing techniques, it was shown that higher performance was obtained than synthetic images in which foreground and background were separated. To estimate the pose from the result of object detection, a point cloud was obtained using an RGB-D camera. Then, cylinder fitting based on least square minimization is performed, and the tomato pose is estimated through the axial direction of the cylinder. We show that the results of detection, instance image segmentation, and cylinder fitting of a target object effectively through various experiments.

Class-Agnostic 3D Mask Proposal and 2D-3D Visual Feature Ensemble for Efficient Open-Vocabulary 3D Instance Segmentation (효율적인 개방형 어휘 3차원 개체 분할을 위한 클래스-독립적인 3차원 마스크 제안과 2차원-3차원 시각적 특징 앙상블)

  • Sungho Song;Kyungmin Park;Incheol Kim
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.7
    • /
    • pp.335-347
    • /
    • 2024
  • Open-vocabulary 3D point cloud instance segmentation (OV-3DIS) is a challenging visual task to segment a 3D scene point cloud into object instances of both base and novel classes. In this paper, we propose a novel model Open3DME for OV-3DIS to address important design issues and overcome limitations of the existing approaches. First, in order to improve the quality of class-agnostic 3D masks, our model makes use of T3DIS, an advanced Transformer-based 3D point cloud instance segmentation model, as mask proposal module. Second, in order to obtain semantically text-aligned visual features of each point cloud segment, our model extracts both 2D and 3D features from the point cloud and the corresponding multi-view RGB images by using pretrained CLIP and OpenSeg encoders respectively. Last, to effectively make use of both 2D and 3D visual features of each point cloud segment during label assignment, our model adopts a unique feature ensemble method. To validate our model, we conducted both quantitative and qualitative experiments on ScanNet-V2 benchmark dataset, demonstrating significant performance gains.