• Title/Summary/Keyword: object-based attention

Integration of Multi-scale CAM and Attention for Weakly Supervised Defects Localization on Surface Defective Apple

  • Nguyen Bui Ngoc Han;Ju Hwan Lee;Jin Young Kim
    • Smart Media Journal / v.12 no.9 / pp.45-59 / 2023
  • Weakly supervised object localization (WSOL) is the task of localizing an object in an image using only image-level labels. Previous studies have followed the conventional class activation mapping (CAM) pipeline. However, we show that the current CAM approach suffers from problems that prevent the original CAM from capturing the complete defect features. This work uses a convolutional neural network (CNN) pretrained on image-level labels to generate class activation maps at multiple scales and highlight discriminative regions. Additionally, a pretrained vision transformer (ViT) is used to produce multi-head attention maps as an auxiliary detector. By integrating the CNN-based CAMs and the attention maps, our approach localizes defective regions without requiring bounding-box or pixel-level supervision during training. We evaluate our approach on a dataset of apple images with only image-level labels of defect categories. Experiments demonstrate that the proposed method matches the performance of several object detection models and holds promise for improving localization.
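
A minimal sketch of the CAM-and-attention fusion idea summarized above, assuming a CNN that exposes CAMs and a ViT that exposes head-averaged attention maps; the helper interface (model_cam_fn), the scale set, and the blending/threshold values are illustrative placeholders, not the authors' code.

```python
import numpy as np
import cv2  # assumed available for resizing


def multiscale_cam(model_cam_fn, image, class_id, scales=(0.75, 1.0, 1.25)):
    """Average class activation maps computed at several input scales.
    model_cam_fn(image, class_id) -> 2D CAM is an assumed interface."""
    h, w = image.shape[:2]
    acc = np.zeros((h, w), dtype=np.float32)
    for s in scales:
        resized = cv2.resize(image, (int(w * s), int(h * s)))
        cam = model_cam_fn(resized, class_id)            # 2D activation map
        cam = cv2.resize(cam.astype(np.float32), (w, h)) # back to original size
        acc += (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return acc / len(scales)


def fuse_cam_and_attention(cam, vit_attn, alpha=0.5, thresh=0.4):
    """Blend a normalized multi-scale CAM with a ViT attention map and
    threshold the result into a rough defect mask."""
    attn = (vit_attn - vit_attn.min()) / (vit_attn.max() - vit_attn.min() + 1e-8)
    fused = alpha * cam + (1.0 - alpha) * attn
    return fused, (fused >= thresh).astype(np.uint8)
```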

Attention based Feature-Fusion Network for 3D Object Detection (3차원 객체 탐지를 위한 어텐션 기반 특징 융합 네트워크)

  • Sang-Hyun Ryoo;Dae-Yeol Kang;Seung-Jun Hwang;Sung-Jun Park;Joong-Hwan Baek
    • Journal of Advanced Navigation Technology / v.27 no.2 / pp.190-196 / 2023
  • Recently, with the development of LiDAR technology, which can measure the distance to an object, interest in LiDAR-based 3D object detection networks has been growing. Previous networks produce inaccurate localization results because of the spatial information lost during voxelization and downsampling. In this study, we propose an attention-based fusion method and a camera-LiDAR fusion system to obtain high-level features and high positional accuracy. First, by introducing the attention method into Voxel-RCNN, a grid-based 3D object detection network, the multi-scale sparse 3D convolution features are fused effectively to improve 3D detection performance. Additionally, we propose a late-fusion mechanism that combines the outputs of the 3D and 2D object detection networks to remove false positives. Comparative experiments with existing algorithms are performed on the KITTI dataset, which is widely used in the field of autonomous driving. The proposed method improved performance in both 2D object detection on the BEV and 3D object detection. In particular, precision improved by about 0.54% for the car moderate class compared with Voxel-RCNN.
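
The late-fusion step mentioned above can be pictured roughly as follows: project each 3D detection into the image plane and keep it only if it overlaps some 2D detection. This is an assumed simplification rather than the paper's implementation; the IoU threshold is arbitrary.

```python
import numpy as np


def iou_2d(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-8)


def late_fusion(boxes_3d_projected, boxes_2d, iou_thresh=0.5):
    """Keep a 3D detection only if its image-plane projection overlaps
    some 2D detection; unmatched 3D boxes are treated as false positives."""
    kept = []
    for i, proj in enumerate(boxes_3d_projected):
        if any(iou_2d(proj, b2d) >= iou_thresh for b2d in boxes_2d):
            kept.append(i)
    return kept
```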

Object of Interest Extraction Using Gabor Filters (가버 필터에 기반한 관심 객체 검출)

  • Kim, Sung-Young
    • Journal of the Korea Society of Computer and Information / v.13 no.2 / pp.87-94 / 2008
  • In this paper, a method for extracting objects of interest from color images is proposed. With the proposed method, objects of interest can be extracted from a complex background without any prior knowledge. For object extraction, Gabor images that contain information about object locations are created using a Gabor filter. Based on these images, the initial locations of attention windows are determined, from which image features are selected to extract objects. For the extraction step, I partially modify a previous method and apply the modified version. To evaluate the performance of the proposed method, precision, recall, and F-measure are calculated between the extraction results of the proposed method and manually extracted results, and I verify the performance of the proposed method based on these measures. Also, through comparison with the existing method, I verify the superiority of the proposed method over the existing one.
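
A short sketch of the Gabor stage, assuming OpenCV; the filter-bank parameters and the attention-window threshold are illustrative guesses rather than values from the paper.

```python
import numpy as np
import cv2


def gabor_response_map(gray, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4),
                       ksize=31, sigma=4.0, lambd=10.0, gamma=0.5):
    """Sum the magnitudes of Gabor filter responses over several
    orientations; strong responses hint at textured object regions."""
    acc = np.zeros_like(gray, dtype=np.float32)
    for theta in thetas:
        kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta,
                                    lambd, gamma, psi=0)
        resp = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel)
        acc += np.abs(resp)
    return acc / (acc.max() + 1e-8)


# Usage sketch: threshold the response map to place initial attention windows.
# gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
# attention_mask = gabor_response_map(gray) > 0.6
```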

Effects of Sensory Integration Therapy and Home-Based Sensory Integration on Visual Attention in Children with Down Syndrome (감각통합치료와 가정프로그램 중재병행이 다운증후군 아동의 시각적 주의력에 미치는 효과: 단일사례연구)

  • Son, Ji-Won;Lee, Hye-Rim
    • The Journal of Korean Academy of Sensory Integration / v.21 no.2 / pp.12-23 / 2023
  • Objective: The purpose of this study was to investigate the effect of sensory integration therapy combined with a home program intervention on the visual attention of a child with Down syndrome. Methods: This study used a single-subject design with one child with Down syndrome. Sensory integration therapy was provided once a week for 16 weeks, and the home program was conducted four times a week for 16 weeks. Changes in the child's visual attention were measured after the intervention. Results: After the intervention, the average durations of object gaze, horizontal object pursuit, and vertical object pursuit increased compared with the baseline period. For object gaze, horizontal object pursuit, and vertical object pursuit, sections exceeding the ±2 standard deviation band of the baseline period were observed during the intervention period. Conclusion: This study confirmed that combining sensory integration therapy with a home program intervention improved visual attention and visual perception in a child with Down syndrome, and it is meaningful in that it presents an effective intervention method.

Small Marker Detection with Attention Model in Robotic Applications (로봇시스템에서 작은 마커 인식을 하기 위한 사물 감지 어텐션 모델)

  • Kim, Minjae;Moon, Hyungpil
    • The Journal of Korea Robotics Society / v.17 no.4 / pp.425-430 / 2022
  • As robots become one of the mainstays of digital transformation, machine vision for robots has become a major area of study, giving robots the ability to examine what they see and make decisions based on it. However, finding a small object in an image is difficult, mainly because most visual recognition networks are convolutional neural networks, which chiefly consider local features. We therefore build a model that considers global features as well as local features. In this paper, we propose a method for detecting a small marker on an object using deep learning, with an algorithm that captures global features by combining the Transformer's self-attention technique with a convolutional neural network. We present a self-attention model with a new definition of Query, Key, and Value so that the model can learn global features, together with a simplified formulation that removes the position vector and the classification token, which make the model heavy and slow. Finally, we show that our model achieves higher mAP than the state-of-the-art model YOLOR.
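
A hedged sketch of the kind of simplified self-attention the abstract describes: attention over CNN feature-map positions with no positional embedding and no classification token. The layer sizes, projection dimension, and residual wiring are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class SimpleGlobalAttention(nn.Module):
    """Self-attention over CNN feature-map positions, with no positional
    embedding and no classification token (a simplification in the spirit
    of the abstract; the layer sizes here are illustrative)."""
    def __init__(self, channels, dim=64):
        super().__init__()
        self.q = nn.Conv2d(channels, dim, kernel_size=1)
        self.k = nn.Conv2d(channels, dim, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)
        self.scale = dim ** -0.5

    def forward(self, x):                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)      # (B, HW, dim)
        k = self.k(x).flatten(2)                      # (B, dim, HW)
        v = self.v(x).flatten(2).transpose(1, 2)      # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)   # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out                                # residual connection
```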

A Survey on Vision Transformers for Object Detection Task (객체 탐지 과업에서의 트랜스포머 기반 모델의 특장점 분석 연구)

  • Jungmin, Ha;Hyunjong, Lee;Jungmin, Eom;Jaekoo, Lee
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.6 / pp.319-327 / 2022
  • Transformers are among the best-known deep learning models; they have achieved great success in natural language processing and have also shown good performance in computer vision. In this survey, we categorize transformer-based models for computer vision, particularly for object detection tasks, and perform comprehensive comparative experiments to understand the characteristics of each model. We then evaluate the models, subdivided into the standard transformer, transformers with key-point attention, and transformers that add attention with coordinates, by comparing their object detection accuracy and real-time performance. For the performance comparison, we use two metrics: frames per second (FPS) and mean average precision (mAP). Finally, through various experiments, we identify the trends and relationships between detection accuracy and real-time performance across several transformer models.
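
For context, the FPS metric used in the survey can be measured roughly as follows (the input size, warm-up count, and iteration count here are arbitrary choices, and model stands for any detector under test):

```python
import time
import torch


def measure_fps(model, input_shape=(1, 3, 640, 640), iters=100, warmup=10):
    """Rough FPS measurement for a detection model on random inputs."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)
    with torch.no_grad():
        for _ in range(warmup):          # warm-up passes are not timed
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```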

An Implementation of Noise-Tolerant Context-free Attention Operator and its Application to Efficient Multi-Object Detection (잡음에 강건한 주목 연산자의 구현과 효과적인 다중 물체 검출)

  • Park, Chang-Jun;Jo, Sang-Hyeon;Choe, Heung-Mun
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.1 / pp.89-96 / 2001
  • In this paper, a noise-tolerant generalized symmetry transform (NTGST) is proposed and implemented as a context-free attention operator for efficient multi-object detection. In contrast to the conventional context-free attention operator based on the GST, in which only the magnitude and the symmetry of pixel pairs are taken into account, the proposed NTGST additionally considers the convergence and divergence of the radial orientations of the intensity gradients of each pixel pair. Thus, the proposed attention operator can easily detect multiple objects in noisy images with complex backgrounds. Experiments on various synthetic and real images show that the proposed NTGST is effective for multi-object detection against noisy and complex backgrounds.
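
A loose illustration of the symmetry-operator idea, restricted to horizontal pixel pairs for brevity; the weighting terms follow the standard GST form and the convergence test is a simplified reading of the noise-tolerant extension, not the paper's exact formulation.

```python
import numpy as np


def symmetry_attention_map(gray, radii=(3, 5, 7)):
    """GST-style symmetry map over horizontal pairs only: pixel pairs whose
    gradients are strong, mirror-symmetric, and converging toward their
    midpoint vote for that midpoint. gray is a 2D grayscale array."""
    gy, gx = np.gradient(gray.astype(np.float32))
    mag = np.log1p(np.hypot(gx, gy))                 # log magnitude weight
    theta = np.arctan2(gy, gx)                       # gradient orientation
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.float32)
    for d in radii:
        tl, tr = theta[:, :-2 * d], theta[:, 2 * d:]           # left/right of midpoint
        ml, mr = mag[:, :-2 * d], mag[:, 2 * d:]
        phase = (1 - np.cos(tl + tr)) * (1 - np.cos(tl - tr))  # mirror symmetry term
        converging = (np.cos(tl) > 0) & (np.cos(tr) < 0)       # gradients point inward
        out[:, d:-d] += ml * mr * phase * converging
    return out / (out.max() + 1e-8)
```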

A Dual-Structured Self-Attention for improving the Performance of Vision Transformers (비전 트랜스포머 성능향상을 위한 이중 구조 셀프 어텐션)

  • Kwang-Yeob Lee;Hwang-Hee Moon;Tae-Ryong Park
    • Journal of IKEEE / v.27 no.3 / pp.251-257 / 2023
  • In this paper, we propose a dual-structured self-attention method that compensates for the weak local features of the vision transformer's self-attention. Vision transformers, which are more computationally efficient than convolutional neural networks in object classification, object segmentation, and video recognition, are comparatively weak at extracting local features. To solve this problem, many studies build on windows or shifted windows, but these methods weaken the advantages of self-attention-based transformers by increasing computational complexity through multiple levels of encoders. This paper proposes a dual-structured self-attention that combines self-attention with a neighborhood network to improve the locality inductive bias over existing methods. The neighborhood network, which extracts local context information, has much lower computational complexity than a window structure. CIFAR-10 and CIFAR-100 were used to compare the proposed dual-structured self-attention transformer with the existing transformer, and the experiments showed improvements of 0.63% and 1.57% in Top-1 accuracy, respectively.
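
A hedged sketch of a dual-branch block in the spirit of the abstract: a global self-attention branch plus a lightweight local "neighborhood" branch. The depthwise-convolution neighborhood branch and the token-grid assumption are stand-ins, since the paper's exact network is not reproduced here.

```python
import torch
import torch.nn as nn


class DualBranchAttention(nn.Module):
    """Global multi-head self-attention over patch tokens plus a local
    branch (a depthwise 3x3 convolution over the token grid) for context;
    an assumed stand-in for the paper's neighborhood network."""
    def __init__(self, dim, heads=4, grid=8):
        super().__init__()
        self.grid = grid                       # tokens form a grid x grid map
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):                 # tokens: (B, N, dim), N = grid*grid
        b, n, d = tokens.shape
        x = self.norm(tokens)
        global_out, _ = self.attn(x, x, x)     # global branch
        grid_map = x.transpose(1, 2).reshape(b, d, self.grid, self.grid)
        local_out = self.local(grid_map).flatten(2).transpose(1, 2)
        return tokens + global_out + local_out # fuse branches with a residual
```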

Traffic Sign Area Detection System Based on Color Processing Mechanism of Human (인간의 색상처리방식에 기반한 교통 표지판 영역 추출 시스템)

  • Cheoi, Kyung-Joo;Park, Min-Chul
    • The Journal of the Korea Contents Association / v.7 no.2 / pp.63-72 / 2007
  • A traffic sign on the road should be easy to distinguish even from far away and should be recognized in a short time. Because a traffic sign is a very important object that provides information to drivers and enhances safety, it has to attract the driver's attention more than any other object on the road. This paper proposes a new method for detecting the area of a traffic sign that uses an attention module, on the assumption that when we drive a car we direct our gaze to the traffic sign before other objects. We analyze previous psychophysical and physiological studies to identify which features are used in human object recognition, especially in color processing, and with these results we detect the area of the traffic sign. Various kinds of traffic sign images were tested, and the results showed good quality (average 97.8% success).
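
As a toy illustration of a color-driven attention stage, the sketch below thresholds typical sign colors in HSV and returns candidate regions; the color ranges and minimum area are assumptions, not values from the paper.

```python
import numpy as np
import cv2

# Rough HSV ranges for the red and blue commonly used on traffic signs;
# these bounds are illustrative guesses.
RED_LO1, RED_HI1 = (0, 90, 60), (10, 255, 255)
RED_LO2, RED_HI2 = (170, 90, 60), (180, 255, 255)
BLUE_LO, BLUE_HI = (100, 120, 60), (130, 255, 255)


def sign_color_candidates(bgr, min_area=200):
    """Return bounding boxes of regions whose color matches typical
    traffic-sign colors, as a crude color-based attention stage."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = (cv2.inRange(hsv, np.array(RED_LO1), np.array(RED_HI1))
            | cv2.inRange(hsv, np.array(RED_LO2), np.array(RED_HI2))
            | cv2.inRange(hsv, np.array(BLUE_LO), np.array(BLUE_HI)))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]
```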

Real Time Hornet Classification System Based on Deep Learning (딥러닝을 이용한 실시간 말벌 분류 시스템)

  • Jeong, Yunju;Lee, Yeung-Hak;Ansari, Israfil;Lee, Cheol-Hee
    • Journal of IKEEE / v.24 no.4 / pp.1141-1147 / 2020
  • Hornet species are so similar in shape that they are difficult for non-experts to classify, and because the insects are small and move fast, detecting and classifying the species in real time is even more difficult. In this paper, we developed a system that classifies hornet species in real time with a deep learning algorithm using bounding boxes. To minimize the background area included in the bounding box when labeling the training images, we propose a method of annotating only the head and body of the hornet. We also experimentally compare existing bounding-box-based object recognition algorithms to find the one best suited to detecting hornets in real time and classifying their species. In the experiments, when the Mish function was applied as the activation function of the convolution layers and the hornet images were tested with a YOLOv4 model that places a Spatial Attention Module (SAM) before the object detection block, the average precision was 97.89% and the average recall was 98.69%.
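
For reference, a spatial attention module of the kind mentioned above can be sketched as follows; this follows the original CBAM formulation (YOLOv4 uses a modified SAM), so treat it as illustrative only.

```python
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: pool the feature map across channels,
    derive a per-pixel weight with a convolution and a sigmoid, and rescale
    the input feature map."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                          # x: (B, C, H, W)
        avg_pool = x.mean(dim=1, keepdim=True)     # (B, 1, H, W)
        max_pool, _ = x.max(dim=1, keepdim=True)   # (B, 1, H, W)
        weights = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return x * weights                         # reweight spatial positions
```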