• Title/Summary/Keyword: Spatial attention module

Design and Implementation of HRNet Model Combined with Spatial Information Attention Module of Polarized Self-attention (편광 셀프어텐션의 공간정보 강조 모듈을 결합한 HRNet 모델 설계 및 구현)

  • Jin-Seong Kim;Jun Park;Se-Hoon Jung;Chun-Bo Sim
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.485-487 / 2023
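  • Semantic segmentation, a subtask of computer vision, is being studied in various fields such as autonomous driving and ship detection at sea. Existing semantic segmentation models based on FCN (Fully Convolutional Networks) lose spatial information during downsampling, which lowers their accuracy. To mitigate this loss, this paper adds the spatial information attention module of PSA (Polarized Self-attention) between the convolution blocks of HRNet (High-resolution Networks). In the experiments, the parameter count increased by 3.1M and GFLOPs by 3.2G, but mIoU increased by 0.26%. This confirms that spatial information affects semantic segmentation accuracy.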
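The mIoU figure reported in this abstract is the usual segmentation metric: per-class intersection over union, averaged over classes. A minimal sketch of that computation follows, assuming flattened integer label maps; the function name `mean_iou` and the toy data are hypothetical and not taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union, averaged over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both label maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label maps with 2 classes (hypothetical data).
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
print(mean_iou(pred, target, num_classes=2))  # 0.5
```

Each class here has 2 pixels of overlap against a union of 4, so both per-class IoUs are 0.5.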

Real Scene Text Image Super-Resolution Based on Multi-Scale and Attention Fusion

  • Xinhua Lu;Haihai Wei;Li Ma;Qingji Xue;Yonghui Fu
    • Journal of Information Processing Systems / v.19 no.4 / pp.427-438 / 2023
  • Many studies have shown that single image super-resolution (SISR) models relying on synthetic datasets are difficult to apply to real scene text image super-resolution (STISR) because of its more complex degradation. The most recent dataset for realistic STISR is TextZoom, but current methods trained on it have not considered the effect of multi-scale features of text images. This paper proposes a multi-scale and attention fusion model for realistic STISR. A multi-scale learning mechanism is introduced to acquire sophisticated feature representations of text images, and spatial and channel attention are introduced to capture the local information and inter-channel interaction information of text images. Finally, the paper designs a multi-scale residual attention module that fuses multi-scale learning and attention mechanisms. Experiments on TextZoom demonstrate that the proposed model increases the average recognition accuracy of the scene text recognizer ASTER by 1.2% compared to the text super-resolution network.

A Proposal of Shuffle Graph Convolutional Network for Skeleton-based Action Recognition

  • Jang, Sungjun;Bae, Han Byeol;Lee, HeanSung;Lee, Sangyoun
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.4 / pp.314-322 / 2021
  • Skeleton-based action recognition has attracted considerable attention in human action recognition. Recent methods for skeleton-based action recognition employ spatiotemporal graph convolutional networks (GCNs) and achieve remarkable performance. However, most of them incur heavy computational complexity to achieve robust action recognition. To solve this problem, we propose a shuffle graph convolutional network (SGCN), a lightweight graph convolutional network that uses pointwise group convolution rather than pointwise convolution to reduce computational cost. Our SGCN is composed of a spatial and a temporal GCN. The spatial shuffle GCN contains pointwise group convolution and a part shuffle module, which enhances local and global information between correlated joints. In addition, the temporal shuffle GCN contains depthwise convolution to maintain a large receptive field. Our model achieves comparable performance at the lowest computational cost and exceeds the baseline by 0.3% and 1.2% on the NTU RGB+D and NTU RGB+D 120 datasets, respectively.
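The saving from replacing pointwise convolution with pointwise group convolution can be illustrated by counting weights. This is a minimal sketch with hypothetical channel counts. The `channel_shuffle` helper shows the generic ShuffleNet-style channel interleaving that is commonly paired with group convolution; it is not the paper's part shuffle module, whose details the abstract does not give.

```python
def pointwise_conv_params(c_in, c_out, groups=1):
    """Weight count of a 1x1 convolution (bias omitted).

    With groups > 1, each group maps c_in/groups inputs to c_out/groups outputs.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return groups * (c_in // groups) * (c_out // groups)


def channel_shuffle(channels, groups):
    """Interleave channels across groups so the next group conv mixes them."""
    n = len(channels) // groups  # channels per group
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


# Hypothetical layer sizes, chosen only for illustration.
dense = pointwise_conv_params(64, 128)               # 8192 weights
grouped = pointwise_conv_params(64, 128, groups=4)   # 2048 weights
print(dense, grouped, dense // grouped)              # 8192 2048 4
print(channel_shuffle(list(range(8)), groups=2))     # [0, 4, 1, 5, 2, 6, 3, 7]
```

With `g` groups the weight count drops by a factor of `g`, but channels in different groups no longer interact, which is the usual motivation for adding a shuffle step.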

Attention Gated FC-DenseNet for Extracting Crop Cultivation Area by Multispectral Satellite Imagery (다중분광밴드 위성영상의 작물재배지역 추출을 위한 Attention Gated FC-DenseNet)

  • Seong, Seon-kyeong;Mo, Jun-sang;Na, Sang-il;Choi, Jae-wan
    • Korean Journal of Remote Sensing / v.37 no.5_1 / pp.1061-1070 / 2021
  • In this manuscript, we tried to improve the performance of FC-DenseNet by applying an attention gate for the classification of cropping areas. The attention gate module can facilitate the learning of a deep learning model and improve its performance by injecting spatial/spectral weights into each feature map. Crop classification was performed in onion and garlic regions using the proposed deep learning model, in which an attention gate was added to the skip connection part of FC-DenseNet. Training data was produced using various PlanetScope satellite imagery, and preprocessing was applied to minimize the problem of an imbalanced training dataset. The crop classification results verified that the proposed deep learning model classifies the onion and garlic regions more effectively than the existing FC-DenseNet algorithm.

A Tuberculosis Detection Method Using Attention and Sparse R-CNN

  • Xu, Xuebin;Zhang, Jiada;Cheng, Xiaorui;Lu, Longbin;Zhao, Yuqing;Xu, Zongyu;Gu, Zhuangzhuang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.7 / pp.2131-2153 / 2022
  • To achieve accurate detection of tuberculosis (TB) areas in chest radiographs, we design a chest X-ray TB area detection algorithm. The algorithm consists of two stages: the chest X-ray TB classification network (CXTCNet) and the chest X-ray TB area detection network (CXTDNet). CXTCNet judges the presence or absence of TB areas in chest X-ray images, thereby excluding the influence of other lung diseases on the detection of TB areas. This reduces false positives in the detection network and improves the accuracy of the detection results. In CXTCNet, we propose a channel attention mechanism (CAM) module and combine it with DenseNet. This module enables the network to learn more spatial and channel feature information from chest X-ray images, thereby improving network performance. CXTDNet is based on a sparse object detection algorithm (Sparse R-CNN). A fixed set of learnable proposal boxes and learnable proposal features is used for classification and localization. The predictions of the algorithm are output directly, without non-maximum suppression post-processing. Furthermore, we use CLAHE in data preprocessing to reduce image noise and improve image quality. Experiments on the TBX11K dataset show that the accuracy of the proposed CXTCNet is up to 99.10%, which is better than most current TB classification algorithms. Finally, our proposed chest X-ray TB detection algorithm achieves an AP of 45.35% and an AP50 of 74.20%. We also establish a chest X-ray TB dataset with 304 images, and experiments on this dataset show that the diagnostic accuracy is comparable to that of radiologists. We hope that our proposed algorithm and established dataset will advance the field of TB detection.
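AP50 conventionally denotes average precision where a predicted box counts as a true positive if its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows only that matching criterion, assuming axis-aligned boxes given as `(x1, y1, x2, y2)`. The function name and box coordinates are hypothetical, and the paper's exact evaluation protocol is not stated in the abstract.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if boxes are disjoint
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)   # hypothetical annotated TB area
loose_pred = (5, 0, 15, 10)     # overlaps half of the ground truth
tight_pred = (2, 0, 12, 10)     # overlaps most of the ground truth

for name, box in [("loose", loose_pred), ("tight", tight_pred)]:
    iou = box_iou(ground_truth, box)
    print(name, round(iou, 3), "match" if iou >= 0.5 else "no match")
```

The loose prediction has an IoU of 1/3 and would not count at the 0.5 threshold, while the tight one has an IoU of 2/3 and would.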

A Study on Mobile SFA System Prototyping Using P2P LBS Service (P2P LBS를 활용한 모바일 영업자동화(SFA) 시스템에 관한 연구)

  • 박기호;정재곤;황명화
    • Spatial Information Research / v.11 no.1 / pp.61-72 / 2003
  • LBS has attracted considerable attention with the spread of high-performance mobile devices and the expansion of mobile business. Our study starts from the recognition of the problems associated with current mobile Sales Force Automation (SFA), which is one of the application domains of LBS: such systems lack capabilities such as efficient sharing of information. This paper presents a technical framework in which location information on the move and the mobile P2P service are utilized to realize truly mobile SFA platforms. Major contributions of our study include a feasible prototype of gCRM middleware, through which location-based services on the move are enabled, and an agent module involving the P2P service for mobile clients.

Attention based Feature-Fusion Network for 3D Object Detection (3차원 객체 탐지를 위한 어텐션 기반 특징 융합 네트워크)

  • Sang-Hyun Ryoo;Dae-Yeol Kang;Seung-Jun Hwang;Sung-Jun Park;Joong-Hwan Baek
    • Journal of Advanced Navigation Technology / v.27 no.2 / pp.190-196 / 2023
  • Recently, with the development of LIDAR technology, which can measure the distance to an object, interest in LIDAR-based 3D object detection networks has grown. Previous networks generate inaccurate localization results due to spatial information loss during voxelization and downsampling. In this study, we propose an attention-based fusion method and a camera-LIDAR fusion system to acquire high-level features and high positional accuracy. First, by introducing the attention method into the Voxel-RCNN structure, which is a grid-based 3D object detection network, the multi-scale sparse 3D convolution features are effectively fused to improve the performance of 3D object detection. Additionally, we propose a late-fusion mechanism that fuses the outputs of the 3D object detection network and a 2D object detection network to remove false positives. Comparative experiments with existing algorithms are performed using the KITTI dataset, which is widely used in the field of autonomous driving. The proposed method showed performance improvement in both 2D object detection on BEV and 3D object detection. In particular, the precision improved by about 0.54% for the car moderate class compared to Voxel-RCNN.

Detection of Plastic Greenhouses by Using Deep Learning Model for Aerial Orthoimages (딥러닝 모델을 이용한 항공정사영상의 비닐하우스 탐지)

  • Byunghyun Yoon;Seonkyeong Seong;Jaewan Choi
    • Korean Journal of Remote Sensing / v.39 no.2 / pp.183-192 / 2023
  • Remotely sensed data, such as satellite imagery and aerial photos, can be used to extract and detect objects in an image through image interpretation and processing techniques. In particular, the potential for digital map updating and land monitoring through automatic object detection has increased as the spatial resolution of remotely sensed data has improved and deep learning technologies have developed. In this paper, we tried to extract plastic greenhouses from aerial orthophotos by using the fully convolutional densely connected convolutional network (FC-DenseNet), one of the representative deep learning models for semantic segmentation. A quantitative analysis of the extraction results was then performed. Using the farm map of the Ministry of Agriculture, Food and Rural Affairs in Korea, training data was generated by labeling plastic greenhouses in the Damyang and Miryang areas, and FC-DenseNet was trained on this dataset. To apply the deep learning model to remotely sensed imagery, instance norm, which can maintain the spectral characteristics of bands, was used as normalization. In addition, optimal weights for each band were determined by adding attention modules to the deep learning model. The experiments showed that a deep learning model can extract plastic greenhouses. These results can be applied to digital map updating of Farm-map and landcover maps.
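Instance normalization, as generally defined, normalizes each band (channel) of each individual image by that band's own mean and variance rather than by batch statistics. The sketch below illustrates this on flattened toy bands. The function name, the band values, and the omission of learnable scale and shift parameters are assumptions for illustration, not details from the paper.

```python
import math


def instance_norm(bands, eps=1e-5):
    """Normalize each band of a single image by its own mean and variance."""
    out = []
    for band in bands:
        n = len(band)
        mean = sum(band) / n
        var = sum((v - mean) ** 2 for v in band) / n  # biased variance
        std = math.sqrt(var + eps)
        out.append([(v - mean) / std for v in band])
    return out


# Two hypothetical spectral bands with very different value ranges.
image = [
    [10.0, 20.0, 30.0],        # e.g. a visible band
    [1000.0, 2000.0, 3000.0],  # e.g. a near-infrared band
]
for band in instance_norm(image):
    print([round(v, 3) for v in band])
```

Both bands end up on the same scale while keeping their internal relative structure, since each is normalized independently of the other band and of other images.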

A Study of Textured Image Segmentation using Phase Information (페이즈 정보를 이용한 텍스처 영상 분할 연구)

  • Oh, Suk
    • Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.249-256 / 2011
  • Finding a new set of features representing textured images is one of the most important topics in textured image analysis. Because it is impossible to construct a perfect set of features representing every textured image, one must choose relevant features that suit the image processing task at hand. This paper seeks relevant features that are effective for textured image segmentation. In this regard, it presents a different method for the segmentation of textured images based on the Gabor filter. The Gabor filter is known to be a very efficient and effective tool for texture analysis that models the human visual system. Filtering a real-valued input image with the Gabor filter results in complex-valued output data defined in the spatial frequency domain. This complex value, as usual, gives the modulus and the phase. This paper focuses on the phase information rather than the modulus information. The modulus information has been considered very useful for region analysis in texture, while the phase information has been considered of almost no use. This paper shows, however, that the phase information can also be useful and effective for region analysis in texture once a good method is introduced. We propose the "phase derivated method", an efficient and effective way to compute useful phase information directly from the filtered value. This new method effectively reduces the computing burden and widens the range of applicable textured images.
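The way a Gabor filter turns a real-valued input into a complex response with a modulus and a phase can be sketched in one dimension. This minimal example assumes a complex Gabor kernel of the form exp(-x²/2σ²)·exp(i2πfx) and a synthetic cosine signal. All names and parameter values are hypothetical, and the sketch does not implement the paper's "phase derivated method", which the abstract does not specify.

```python
import cmath
import math


def gabor_kernel(sigma, freq, half_width):
    """1D complex Gabor kernel: Gaussian envelope times a complex sinusoid."""
    return [
        math.exp(-(x * x) / (2 * sigma * sigma)) * cmath.exp(2j * math.pi * freq * x)
        for x in range(-half_width, half_width + 1)
    ]


def filter_signal(signal, kernel):
    """'Valid' sliding correlation of a real signal with a complex kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Synthetic real-valued "texture" row oscillating at 0.1 cycles per sample.
signal = [math.cos(2 * math.pi * 0.1 * n) for n in range(64)]
response = filter_signal(signal, gabor_kernel(sigma=4.0, freq=0.1, half_width=8))

modulus = [abs(z) for z in response]        # local energy at the tuned frequency
phase = [cmath.phase(z) for z in response]  # local phase in (-pi, pi]
print(len(response), round(modulus[0], 3), round(phase[0], 3))
```

The modulus stays nearly constant across a uniform texture, while the phase varies from sample to sample, which is why the two quantities carry different information about regions.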