• Title/Summary/Keyword: Feature pyramid learning

Search Result 24, Processing Time 0.025 seconds

Pyramid Feature Compression with Inter-Level Feature Restoration-Prediction Network (계층 간 특징 복원-예측 네트워크를 통한 피라미드 특징 압축)

  • Kim, Minsub;Sim, Donggyu
    • Journal of Broadcast Engineering
    • /
    • v.27 no.3
    • /
    • pp.283-294
    • /
    • 2022
  • The feature map used in the network for deep learning generally has larger data than the image and a higher compression rate than the image compression rate is required to transmit the feature map. This paper proposes a method for transmitting a pyramid feature map with high compression rate, which is used in a network with an FPN structure that has robustness to object size in deep learning-based image processing. In order to efficiently compress the pyramid feature map, this paper proposes a structure that predicts a pyramid feature map of a level that is not transmitted with pyramid feature map of some levels that transmitted through the proposed prediction network to efficiently compress the pyramid feature map and restores compression damage through the proposed reconstruction network. Suggested mAP, the performance of object detection for the COCO data set 2017 Train images of the proposed method, showed a performance improvement of 31.25% in BD-rate compared to the result of compressing the feature map through VTM12.0 in the rate-precision graph, and compared to the method of performing compression through PCA and DeepCABAC, the BD-rate improved by 57.79%.

Detection of PCB Components Using Deep Neural Nets (심층신경망을 이용한 PCB 부품의 검지 및 인식)

  • Cho, Tai-Hoon
    • Journal of the Semiconductor & Display Technology
    • /
    • v.19 no.2
    • /
    • pp.11-15
    • /
    • 2020
  • In a typical initial setup of a PCB component inspection system, operators should manually input various information such as category, position, and inspection area for each component to be inspected, thus causing much inconvenience and longer setup time. Although there are many deep learning based object detectors, RetinaNet is regarded as one of best object detectors currently available. In this paper, a method using an extended RetinaNet is proposed that automatically detects its component category and position for each component mounted on PCBs from a high-resolution color input image. We extended the basic RetinaNet feature pyramid network by adding a feature pyramid layer having higher spatial resolution to the basic feature pyramid. It was demonstrated by experiments that the extended RetinaNet can detect successfully very small components that could be missed by the basic RetinaNet. Using the proposed method could enable automatic generation of inspection areas, thus considerably reducing the setup time of PCB component inspection systems.

Skin Lesion Segmentation with Codec Structure Based Upper and Lower Layer Feature Fusion Mechanism

  • Yang, Cheng;Lu, GuanMing
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.1
    • /
    • pp.60-79
    • /
    • 2022
  • The U-Net architecture-based segmentation models attained remarkable performance in numerous medical image segmentation missions like skin lesion segmentation. Nevertheless, the resolution gradually decreases and the loss of spatial information increases with deeper network. The fusion of adjacent layers is not enough to make up for the lost spatial information, thus resulting in errors of segmentation boundary so as to decline the accuracy of segmentation. To tackle the issue, we propose a new deep learning-based segmentation model. In the decoding stage, the feature channels of each decoding unit are concatenated with all the feature channels of the upper coding unit. Which is done in order to ensure the segmentation effect by integrating spatial and semantic information, and promotes the robustness and generalization of our model by combining the atrous spatial pyramid pooling (ASPP) module and channel attention module (CAM). Extensive experiments on ISIC2016 and ISIC2017 common datasets proved that our model implements well and outperforms compared segmentation models for skin lesion segmentation.

Recognition of Bill Form using Feature Pyramid Network (FPN(Feature Pyramid Network)을 이용한 고지서 양식 인식)

  • Kim, Dae-Jin;Hwang, Chi-Gon;Yoon, Chang-Pyo
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.4
    • /
    • pp.523-529
    • /
    • 2021
  • In the era of the Fourth Industrial Revolution, technological changes are being applied in various fields. Automation digitization and data management are also in the field of bills. There are more than tens of thousands of forms of bills circulating in society and bill recognition is essential for automation, digitization and data management. Currently in order to manage various bills, OCR technology is used for character recognition. In this time, we can increase the accuracy, when firstly recognize the form of the bill and secondly recognize bills. In this paper, a logo that can be used as an index to classify the form of the bill was recognized as an object. At this time, since the size of the logo is smaller than that of the entire bill, FPN was used for Small Object Detection among deep learning technologies. As a result, it was possible to reduce resource waste and increase the accuracy of OCR recognition through the proposed algorithm.

Depth Map Estimation Model Using 3D Feature Volume (3차원 특징볼륨을 이용한 깊이영상 생성 모델)

  • Shin, Soo-Yeon;Kim, Dong-Myung;Suh, Jae-Won
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.11
    • /
    • pp.447-454
    • /
    • 2018
  • This paper proposes a depth image generation algorithm of stereo images using a deep learning model composed of a CNN (convolutional neural network). The proposed algorithm consists of a feature extraction unit which extracts the main features of each parallax image and a depth learning unit which learns the parallax information using extracted features. First, the feature extraction unit extracts a feature map for each parallax image through the Xception module and the ASPP(Atrous spatial pyramid pooling) module, which are composed of 2D CNN layers. Then, the feature map for each parallax is accumulated in 3D form according to the time difference and the depth image is estimated after passing through the depth learning unit for learning the depth estimation weight through 3D CNN. The proposed algorithm estimates the depth of object region more accurately than other algorithms.

ASPPMVSNet: A high-receptive-field multiview stereo network for dense three-dimensional reconstruction

  • Saleh Saeed;Sungjun Lee;Yongju Cho;Unsang Park
    • ETRI Journal
    • /
    • v.44 no.6
    • /
    • pp.1034-1046
    • /
    • 2022
  • The learning-based multiview stereo (MVS) methods for three-dimensional (3D) reconstruction generally use 3D volumes for depth inference. The quality of the reconstructed depth maps and the corresponding point clouds is directly influenced by the spatial resolution of the 3D volume. Consequently, these methods produce point clouds with sparse local regions because of the lack of the memory required to encode a high volume of information. Here, we apply the atrous spatial pyramid pooling (ASPP) module in MVS methods to obtain dense feature maps with multiscale, long-range, contextual information using high receptive fields. For a given 3D volume with the same spatial resolution as that in the MVS methods, the dense feature maps from the ASPP module encoded with superior information can produce dense point clouds without a high memory footprint. Furthermore, we propose a 3D loss for training the MVS networks, which improves the predicted depth values by 24.44%. The ASPP module provides state-of-the-art qualitative results by constructing relatively dense point clouds, which improves the DTU MVS dataset benchmarks by 2.25% compared with those achieved in the previous MVS methods.

One-step deep learning-based method for pixel-level detection of fine cracks in steel girder images

  • Li, Zhihang;Huang, Mengqi;Ji, Pengxuan;Zhu, Huamei;Zhang, Qianbing
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.153-166
    • /
    • 2022
  • Identifying fine cracks in steel bridge facilities is a challenging task of structural health monitoring (SHM). This study proposed an end-to-end crack image segmentation framework based on a one-step Convolutional Neural Network (CNN) for pixel-level object recognition with high accuracy. To particularly address the challenges arising from small object detection in complex background, efforts were made in loss function selection aiming at sample imbalance and module modification in order to improve the generalization ability on complicated images. Specifically, loss functions were compared among alternatives including the Binary Cross Entropy (BCE), Focal, Tversky and Dice loss, with the last three specialized for biased sample distribution. Structural modifications with dilated convolution, Spatial Pyramid Pooling (SPP) and Feature Pyramid Network (FPN) were also performed to form a new backbone termed CrackDet. Models of various loss functions and feature extraction modules were trained on crack images and tested on full-scale images collected on steel box girders. The CNN model incorporated the classic U-Net as its backbone, and Dice loss as its loss function achieved the highest mean Intersection-over-Union (mIoU) of 0.7571 on full-scale pictures. In contrast, the best performance on cropped crack images was achieved by integrating CrackDet with Dice loss at a mIoU of 0.7670.

Modified Pyramid Scene Parsing Network with Deep Learning based Multi Scale Attention (딥러닝 기반의 Multi Scale Attention을 적용한 개선된 Pyramid Scene Parsing Network)

  • Kim, Jun-Hyeok;Lee, Sang-Hun;Han, Hyun-Ho
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.11
    • /
    • pp.45-51
    • /
    • 2021
  • With the development of deep learning, semantic segmentation methods are being studied in various fields. There is a problem that segmenation accuracy drops in fields that require accuracy such as medical image analysis. In this paper, we improved PSPNet, which is a deep learning based segmentation method to minimized the loss of features during semantic segmentation. Conventional deep learning based segmentation methods result in lower resolution and loss of object features during feature extraction and compression. Due to these losses, the edge and the internal information of the object are lost, and there is a problem that the accuracy at the time of object segmentation is lowered. To solve these problems, we improved PSPNet, which is a semantic segmentation model. The multi-scale attention proposed to the conventional PSPNet was added to prevent feature loss of objects. The feature purification process was performed by applying the attention method to the conventional PPM module. By suppressing unnecessary feature information, eadg and texture information was improved. The proposed method trained on the Cityscapes dataset and use the segmentation index MIoU for quantitative evaluation. As a result of the experiment, the segmentation accuracy was improved by about 1.5% compared to the conventional PSPNet.

A Study on Lightweight Model with Attention Process for Efficient Object Detection (효율적인 객체 검출을 위해 Attention Process를 적용한 경량화 모델에 대한 연구)

  • Park, Chan-Soo;Lee, Sang-Hun;Han, Hyun-Ho
    • Journal of Digital Convergence
    • /
    • v.19 no.5
    • /
    • pp.307-313
    • /
    • 2021
  • In this paper, a lightweight network with fewer parameters compared to the existing object detection method is proposed. In the case of the currently used detection model, the network complexity has been greatly increased to improve accuracy. Therefore, the proposed network uses EfficientNet as a feature extraction network, and the subsequent layers are formed in a pyramid structure to utilize low-level detailed features and high-level semantic features. An attention process was applied between pyramid structures to suppress unnecessary noise for prediction. All computational processes of the network are replaced by depth-wise and point-wise convolutions to minimize the amount of computation. The proposed network was trained and evaluated using the PASCAL VOC dataset. The features fused through the experiment showed robust properties for various objects through a refinement process. Compared with the CNN-based detection model, detection accuracy is improved with a small amount of computation. It is considered necessary to adjust the anchor ratio according to the size of the object as a future study.

Spatial-Temporal Scale-Invariant Human Action Recognition using Motion Gradient Histogram (모션 그래디언트 히스토그램 기반의 시공간 크기 변화에 강인한 동작 인식)

  • Kim, Kwang-Soo;Kim, Tae-Hyoung;Kwak, Soo-Yeong;Byun, Hye-Ran
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.12
    • /
    • pp.1075-1082
    • /
    • 2007
  • In this paper, we propose the method of multiple human action recognition on video clip. For being invariant to the change of speed or size of actions, Spatial-Temporal Pyramid method is applied. Proposed method can minimize the complexity of the procedures owing to select Motion Gradient Histogram (MGH) based on statistical approach for action representation feature. For multiple action detection, Motion Energy Image (MEI) of binary frame difference accumulations is adapted and then we detect each action of which area is represented by MGH. The action MGH should be compared with pre-learning MGH having pyramid method. As a result, recognition can be done by the analyze between action MGH and pre-learning MGH. Ten video clips are used for evaluating the proposed method. We have various experiments such as mono action, multiple action, speed and site scale-changes, comparison with previous method. As a result, we can see that proposed method is simple and efficient to recognize multiple human action with stale variations.