• Title/Summary/Keyword: Feature pyramid network(FPN)


Pyramid Feature Compression with Inter-Level Feature Restoration-Prediction Network (계층 간 특징 복원-예측 네트워크를 통한 피라미드 특징 압축)

  • Kim, Minsub;Sim, Donggyu
    • Journal of Broadcast Engineering / v.27 no.3 / pp.283-294 / 2022
  • Feature maps used in deep learning networks are generally larger than the input image, so transmitting them requires a higher compression rate than image compression does. This paper proposes a method for transmitting, at a high compression rate, the pyramid feature maps used in networks with an FPN structure, which is robust to object size in deep learning-based image processing. To compress the pyramid feature maps efficiently, only some levels are transmitted: a proposed prediction network predicts the feature maps of the untransmitted levels from the transmitted ones, and a proposed reconstruction network repairs the compression damage. Measured by mAP, the object detection performance, on COCO 2017 Train images, the proposed method improved BD-rate by 31.25% on the rate-precision graph compared with compressing the feature maps with VTM12.0, and by 57.79% compared with compression through PCA and DeepCABAC.

Recognition of Bill Form using Feature Pyramid Network (FPN(Feature Pyramid Network)을 이용한 고지서 양식 인식)

  • Kim, Dae-Jin;Hwang, Chi-Gon;Yoon, Chang-Pyo
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.4 / pp.523-529 / 2021
  • In the era of the Fourth Industrial Revolution, technological change is reaching many fields, and automation, digitization, and data management are being applied to bills as well. Tens of thousands of bill forms circulate in society, so bill recognition is essential for automation, digitization, and data management. OCR technology is currently used for character recognition to manage these bills. Accuracy can be increased by first recognizing the form of the bill and then recognizing the bill itself. In this paper, a logo that can serve as an index for classifying the bill form is recognized as an object. Because the logo is small relative to the entire bill, FPN was used as the deep learning technique for small object detection. As a result, the proposed algorithm reduced resource waste and increased the accuracy of OCR recognition.

Performance Evaluation of FPN-Attention Layered Model for Improving Visual Explainability of Object Recognition (객체 인식 설명성 향상을 위한 FPN-Attention Layered 모델의 성능 평가)

  • Youn, Seok Jun;Cho, Nam Ik
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.1311-1314 / 2022
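  • Visual explainability is required to classify objects well in DNN-based object recognition. Visual explainability was proposed to interpret the basis of a prediction by expressing the prediction for an object class as a pixel-wise attribution. Backbone structures based on pyramidal features, designed to provide scale-invariant features, are widely used in object detection and classification, but applying such a feature pyramid to a trainable attention mechanism increases computational and memory complexity. This paper proposes an FPN-Attention Layered Network to improve the object recognition performance and explainability of a standard FPN, and evaluates its characteristics experimentally. We quantitatively evaluated how much the explainability-improving scheme affects object recognition, compared with using the conventional FPN alone. The experiments confirmed that the proposed model improves visual explainability that accounts for multi-level features at a low computing overhead and, as a result, improves object recognition performance.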

SEL-RefineMask: A Seal Segmentation and Recognition Neural Network with SEL-FPN

  • Dun, Ze-dong;Chen, Jian-yu;Qu, Mei-xia;Jiang, Bin
    • Journal of Information Processing Systems / v.18 no.3 / pp.411-427 / 2022
  • Extracting historical and cultural information from seals in ancient books is of great significance. However, ancient Chinese seal samples are scarce and carving methods are diverse, so traditional greyscale-based digital image processing methods have difficulty achieving good segmentation and recognition performance. Some deep learning algorithms have recently been proposed to address this problem, but current neural networks are difficult to train owing to the lack of datasets. To solve these problems, we propose SEL-RefineMask, which combines a selector of feature pyramid network (SEL-FPN) with RefineMask to segment and recognize seals. The SEL-FPN is designed to intelligently select a specific layer representing a particular scale in the FPN, which reduces the number of anchor frames. We ran experiments with several instance segmentation networks as baselines; the top-1 baseline segmentation result of 64.93% is 5.73% higher than that of humans. The top-1 result of the SEL-RefineMask network reached 67.96%, surpassing the baseline results. After segmentation, a vision transformer was used to recognize the segmentation output, and the accuracy reached 91%. Furthermore, a dataset of seals in ancient Chinese books (SACB) for segmentation and a small seal font (SSF) dataset for recognition were established and are publicly available on the website.

Compression of Multiscale Features of FPN for VCM (VCM 을 위한 FPN 다중 스케일 특징 압축)

  • Kim, Dong-Ha;Yoon, Yong-Uk;Lee, Jooyoung;Jeong, Se-Yoon;Kim, Jae-Gon;Jeong, Dae-Gwon
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.143-145 / 2022
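  • MPEG-VCM (Video Coding for Machine) standardization is proceeding in two tracks: Track 1, which compresses features of the input video, and Track 2, which compresses the input video directly. This paper proposes an MSFC (Multi-Scale Feature Compression) structure, corresponding to VCM Track 1, that uses VVC to compress the multi-scale feature maps extracted from the Detectron2 FPN (Feature Pyramid Network). Starting from the existing structure, which combines the multi-scale features before encoding and decoding, it presents an improved MSFC that compresses the feature map at reduced resolution. The proposed method shows better BPP-mAP performance than the VCM Track 2 image anchor, with a BD-rate improvement of up to -84.98%.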

A Target Detection Algorithm based on Single Shot Detector (Single Shot Detector 기반 타깃 검출 알고리즘)

  • Feng, Yuanlin;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.358-361 / 2021
  • To improve the accuracy of small target detection, this paper proposes an improved single shot detector (SSD) target detection and recognition method based on CSPDarknet53, which introduces a lightweight ECA attention mechanism and a Feature Pyramid Network (FPN). First, the original SSD backbone network is replaced with CSPDarknet53 to enhance the learning ability of the network. Then, a lightweight ECA attention mechanism is added to the basic convolution block to optimize the network. Finally, FPN is used to gradually fuse the multi-scale feature maps used for detection in the SSD, from the deep to the shallow layers of the network, to improve localization and classification accuracy. Experiments show that the proposed algorithm achieves better detection accuracy, especially for small targets.
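
The deep-to-shallow fusion described above can be illustrated with a toy sketch. This is not the paper's implementation: it assumes single-channel maps stored as lists of rows, each level at half the resolution of the previous one, and it omits the 1x1 lateral and 3x3 smoothing convolutions that a full FPN applies. The function names `upsample2x`, `add_maps`, and `top_down_fuse` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def add_maps(a, b):
    """Element-wise sum of two maps with identical shape."""
    assert len(a) == len(b) and len(a[0]) == len(b[0])
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(levels):
    """Fuse maps from the deepest (smallest) to the shallowest (largest).

    `levels` is ordered shallow -> deep. The result keeps the same order.
    """
    fused = [None] * len(levels)
    fused[-1] = levels[-1]  # the deepest level is passed through unchanged
    for i in range(len(levels) - 2, -1, -1):
        fused[i] = add_maps(levels[i], upsample2x(fused[i + 1]))
    return fused


# Toy pyramid with resolutions 4x4, 2x2, and 1x1 (values are illustrative).
c3 = [[1] * 4 for _ in range(4)]
c4 = [[10] * 2 for _ in range(2)]
c5 = [[100]]
p3, p4, p5 = top_down_fuse([c3, c4, c5])
```

In this toy pyramid every shallow cell ends up carrying the sum of its own value and the values of the deeper cells that cover it, which is how deep semantic information reaches the high-resolution maps used for small targets.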

Change Attention based Dense Siamese Network for Remote Sensing Change Detection (원격 탐사 변화 탐지를 위한 변화 주목 기반의 덴스 샴 네트워크)

  • Hwang, Gisu;Lee, Woo-Ju;Oh, Seoung-Jun
    • Journal of Broadcast Engineering / v.26 no.1 / pp.14-25 / 2021
  • Change detection, which finds changes in remote sensing images of the same location captured at different times, is very important because it is used in various applications. However, registration errors, building displacement errors, and shadow errors cause false positives. To solve these problems, we propose a novel deep convolutional network called CADNet (Change Attention Dense Siamese Network). CADNet uses FPN (Feature Pyramid Network) to detect multi-scale changes, applies a Change Attention Module that attends to the changes, and uses DenseNet as a feature extractor so that feature maps containing both low-level and high-level features are available for change detection. In terms of precision, recall, and F1, CADNet achieves 98.44%, 98.47%, and 98.46% on the WHU dataset and 90.72%, 91.89%, and 91.30% on the LEVIR-CD dataset. These results show that CADNet can offer better performance than the traditional change detection methods it was compared with.
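
As a reading aid, F1 is the harmonic mean of precision and recall, so the reported triples can be sanity-checked. The sketch below uses the standard definition; the helper name `f1_score` is an assumption and is not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall in percent, as reported in the abstract.
whu_f1 = f1_score(98.44, 98.47)
levir_f1 = f1_score(90.72, 91.89)
print(round(whu_f1, 2), round(levir_f1, 2))
```

Both values agree with the reported F1 scores to within the rounding of the published precision and recall.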

One-step deep learning-based method for pixel-level detection of fine cracks in steel girder images

  • Li, Zhihang;Huang, Mengqi;Ji, Pengxuan;Zhu, Huamei;Zhang, Qianbing
    • Smart Structures and Systems / v.29 no.1 / pp.153-166 / 2022
  • Identifying fine cracks in steel bridge facilities is a challenging task in structural health monitoring (SHM). This study proposes an end-to-end crack image segmentation framework based on a one-step Convolutional Neural Network (CNN) for pixel-level object recognition with high accuracy. To address the challenges of small object detection against complex backgrounds, the work focuses on selecting a loss function that handles sample imbalance and on modifying modules to improve generalization to complicated images. Specifically, the Binary Cross Entropy (BCE), Focal, Tversky, and Dice losses were compared, the last three being specialized for biased sample distributions. Structural modifications with dilated convolution, Spatial Pyramid Pooling (SPP), and Feature Pyramid Network (FPN) were also made to form a new backbone termed CrackDet. Models with various loss functions and feature extraction modules were trained on crack images and tested on full-scale images collected from steel box girders. The CNN model with the classic U-Net as its backbone and Dice loss as its loss function achieved the highest mean Intersection-over-Union (mIoU) on full-scale pictures, 0.7571. In contrast, the best performance on cropped crack images, a mIoU of 0.7670, was achieved by combining CrackDet with Dice loss.
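
The losses and metric named above can be sketched in their common textbook forms. This is a minimal illustration on flattened per-pixel values, not the paper's code: the smoothing term `eps`, the Tversky weights `alpha` and `beta`, and all function names are assumptions. mIoU averages IoU values, and the abstract does not state the exact averaging convention.

```python
def dice_loss(pred, target, eps=1e-7):
    """1 - Dice coefficient for soft predictions in [0, 1] and binary targets."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1 - (2 * inter + eps) / (sum(pred) + sum(target) + eps)


def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Tversky loss: alpha weights false positives, beta false negatives."""
    tp = sum(p * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    return 1 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def iou(pred_mask, target_mask):
    """Intersection-over-Union of two binary masks (flat lists of 0/1)."""
    inter = sum(1 for p, t in zip(pred_mask, target_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, target_mask) if p or t)
    return inter / union if union else 1.0


# Four pixels; the first two are crack pixels (values are illustrative).
pred = [0.9, 0.8, 0.1, 0.0]
target = [1, 1, 0, 0]
d = dice_loss(pred, target)
t = tversky_loss(pred, target)           # alpha = beta = 0.5 matches Dice
crack_iou = iou([1, 1, 1, 0], target)    # one false-positive pixel
```

With `alpha = beta = 0.5` the Tversky loss reduces to the Dice loss up to the smoothing term; raising `beta` penalizes missed crack pixels more heavily, which is one reason such losses are used when cracks occupy few pixels.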

Enhancement of MSFC-Based Multi-Scale Features Compression Network with Bottom-UP MSFF in VCM (VCM 의 바텀-업 MSFF 를 이용한 MSFC 기반 멀티-스케일 특징 압축 네트워크 개선)

  • Dong-Ha Kim;Gyu-Woong Han;Jun-Seok Cha;Jae-Gon Kim
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.11a / pp.116-118 / 2022
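  • MPEG-VCM (Video Coding for Machine) standardization is proceeding in two tracks: Track 1, which compresses features of the input image/video, and Track 2, which compresses the input image/video directly. This paper presents an improvement to the MSFC-based compression model that efficiently compresses the multi-scale features extracted from the FPN (Feature Pyramid Network) of Detectron2, which is used as the vision task network in Track 1. Whereas the existing compression model reduces the resolution and compresses a single-scale feature map, the proposed model has a bottom-up MSFF that builds the single-scale feature map by merging low-resolution feature maps into high-resolution feature maps in a bottom-up structure. The proposed method improves BD-rate by 1 to 2.7% in BPP-mAP performance over the existing model and achieves a BD-rate improvement of up to -85.94% relative to the VCM image anchor.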

Improvement of Mask-RCNN Performance Using Deep-Learning-Based Arbitrary-Scale Super-Resolution Module (딥러닝 기반 임의적 스케일 초해상도 모듈을 이용한 Mask-RCNN 성능 향상)

  • Ahn, Young-Pill;Park, Hyun-Jun
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.3 / pp.381-388 / 2022
  • In instance segmentation, Mask-RCNN is the most commonly used base model. Improving Mask-RCNN is meaningful because it affects the performance of the models derived from it. Mask-RCNN has a transform module that unifies the size of input images. In this paper, to improve Mask-RCNN, we apply deep-learning-based arbitrary-scale super-resolution (ASSR) to the resizing step of the transform module and inject the calculated scale information into the model using an IM (Integration Module). The proposed IM improves instance segmentation performance by 2.5 AP over Mask-RCNN on the COCO dataset, and in the experiment on optimizing the IM location, the best performance was obtained when it was placed at the 'Top', before the FPN and backbone are combined. Therefore, the proposed method can improve the performance of models that use Mask-RCNN as a base model.
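
To see why the resizing step calls for an arbitrary-scale method, it helps to compute the scale factor a typical transform module applies. The sketch below assumes the resize rule and the default sizes (800 and 1333) used by torchvision's Mask R-CNN transform; the abstract does not state which implementation or sizes the paper uses, and the function name `resize_scale` is hypothetical.

```python
def resize_scale(height, width, min_size=800, max_size=1333):
    """Scale factor that brings the short side to min_size without letting
    the long side exceed max_size (assumed torchvision-style rule)."""
    short_side = min(height, width)
    long_side = max(height, width)
    return min(min_size / short_side, max_size / long_side)


# A 480x640 image is limited by its short side; a 500x2000 one by its long side.
scale_a = resize_scale(480, 640)
scale_b = resize_scale(500, 2000)
print(scale_a, scale_b)
```

The resulting factors are generally non-integer and differ from image to image, which is the situation an arbitrary-scale super-resolution module is meant to handle and the kind of scale information an integration module could pass on to the model.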