Improvement of Mask-RCNN Performance Using Deep-Learning-Based Arbitrary-Scale Super-Resolution Module

Ahn, Young-Pill;Park, Hyun-Jun;

doi:10.6109/jkiice.2022.26.3.381

Journal of the Korea Institute of Information and Communication Engineering (한국정보통신학회논문지)

Volume 26, Issue 3 / Pages 381-388 / 2022 / 2234-4772 (pISSN) / 2288-4165 (eISSN)

The Korea Institute of Information and Communication Engineering (한국정보통신학회)

Ahn, Young-Pill (Department of Computer Information Engineering, Cheongju University) ;
Park, Hyun-Jun (Division of Software Convergence, Cheongju University)

안영필 ;
박현준

Received : 2022.01.17
Accepted : 2022.02.27
Published : 2022.03.31

https://doi.org/10.6109/jkiice.2022.26.3.381

Abstract

In instance segmentation, Mask-RCNN is commonly used as a base model, so improving its performance is meaningful because the gain carries over to the models derived from it. Mask-RCNN has a transform module that unifies the size of input images. In this paper, to improve Mask-RCNN, we apply deep-learning-based ASSR (Arbitrary-Scale Super-Resolution) to the resizing part of the transform module and inject the calculated scale information into the model using an IM (Integration Module). On the COCO dataset, the proposed method achieves instance segmentation performance 2.5 AP higher than Mask-RCNN. In the experiment for optimizing the IM location, the best performance was obtained when the IM was placed at the 'Top' position, before the FPN (Feature Pyramid Network) and the backbone are combined. Therefore, the proposed method can improve the performance of models that use Mask-RCNN as a base model.

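Mask-RCNN is frequently used as a base model in instance segmentation. Raising the performance of Mask-RCNN is meaningful because it affects the models derived from it. Mask-RCNN contains a transform module that unifies the sizes of input images to the batch size. In this paper, to improve the performance of Mask-RCNN, we apply deep-learning-based ASSR (Arbitrary-Scale Super-Resolution) to the resizing part of the transform module and inject the scale information into the model using an IM (Integration Module). When the proposed method was applied to the COCO dataset, instance segmentation performance was 2.5 AP higher than that of Mask-RCNN. In the experiment for optimizing the position of the proposed IM, the best performance was obtained when it was placed at the 'Top' position, before the FPN (Feature Pyramid Network) and the backbone are combined. Therefore, the proposed method can improve the performance of models that use Mask-RCNN as a base model.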
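The abstract mentions "calculated scale information" produced by the resizing step of the transform module. The sketch below illustrates one common way such a scale can be computed and why it is generally a non-integer (arbitrary) value. It is not the authors' implementation: the function names `resize_scale` and `resized_shape`, the defaults `min_size=800` and `max_size=1333`, and the rounding of the output shape are all assumptions chosen for illustration, and the internals of the IM are not described in the abstract, so they are not sketched here.

```python
def resize_scale(height, width, min_size=800, max_size=1333):
    """Scale factor that brings the shorter side to min_size,
    capped so the longer side does not exceed max_size.
    (min_size and max_size are assumed defaults, not values from the paper.)"""
    short_side = min(height, width)
    long_side = max(height, width)
    return min(min_size / short_side, max_size / long_side)


def resized_shape(height, width, min_size=800, max_size=1333):
    """Return (new_height, new_width, scale) for one image.
    Rounding to the nearest pixel is an assumption of this sketch."""
    scale = resize_scale(height, width, min_size, max_size)
    return round(height * scale), round(width * scale), scale


if __name__ == "__main__":
    # A 480x640 image is upscaled by a non-integer factor (800 / 480),
    # which is the kind of scale an arbitrary-scale SR model must handle.
    print(resized_shape(480, 640))
    # A wide 500x1500 image is limited by max_size and is downscaled instead.
    print(resized_shape(500, 1500))
```

Under these assumptions, each image in a batch gets its own scale, which is the quantity that could be passed both to an arbitrary-scale resizer and to a module that conditions the network on scale.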

Acknowledgement

This research was supported by the Cheongju University Research Scholarship Grants in 2021.
