http://dx.doi.org/10.6109/jkiice.2022.26.3.381

Improvement of Mask-RCNN Performance Using Deep-Learning-Based Arbitrary-Scale Super-Resolution Module  

Ahn, Young-Pill (Department of Computer Information Engineering, Cheongju University)
Park, Hyun-Jun (Division of Software Convergence, Cheongju University)
Abstract
In instance segmentation, Mask-RCNN is widely used as a base model, so improving it is meaningful because the gain carries over to the models derived from it. Mask-RCNN has a transform module that unifies the size of input images. In this paper, to improve Mask-RCNN, we apply deep-learning-based Arbitrary-Scale Super-Resolution (ASSR) to the resizing step of the transform module and inject the calculated scale information into the model through an Integration Module (IM). The proposed IM improves instance segmentation performance by 2.5 AP over Mask-RCNN on the COCO dataset. In the experiment to optimize the IM location, the best performance was obtained when the IM was placed at the 'Top', before the FPN and backbone are combined. Therefore, the proposed method can improve the performance of models that use Mask-RCNN as a base model.
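The "calculated scale information" mentioned in the abstract can be illustrated with a minimal sketch of how a Mask-RCNN-style transform module picks one resize factor per image. The abstract gives no size limits, so the defaults of 800 and 1333 below are an assumption borrowed from common Mask-RCNN implementations. The function names are hypothetical, and rounding the output size is a simplification.

```python
def resize_scale(height, width, min_size=800, max_size=1333):
    """Return the single scale factor applied to both image sides.

    The shorter side is brought to min_size unless that would push the
    longer side past max_size, in which case the longer side is capped.
    min_size and max_size are assumed defaults, not values from the paper.
    """
    short_side = min(height, width)
    long_side = max(height, width)
    return min(min_size / short_side, max_size / long_side)


def resized_shape(height, width, min_size=800, max_size=1333):
    """Return (new_height, new_width, scale) for an input image size."""
    scale = resize_scale(height, width, min_size, max_size)
    # Rounding is a simplification; real implementations may floor instead.
    return round(height * scale), round(width * scale), scale


if __name__ == "__main__":
    for h, w in [(480, 640), (1000, 2000)]:
        print((h, w), "->", resized_shape(h, w))
```

Because the factor depends on each image's original size, it is generally a non-integer that varies from image to image. That is the situation in which an arbitrary-scale super-resolution resizer, rather than a fixed-scale one, becomes relevant.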
Keywords
Arbitrary-Scale Super-Resolution; Instance segmentation; Integration module; Mask-RCNN