Comparison of the Effect of Interpolation on the Mask R-CNN Model

  • Young-Pill Ahn (Department of Computer Science, Chungbuk National University)
  • Kwang Baek Kim (Department of Artificial Intelligence, Silla University)
  • Hyun-Jun Park (Department of Artificial Intelligence Software, Cheongju University)
  • Received: 2022.10.21
  • Reviewed: 2022.12.30
  • Published: 2023.03.31

Abstract

Several recent high-performance instance segmentation models use Mask R-CNN as a baseline, a model that set a new performance peak in instance segmentation in 2017. Because numerous models are derived from Mask R-CNN, improving Mask R-CNN is expected to improve the derived models as well. Mask R-CNN uses interpolation to resize its input images, so the input the network receives differs depending on the interpolation method. This study therefore compared how the performance of Mask R-CNN changes when various interpolation methods are applied in the transform layer, with the aim of improving Mask R-CNN. The models were trained and evaluated on the PennFudan and Balloon datasets, and performance was measured with the average precision (AP) metric. In the experiments, the Mask R-CNN variant that used bicubic interpolation in the transform layer showed the best performance.
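To illustrate why the input differs depending on the interpolation method, the following is a minimal pure-Python sketch that resizes a single row of pixel values with nearest-neighbor, linear, and cubic interpolation. It is not the transform layer evaluated in the study, nor the PyTorch or OpenCV implementation. The function names, the half-pixel coordinate mapping, the edge clamping, and the cubic kernel parameter a = -0.75 are all assumptions made for illustration. A 2D resize would typically apply the same procedure along each image axis.

```python
import math


def _src_coord(i, in_len, out_len):
    # Map output index i to a (fractional) input coordinate,
    # assuming pixel centres sit at half-integer positions.
    return (i + 0.5) * in_len / out_len - 0.5


def _clamp(k, n):
    # Replicate the border pixel for out-of-range indices (assumed policy).
    return max(0, min(n - 1, k))


def resize_nearest(row, out_len):
    # Copy the input sample closest to each output position.
    n = len(row)
    return [
        row[_clamp(int(math.floor(_src_coord(i, n, out_len) + 0.5)), n)]
        for i in range(out_len)
    ]


def resize_linear(row, out_len):
    # Blend the two neighbouring input samples by distance.
    n = len(row)
    out = []
    for i in range(out_len):
        x = _src_coord(i, n, out_len)
        x0 = math.floor(x)
        t = x - x0
        out.append((1 - t) * row[_clamp(x0, n)] + t * row[_clamp(x0 + 1, n)])
    return out


def _cubic_weight(d, a=-0.75):
    # Cubic convolution kernel; a = -0.75 is an assumed parameter choice.
    d = abs(d)
    if d <= 1:
        return (a + 2) * d**3 - (a + 3) * d**2 + 1
    if d < 2:
        return a * d**3 - 5 * a * d**2 + 8 * a * d - 4 * a
    return 0.0


def resize_cubic(row, out_len, a=-0.75):
    # Weighted sum of the four nearest input samples.
    n = len(row)
    out = []
    for i in range(out_len):
        x = _src_coord(i, n, out_len)
        x0 = math.floor(x)
        acc = 0.0
        for k in range(x0 - 1, x0 + 3):
            acc += _cubic_weight(x - k, a) * row[_clamp(k, n)]
        out.append(acc)
    return out


# Upsample a hard edge with each method and compare the results.
edge = [0, 0, 0, 1, 1, 1]
up_nearest = resize_nearest(edge, 12)
up_linear = resize_linear(edge, 12)
up_cubic = resize_cubic(edge, 12)
```

On this step edge, the nearest-neighbor result contains only the original values, the linear result stays between them, and the cubic result slightly undershoots and overshoots around the edge. The three methods therefore hand the network different pixel values for the same source image.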

Funding Information

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2022-00166722).
