Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2022-00166722).