1 |
S. Ren, K. He, R. Girshick, J. Sun, "Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks," Proceedings of Advances in Neural Information Processing Systems, 2015.
|
2 |
J. Redmon, S. Divvala, R. Girshick, A. Farhadi, "You Only Look Once: Unified, Real-time Object Detection," Proceedings of Computer Vision and Pattern Recognition, pp. 779-788, 2016.
|
3 |
K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, C. Xu, Y. Xu, Z. Yang, Y. Zhang, D. Tao, "A Survey on Vision Transformer," Journal of IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, No. 1, pp. 73-86, 2023.
|
4 |
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," Proceedings of International Conference on Learning Representations, 2021.
|
5 |
N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, "End-to-end Object Detection with Transformers," Proceedings of European Conference on Computer Vision, pp. 213-229, 2020.
|
6 |
Y. Fang, B. Liao, X. Wang, J. Fang, J. Qi, "You Only Look at One Sequence: Rethinking Transformer in Vision Through Object Detection," Proceedings of Advances in Neural Information Processing Systems, Vol. 34, pp. 26183-26197, 2021.
|
7 |
X. Zhu, W. Su, L. Lu, B. Li, X. Wang, J. Dai, "Deformable DETR: Deformable Transformers for End-to-end Object Detection," Proceedings of International Conference on Learning Representations, 2021.
|
8 |
B. Roh, J. W. Shin, W. Shin, S. Kim, "Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity," Proceedings of International Conference on Learning Representations, 2022.
|
9 |
D. Meng, X. Chen, Z. Fan, G. Zeng, H. Li, Y. Yuan, L. Sun, J. Wang, "Conditional DETR for Fast Training Convergence," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3651-3660, 2021.
|
10 |
T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, C. L. Zitnick, "Microsoft COCO: Common Objects in Context," Proceedings of European Conference on Computer Vision, pp. 740-750, 2014.
|
11 |
M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman, "The Pascal Visual Object Classes (VOC) Challenge," International Journal of Computer Vision, Vol. 88, No. 2, pp. 303-338, 2010.
|
12 |
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, "Attention is All You Need," Advances in Neural Information Processing Systems, pp. 6000-6010, 2017.
|
13 |
J. Devlin, M. W. Chang, K. Lee, K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics, Vol. 1, pp. 4171-4186, 2019.
|
14 |
D. W. Otter, J. R. Medina, J. K. Kalita, "A Survey of the Usages of Deep Learning for Natural Language Processing," Journal of IEEE Transactions on Neural Networks and Learning Systems, Vol. 32, No. 2, pp. 604-624, 2020.
|
15 |
K. He, X. Zhang, S. Ren, J. Sun, "Deep Residual Learning for Image Recognition," Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
|
16 |
P. Dollar, M. Singh, R. Girshick, "Fast and Accurate Model Scaling," Proceedings of Computer Vision and Pattern Recognition, pp. 924-932, 2021.
|
17 |
T. Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollar, "Focal Loss for Dense Object Detection," Proceedings of IEEE International Conference on Computer Vision, pp. 2980-2988, 2017.
|
18 |
Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows," Proceedings of International Conference on Computer Vision, pp. 10012-10022, 2021.
|