Acknowledgement
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) in 2022 (No. 2021-0-00994, Integrated Platform for Education and Development of Sustainable and Robust Autonomous Driving AI, and No. RS-2022-00167194, Trustworthy AI for Mission-Critical Systems).
References
- S. Ren, K. He, R. Girshick, J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Proceedings of Advances in Neural Information Processing Systems, 2015.
- J. Redmon, S. Divvala, R. Girshick, A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788, 2016.
- K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, C. Xu, Y. Xu, Z. Yang, Y. Zhang, D. Tao, "A Survey on Vision Transformer," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 45, No. 1, pp. 73-86, 2023.
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," Proceedings of International Conference on Learning Representations, 2021.
- N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, "End-to-End Object Detection with Transformers," Proceedings of European Conference on Computer Vision, pp. 213-229, 2020.
- Y. Fang, B. Liao, X. Wang, J. Fang, J. Qi, "You Only Look at One Sequence: Rethinking Transformer in Vision Through Object Detection," Proceedings of Advances in Neural Information Processing Systems, Vol. 34, pp. 26183-26197, 2021.
- X. Zhu, W. Su, L. Lu, B. Li, X. Wang, J. Dai, "Deformable DETR: Deformable Transformers for End-to-End Object Detection," Proceedings of International Conference on Learning Representations, 2021.
- B. Roh, J. W. Shin, W. Shin, S. Kim, "Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity," Proceedings of International Conference on Learning Representations, 2022.
- D. Meng, X. Chen, Z. Fan, G. Zeng, H. Li, Y. Yuan, L. Sun, J. Wang, "Conditional DETR for Fast Training Convergence," Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3651-3660, 2021.
- T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, C. L. Zitnick, "Microsoft COCO: Common Objects in Context," Proceedings of European Conference on Computer Vision, pp. 740-750, 2014.
- M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman, "The Pascal Visual Object Classes (VOC) Challenge," International Journal of Computer Vision, Vol. 88, No. 2, pp. 303-338, 2010. https://doi.org/10.1007/s11263-009-0275-4
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, "Attention Is All You Need," Proceedings of Advances in Neural Information Processing Systems, pp. 6000-6010, 2017.
- J. Devlin, M. W. Chang, K. Lee, K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics, Vol. 1, pp. 4171-4186, 2019.
- D. W. Otter, J. R. Medina, J. K. Kalita, "A Survey of the Usages of Deep Learning for Natural Language Processing," IEEE Transactions on Neural Networks and Learning Systems, Vol. 32, No. 2, pp. 604-624, 2020.
- K. He, X. Zhang, S. Ren, J. Sun, "Deep Residual Learning for Image Recognition," Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
- P. Dollar, M. Singh, R. Girshick, "Fast and Accurate Model Scaling," Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 924-932, 2021.
- Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows," Proceedings of IEEE International Conference on Computer Vision, pp. 10012-10022, 2021.
- T. Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollar, "Focal Loss for Dense Object Detection," Proceedings of IEEE International Conference on Computer Vision, pp. 2980-2988, 2017.