Acknowledgement
This research was funded by the Department of Education of Shaanxi Province, China (No. 21JK0819).
References
- S. M. Azimi, M. Kraus, R. Bahmanyar, and P. Reinartz, "Multiple pedestrians and vehicles tracking in aerial imagery using a convolutional neural network," Remote Sensing, vol. 13, no. 10, article no. 1953, 2021. https://doi.org/10.3390/rs13101953.
- L. Wen, D. Du, P. Zhu, Q. Hu, Q. Wang, L. Bo, and S. Lyu, "Detection, tracking, and counting meets drones in crowds: a benchmark," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual Event, 2021, pp. 7812-7821.
- I. Delibasoglu, "UAV images dataset for moving object detection from moving cameras," 2021 [Online]. Available: https://arxiv.org/abs/2103.11460.
- P. Zhang, D. Wang, and H. Lu, "Multi-modal visual tracking: review and experimental comparison," 2020 [Online]. Available: https://arxiv.org/abs/2012.04176.
- P. Zhang, J. Zhao, D. Wang, H. Lu, and X. Ruan, "Visible-thermal UAV tracking: a large-scale benchmark and new baseline," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, 2022, pp. 8876-8885.
- C. Li, A. Lu, A. Zheng, Z. Tu, and J. Tang, "Multi-adapter RGBT tracking," in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, South Korea, 2019, pp. 2262-2270.
- P. Zhang, D. Wang, H. Lu, and X. Yang, "Learning adaptive attribute-driven representation for real-time RGB-T tracking," International Journal of Computer Vision, vol. 129, pp. 2714-2729, 2021. https://doi.org/10.1007/s11263-021-01495-3.
- T. Zhang, X. Liu, Q. Zhang, and J. Han, "SiamCDA: complementarity- and distractor-aware RGB-T tracking based on Siamese network," IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 3, pp. 1403-1417, 2021.
- Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin transformer: hierarchical vision transformer using shifted windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021, pp. 10012-10022.
- H. Zhang, L. Zhang, L. Zhuo, and J. Zhang, "Object tracking in RGB-T videos using modal-aware attention network and competitive learning," Sensors, vol. 20, no. 2, article no. 393, 2020. https://doi.org/10.3390/s20020393.
- J. Peng, H. Zhao, Z. Hu, Z. Yi, and B. Wang, "Siamese infrared and visible light fusion network for RGB-T tracking," 2021 [Online]. Available: https://arxiv.org/abs/2103.07302.
- P. Zhang, J. Zhao, C. Bo, D. Wang, H. Lu, and X. Yang, "Jointly modeling motion and appearance cues for robust RGB-T tracking," IEEE Transactions on Image Processing, vol. 30, pp. 3335-3347, 2021. https://doi.org/10.1109/TIP.2021.3060862.
- T. Meinhardt, A. Kirillov, L. Leal-Taixe, and C. Feichtenhofer, "TrackFormer: multi-object tracking with transformers," 2022 [Online]. Available: https://arxiv.org/abs/2101.02702.
- P. Sun, J. Cao, Y. Jiang, R. Zhang, E. Xie, Z. Yuan, C. Wang, and P. Luo, "TransTrack: multiple object tracking with transformer," 2021 [Online]. Available: https://arxiv.org/abs/2012.15460.
- P. Chu, J. Wang, Q. You, H. Ling, and Z. Liu, "TransMOT: spatial-temporal graph transformer for multiple object tracking," 2021 [Online]. Available: https://arxiv.org/abs/2104.00194.
- Z. Liu, H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, et al., "Swin transformer v2: scaling up capacity and resolution," 2022 [Online]. Available: https://arxiv.org/abs/2111.09883.
- Z. Wang, Y. Chen, W. Shao, H. Li, and L. Zhang, "SwinFuse: a residual swin transformer fusion network for infrared and visible images," 2022 [Online]. Available: https://arxiv.org/abs/2204.11436.
- J. Cai, M. Xu, W. Li, Y. Xiong, W. Xia, Z. Tu, and S. Soatto, "MeMOT: multi-object tracking with memory," 2022 [Online]. Available: https://arxiv.org/abs/2203.16761.
- X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable DETR: deformable transformers for end-to-end object detection," 2020 [Online]. Available: https://arxiv.org/abs/2010.04159.
- Z. Yang, Y. Wei, and Y. Yang, "Associating objects with transformers for video object segmentation," 2021 [Online]. Available: https://arxiv.org/abs/2106.02638.
- G. Bertasius and L. Torresani, "Classifying, segmenting, and tracking object instances in video with mask propagation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, 2020, pp. 9736-9745.
- B. T. Polyak and A. B. Juditsky, "Acceleration of stochastic approximation by averaging," SIAM Journal on Control and Optimization, vol. 30, no. 4, pp. 838-855, 1992. https://doi.org/10.1137/0330046.
- I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," 2019 [Online]. Available: https://arxiv.org/abs/1711.05101.
- Y. Gao, C. Li, Y. Zhu, J. Tang, T. He, and F. Wang, "Deep adaptive fusion network for high performance RGBT tracking," in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, South Korea, 2019, pp. 91-99.
- M. Kristan, J. Matas, A. Leonardis, M. Felsberg, R. Pflugfelder, J. K. Kamarainen, et al., "The seventh visual object tracking VOT2019 challenge results," in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, South Korea, 2019, pp. 2206-2241.
- L. Zhang, M. Danelljan, A. Gonzalez-Garcia, J. van de Weijer, and F. Shahbaz Khan, "Multi-modal fusion for end-to-end RGB-T tracking," in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, South Korea, 2019, pp. 2252-2261.