Acknowledgement
This research was funded by the Department of Education of Shaanxi Province, China (No. 21JK0819).
References
- S. M. Azimi, M. Kraus, R. Bahmanyar, and P. Reinartz, "Multiple pedestrians and vehicles tracking in aerial imagery using a convolutional neural network," Remote Sensing, vol. 13, no. 10, article no. 1953, 2021. https://doi.org/10.3390/rs13101953.
- L. Wen, D. Du, P. Zhu, Q. Hu, Q. Wang, L. Bo, and S. Lyu, "Detection, tracking, and counting meets drones in crowds: a benchmark," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual Event, 2021, pp. 7812-7821.
- I. Delibasoglu, "UAV images dataset for moving object detection from moving cameras," 2021 [Online]. Available: https://arxiv.org/abs/2103.11460.
- P. Zhang, D. Wang, and H. Lu, "Multi-modal visual tracking: review and experimental comparison," 2020 [Online]. Available: https://arxiv.org/abs/2012.04176.
- P. Zhang, J. Zhao, D. Wang, H. Lu, and X. Ruan, "Visible-thermal UAV tracking: a large-scale benchmark and new baseline," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, 2022, pp. 8876-8885.
- C. Li, A. Lu, A. Zheng, Z. Tu, and J. Tang, "Multi-adapter RGBT tracking," in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, South Korea, 2019, pp. 2262-2270.
- P. Zhang, D. Wang, H. Lu, and X. Yang, "Learning adaptive attribute-driven representation for real-time RGB-T tracking," International Journal of Computer Vision, vol. 129, pp. 2714-2729, 2021. https://doi.org/10.1007/s11263-021-01495-3.
- T. Zhang, X. Liu, Q. Zhang, and J. Han, "SiamCDA: complementarity- and distractor-aware RGB-T tracking based on Siamese network," IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 3, pp. 1403-1417, 2021.
- Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin transformer: hierarchical vision transformer using shifted windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021, pp. 10012-10022.
- H. Zhang, L. Zhang, L. Zhuo, and J. Zhang, "Object tracking in RGB-T videos using modal-aware attention network and competitive learning," Sensors, vol. 20, no. 2, article no. 393, 2020. https://doi.org/10.3390/s20020393.
- J. Peng, H. Zhao, Z. Hu, Z. Yi, and B. Wang, "Siamese infrared and visible light fusion network for RGB-T tracking," 2021 [Online]. Available: https://arxiv.org/abs/2103.07302.
- P. Zhang, J. Zhao, C. Bo, D. Wang, H. Lu, and X. Yang, "Jointly modeling motion and appearance cues for robust RGB-T tracking," IEEE Transactions on Image Processing, vol. 30, pp. 3335-3347, 2021. https://doi.org/10.1109/TIP.2021.3060862.
- T. Meinhardt, A. Kirillov, L. Leal-Taixe, and C. Feichtenhofer, "TrackFormer: multi-object tracking with transformers," 2022 [Online]. Available: https://arxiv.org/abs/2101.02702.
- P. Sun, J. Cao, Y. Jiang, R. Zhang, E. Xie, Z. Yuan, C. Wang, and P. Luo, "TransTrack: multiple object tracking with transformer," 2021 [Online]. Available: https://arxiv.org/abs/2012.15460.
- P. Chu, J. Wang, Q. You, H. Ling, and Z. Liu, "TransMOT: spatial-temporal graph transformer for multiple object tracking," 2021 [Online]. Available: https://arxiv.org/abs/2104.00194.
- Z. Liu, H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, et al., "Swin transformer v2: scaling up capacity and resolution," 2022 [Online]. Available: https://arxiv.org/abs/2111.09883.
- Z. Wang, Y. Chen, W. Shao, H. Li, and L. Zhang, "SwinFuse: a residual swin transformer fusion network for infrared and visible images," 2022 [Online]. Available: https://arxiv.org/abs/2204.11436.
- J. Cai, M. Xu, W. Li, Y. Xiong, W. Xia, Z. Tu, and S. Soatto, "MeMOT: multi-object tracking with memory," 2022 [Online]. Available: https://arxiv.org/abs/2203.16761.
- X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable DETR: deformable transformers for end-to-end object detection," 2020 [Online]. Available: https://arxiv.org/abs/2010.04159.
- Z. Yang, Y. Wei, and Y. Yang, "Associating objects with transformers for video object segmentation," 2021 [Online]. Available: https://arxiv.org/abs/2106.02638.
- G. Bertasius and L. Torresani, "Classifying, segmenting, and tracking object instances in video with mask propagation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, 2020, pp. 9736-9745.
- B. T. Polyak and A. B. Juditsky, "Acceleration of stochastic approximation by averaging," SIAM Journal on Control and Optimization, vol. 30, no. 4, pp. 838-855, 1992. https://doi.org/10.1137/0330046.
- I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," 2019 [Online]. Available: https://arxiv.org/abs/1711.05101.
- Y. Gao, C. Li, Y. Zhu, J. Tang, T. He, and F. Wang, "Deep adaptive fusion network for high performance RGBT tracking," in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, South Korea, 2019, pp. 91-99.
- M. Kristan, J. Matas, A. Leonardis, M. Felsberg, R. Pflugfelder, J. K. Kamarainen, et al., "The seventh visual object tracking VOT2019 challenge results," in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, South Korea, 2019, pp. 2206-2241.
- L. Zhang, M. Danelljan, A. Gonzalez-Garcia, J. van de Weijer, and F. Shahbaz Khan, "Multi-modal fusion for end-to-end RGB-T tracking," in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, South Korea, 2019, pp. 2252-2261.