Training of a Siamese Network to Build a Tracker without Using Tracking Labels

A Study on Training a Multi-Object Detector and Tracker without Tracking Labels Using a Siamese Network

  • 강정규 (Autonomous Driving Intelligence Research Section, ETRI) ;
  • 송유승 (Autonomous Driving Intelligence Research Section, ETRI) ;
  • 민경욱 (Autonomous Driving Intelligence Research Section, ETRI) ;
  • 최정단 (Intelligent Robotics Research Division, ETRI)
  • Received : 2022.09.05
  • Accepted : 2022.10.20
  • Published : 2022.10.31

Abstract

Multi-object tracking has been studied for a long time in computer vision and plays a critical role in applications such as autonomous driving and driver assistance. Multi-object tracking techniques generally consist of a detector, which detects objects, and a tracker, which tracks the detected objects. Various publicly available datasets allow us to train a detector model without much effort. However, relatively few public datasets exist for training a tracker model, and building one's own tracking dataset takes far longer than building a detection dataset. Hence, the detector is often developed separately from the tracker module. However, a separately developed tracker must be re-adjusted whenever the upstream detector model changes. This study proposes a system that trains a model performing detection and tracking simultaneously using only detector training datasets. In particular, a Siamese network with data augmentation is used to compose the detector and tracker. Experiments on public datasets verify that the proposed algorithm yields a real-time multi-object tracker comparable to state-of-the-art tracker models.

Moving-object tracking has long been studied in the field of computer vision and plays a critical role in systems such as autonomous driving and driver assistance. Multi-object tracking techniques are generally a combination of a detector, which detects objects, and a tracker, which tracks the detected objects. Because many detection datasets are publicly available, a good detector model is easy to train; for trackers, however, relatively few datasets are public, and building one's own tracking dataset takes far longer than building a detection dataset. Consequently, the detector is often developed separately and paired with a non-learning-based tracker. In that case the two systems run one after the other, slowing the overall pipeline, and the tracker must be re-tuned whenever the performance of the upstream detector changes. This study therefore proposes a method for building a model that performs detection and tracking simultaneously using only detection datasets. Using data augmentation and a Siamese network, we study how to detect and track objects from a single image. Experiments on public datasets verify that the training procedure produces a moving-object detector and tracker that runs at high speed.
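The core idea described above, that an augmented copy of a single detector-labeled image yields identity labels for free, can be sketched as follows. This is a minimal toy illustration, not the paper's actual architecture: the crops, the linear `embed` head, and the projection matrix `W` are all hypothetical stand-ins for the Siamese branches. Because the second view is synthesized from the first by augmentation, object `i` in view A must match object `i` in view B, so a similarity objective can be supervised without any tracking annotations.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(crops, W):
    """Toy Siamese embedding head: flatten each object crop,
    project linearly, and L2-normalize for cosine similarity."""
    z = crops.reshape(len(crops), -1) @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# three object crops from one detector-labeled image ("anchor" view)
crops_a = rng.normal(size=(3, 8, 8))
# augmented view: same objects, slightly perturbed (stand-in for augmentation)
crops_b = crops_a + 0.05 * rng.normal(size=(3, 8, 8))

W = rng.normal(size=(64, 16))          # shared weights across both branches
za, zb = embed(crops_a, W), embed(crops_b, W)

sim = za @ zb.T                        # 3x3 cosine-similarity matrix
# augmentation preserves identity, so the target assignment is the diagonal
target = np.arange(3)
pred = sim.argmax(axis=1)              # greedy association per anchor object
```

In a real system `sim` would feed a contrastive or cross-entropy loss against the diagonal targets, and at inference time the same similarity scores would associate detections across consecutive frames.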


Acknowledgement

This research was supported by the Ministry of Land, Infrastructure and Transport and the Korea Agency for Infrastructure Technology Advancement (Grant No. 21AMDP-C161756-01).
