Background memory-assisted zero-shot video object segmentation for unmanned aerial and ground vehicles

  • Kimin Yun (Visual Intelligence Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Hyung-Il Kim (Visual Intelligence Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Kangmin Bae (Visual Intelligence Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Jinyoung Moon (Visual Intelligence Research Laboratory, Electronics and Telecommunications Research Institute)
  • Received: 2023.03.24
  • Accepted: 2023.08.08
  • Published: 2023.10.20

Abstract

Unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) increasingly rely on advanced video analytics for tasks such as moving object detection and segmentation. We propose a zero-shot video object segmentation method designed specifically for UAV and UGV applications, focusing on the discovery of moving objects in challenging scenarios. The method employs a background memory model that enables training from annotations that are sparse along the time axis, using temporal modeling of the background to detect moving objects effectively. This addresses a limitation of existing state-of-the-art methods, which detect salient objects within images regardless of whether those objects are moving. Our method achieved mean J and F values of 82.7 and 81.2, respectively, on the DAVIS'16 benchmark. We also conducted extensive ablation studies that quantify the contributions of different input compositions and of the combinations of datasets used for training. In future work, we will integrate the proposed method with additional systems, such as tracking and obstacle avoidance.
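The core idea the abstract describes, flagging pixels that deviate from a temporally maintained background model, can be illustrated with the classical exponential-moving-average background subtraction that the paper's learned background memory generalizes. This is a minimal sketch for intuition only, not the paper's method; the class name, parameters (`alpha`, `threshold`), and update rule are all illustrative assumptions.

```python
import numpy as np

class BackgroundMemory:
    """Toy running-average background model (illustrative sketch).

    The paper's background memory is a learned temporal model; this
    sketch uses a classical exponential moving average to show the
    underlying principle: pixels that deviate from the temporally
    modeled background are flagged as moving objects.
    """

    def __init__(self, alpha=0.05, threshold=30.0):
        self.alpha = alpha          # memory update rate (0..1)
        self.threshold = threshold  # per-pixel foreground decision threshold
        self.background = None      # float32 background estimate

    def update(self, frame):
        """Blend the frame into the background memory; return a foreground mask."""
        frame = frame.astype(np.float32)
        if self.background is None:
            # First frame initializes the memory; nothing is foreground yet.
            self.background = frame.copy()
            return np.zeros(frame.shape[:2], dtype=bool)
        diff = np.abs(frame - self.background)
        if diff.ndim == 3:
            diff = diff.max(axis=2)  # reduce color channels to one difference map
        mask = diff > self.threshold
        # Slowly absorb the new frame so the background tracks gradual change.
        self.background = (1 - self.alpha) * self.background + self.alpha * frame
        return mask

# Usage: static frames build the memory; a sudden blob is flagged as moving.
bm = BackgroundMemory(alpha=0.1, threshold=20.0)
static = np.zeros((8, 8), dtype=np.uint8)
for _ in range(5):
    mask = bm.update(static)      # background settles; mask stays empty
moving = static.copy()
moving[2:4, 2:4] = 200            # a bright moving blob appears
mask = bm.update(moving)          # blob pixels exceed the threshold
```

A learned model replaces the fixed blending rule and threshold with networks trained from sparse temporal annotations, which is what lets the paper's approach cope with moving cameras and appearance change that break this simple scheme.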

Keywords

Acknowledgments

This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grants funded by the Korean government (MSIT) (No. 2020-0-00004, Development of Previsional Intelligence based on Long-term Visual Memory Network (50%); No. 2014-3-00123, Development of High Performance Visual BigData Discovery Platform for Large-Scale Realtime Data Analysis (30%); and No. 2022-0-00124, Development of Artificial Intelligence Technology for Self-Improving Competency-Aware Learning Capabilities (20%)).
