Large-scale Language-image Model-based Bag-of-Objects Extraction for Visual Place Recognition

Bag-of-Objects Representation Based on a Large-Scale Language-Image Model for Image-Based Place Recognition

  • Seung Won Jung (정승운) (School of Mechanical Engineering, Korea University of Technology and Education) ;
  • Byungjae Park (박병재) (School of Mechanical Engineering, Korea University of Technology and Education)
  • Received : 2024.02.16
  • Accepted : 2024.03.12
  • Published : 2024.03.31


We propose a visual place recognition method that represents images using objects as visual words, where each visual word corresponds to a type of object found in urban environments. To detect these objects, we implemented a zero-shot detector based on a large-scale language-image model, which detects a wide range of urban objects without additional training. When building the histogram for each image, frequency-based weighting is applied to account for the importance of each object. Experiments on open datasets, including a comparison with another method, demonstrate the potential of the proposed method even under environmental or viewpoint changes.
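The histogram construction described above can be illustrated with a minimal sketch. The abstract only states that frequency-based weighting is used, so the smoothed inverse-document-frequency formula, the cosine-similarity matching, the function names, and the object labels below are all assumptions made for illustration, not the authors' implementation. The zero-shot detector is not shown; its output is assumed to be a list of object labels per image.

```python
import math
from collections import Counter


def build_idf(database_labels):
    """Compute a smoothed IDF weight per object label (assumed weighting scheme).

    database_labels: list of label lists, one list of detected objects per image.
    """
    n_images = len(database_labels)
    doc_freq = Counter()
    for labels in database_labels:
        doc_freq.update(set(labels))  # count each label once per image
    return {
        word: math.log((1 + n_images) / (1 + df)) + 1.0
        for word, df in doc_freq.items()
    }


def weighted_histogram(labels, idf):
    """Build an L2-normalized, IDF-weighted bag-of-objects histogram."""
    counts = Counter(w for w in labels if w in idf)  # ignore unseen labels
    total = sum(counts.values())
    if total == 0:
        return {}
    vec = {w: (c / total) * idf[w] for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()}


def cosine_similarity(a, b):
    """Dot product of two L2-normalized sparse histograms."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def recognize_place(query_labels, database_labels, idf):
    """Return (index, score) of the database image most similar to the query."""
    query = weighted_histogram(query_labels, idf)
    scores = [
        cosine_similarity(query, weighted_histogram(labels, idf))
        for labels in database_labels
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical detector outputs for three database images and one query image.
database = [
    ["car", "tree", "traffic light"],
    ["bench", "tree", "tree"],
    ["car", "car", "bus"],
]
idf = build_idf(database)
best_index, best_score = recognize_place(["bench", "tree"], database, idf)
```

In this sketch, labels that appear in many database images (such as "tree") receive a lower weight than rare ones (such as "bench"), so rare objects contribute more to the match. Other frequency-based schemes would fit the same structure by changing only `build_idf`.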



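This research was supported by the Korea Science and Engineering Foundation with funding from the Korean government (Ministry of Science and ICT) in 2023 (No. 2021R1F1A1057949).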


