Acknowledgement
This research was supported in 2023 by the Korea Science Foundation with funding from the Korean government (Ministry of Science and ICT) (No. 2021R1F1A1057949).