Expanded Object Localization Learning Data Generation Using CAM and Selective Search and Its Retraining to Improve WSOL Performance

  • 고수연 (Department of Computer Science, Sookmyung Women's University)
  • 최영우 (Department of Computer Science, Sookmyung Women's University)
  • Received : 2021.02.09
  • Accepted : 2021.05.31
  • Published : 2021.09.30

Abstract

Recently, methods that find the attention or localization area of an object in an image using CAM (Class Activation Map)[1] have been widely studied under WSOL (Weakly Supervised Object Localization). Extracting the attention area from the object heat map produced by CAM has the disadvantage that it concentrates mainly on the region where the object's features are most densely gathered, and therefore fails to cover the entire object. To improve this, we use CAM and Selective Search[6] together: we first expand the attention area in the heat map, apply Gaussian smoothing to the expanded area to generate retraining data, and finally train on this data so that the attention areas of objects are expanded. The proposed method requires retraining only once, and the time to find a localization area is greatly reduced because Selective Search is no longer needed at this stage. In the experiments, the attention areas were expanded beyond the original CAM heat maps, and the IoU (Intersection over Union) between the bounding boxes of the expanded attention areas and the ground truth improved by about 58% compared to the original CAM.
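The evaluation step described above — drawing a bounding box around a heat-map attention area and scoring it against the ground truth with IoU — can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the threshold value, the toy heat map, and the ground-truth box below are all assumptions for demonstration.

```python
import numpy as np

def bbox_from_heatmap(heatmap, thresh=0.5):
    """Bounding box (x1, y1, x2, y2) covering heat-map cells above
    thresh * max; thresh=0.5 is an assumed, not the paper's, value."""
    ys, xs = np.where(heatmap >= thresh * heatmap.max())
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def iou(a, b):
    """Intersection over Union of two inclusive (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    return inter / (area(a) + area(b) - inter)

# Toy example: a narrow CAM hot spot covers only part of the object,
# so its box scores a low IoU against the (hypothetical) ground truth.
heat = np.zeros((100, 100))
heat[40:60, 40:60] = 1.0        # stands in for a concentrated CAM peak
gt = (20, 20, 79, 79)           # hypothetical ground-truth object box

cam_box = bbox_from_heatmap(heat)
print(cam_box, round(iou(cam_box, gt), 3))
```

Expanding the attention area toward the full object, as the proposed method does, enlarges `cam_box` toward `gt` and raises this IoU score.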

Keywords

Acknowledgement

This work was supported by a National Research Foundation of Korea (NRF) basic research grant (No. NRF-2017R1D1A1B04035633).

References

  1. B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in IEEE Conference on Computer Vision and Pattern Recognition, pp.2921-2929, 2016.
  2. X. Zhang, Y. Wei, J. Feng, Y. Yang, and T. Huang, "Adversarial complementary learning for weakly supervised object localization," in IEEE Conference on Computer Vision and Pattern Recognition, pp.1325-1334, 2018.
  3. K. K. Singh and Y. J. Lee, "Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization," arXiv preprint arXiv:1704.04232, 2017.
  4. S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, and Y. Yoo, "Cutmix: Regularization strategy to train strong classifiers with localizable features," in International Conference on Computer Vision, pp.6022-6031, 2019.
  5. X. Zhang, Y. Wei, Y. Yang, and F. Wu, "Rethinking localization map: Towards accurate object perception with self-enhancement maps," arXiv preprint, 2020.
  6. J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, "Selective search for object recognition," International Journal of Computer Vision, Vol.104, pp.154-171, 2013. https://doi.org/10.1007/s11263-013-0620-5
  7. L. Bazzani, A. Bergamo, D. Anguelov, and L. Torresani, "Self-taught object localization with deep networks," 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2016.
  8. A. J. Bency, H. Kwon, H. Lee, S. Karthikeyan, and B. Manjunath, "Weakly supervised localization using deep feature maps," European Conference on Computer Vision, pp.714-731, Springer, 2016.
  9. D. Li, J. B. Huang, Y. Li, S. Wang, and M. H. Yang, "Weakly supervised object localization with progressive domain adaption," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  10. P. Felzenszwalb and D. Huttenlocher, "Efficient graph-based image segmentation," International Journal of Computer Vision, Vol.59, No.2, Sep. 2004.
  11. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  12. S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.39, No.6, pp.1137-1149, 2017.
  13. A. Kolesnikov and C. H. Lampert, "Seed, expand and constrain: Three principles for weakly-supervised image segmentation," In European Conference on Computer Vision, pp.695-711, 2016.
  14. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," Conference on Computer Vision and Pattern Recognition, pp.248-255, 2009.
  15. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.