Integration of Multi-scale CAM and Attention for Weakly Supervised Defects Localization on Surface Defective Apple

  • Nguyen Bui Ngoc Han (ICT System and Convergence, Chonnam National University);
  • Ju Hwan Lee (ICT System and Convergence, Chonnam National University);
  • Jin Young Kim (Department of Electrical Engineering, Chonnam National University)
  • Received: 2023.08.22
  • Published: 2023.10.31

Abstract

Weakly supervised object localization (WSOL) is the task of localizing an object in an image using only image-level labels. Previous studies have followed the conventional class activation mapping (CAM) pipeline. However, we show that the current CAM approach has problems that prevent the original CAM from capturing complete defect features. This work uses a convolutional neural network (CNN) pretrained on image-level labels to generate class activation maps at multiple scales, highlighting discriminative regions. Additionally, a pretrained vision transformer (ViT) is used to produce multi-head attention maps that serve as an auxiliary detector. By integrating the CNN-based CAMs and the attention maps, our approach localizes defective regions without requiring bounding-box or pixel-level supervision during training. We evaluate our approach on a dataset of apple images labeled only with image-level defect categories. Experiments demonstrate that our proposed method achieves performance comparable to several object detection models and holds promise for improving localization.
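The pipeline summarized above (multi-scale CAM, ViT attention as an auxiliary cue, fusion, then a box) can be illustrated with a minimal NumPy sketch. It assumes CNN feature maps and classifier weights have already been extracted at several input scales, and that a ViT layer's attention tensor of shape (heads, 1 + N, 1 + N) with the CLS token at index 0 is available. All function names, the scale-averaging rule, the fusion weight `alpha`, and the threshold `tau` are hypothetical choices for illustration, not the authors' implementation. The box step simply encloses every pixel above the threshold instead of selecting the largest connected component.

```python
import numpy as np


def normalize(m: np.ndarray) -> np.ndarray:
    """Min-max normalize a map to [0, 1]; a constant map becomes all zeros."""
    m = m - m.min()
    peak = m.max()
    return m / peak if peak > 0 else m


def resize_nearest(m: np.ndarray, size: tuple[int, int]) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D map to (H, W) without extra dependencies."""
    h, w = m.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return m[np.ix_(rows, cols)]


def cam(features: np.ndarray, class_weights: np.ndarray) -> np.ndarray:
    """Classic CAM: weight K feature maps (K, h, w) by the class's FC weights (K,), then ReLU."""
    return np.maximum(np.tensordot(class_weights, features, axes=1), 0.0)


def multiscale_cam(feature_list, class_weights, size):
    """Average normalized CAMs from several input scales (assumed fusion rule: plain mean)."""
    maps = [resize_nearest(normalize(cam(f, class_weights)), size) for f in feature_list]
    return normalize(np.mean(maps, axis=0))


def vit_attention_map(attn: np.ndarray, size):
    """CLS-to-patch attention (heads, 1 + N, 1 + N) averaged over heads, on a square patch grid."""
    cls_to_patch = attn[:, 0, 1:]  # (heads, N): how much the CLS token attends to each patch
    side = int(round(np.sqrt(cls_to_patch.shape[1])))
    grid = cls_to_patch.mean(axis=0).reshape(side, side)
    return normalize(resize_nearest(grid, size))


def fuse_and_box(cam_map, attn_map, alpha=0.5, tau=0.5):
    """Weighted fusion (alpha is a hypothetical hyperparameter), then a box around pixels >= tau."""
    fused = normalize(alpha * cam_map + (1 - alpha) * attn_map)
    ys, xs = np.nonzero(fused >= tau)
    if ys.size == 0:
        return fused, None
    return fused, (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))


# Synthetic example: a "defect" in the same region at two CNN scales and in the ViT attention.
weights = np.array([1.0, 0.0])           # hypothetical classifier weights for the defect class
f_small = np.zeros((2, 4, 4))
f_small[0, 1:3, 1:3] = 1.0               # coarse-scale feature map
f_large = np.zeros((2, 8, 8))
f_large[0, 2:6, 2:6] = 1.0               # fine-scale feature map

patch_grid = np.zeros((4, 4))
patch_grid[1:3, 1:3] = 1.0
attn = np.zeros((2, 17, 17))             # 2 heads, 1 CLS token + 16 patches
attn[:, 0, 1:] = patch_grid.ravel()

ms_cam = multiscale_cam([f_small, f_large], weights, (8, 8))
attn_map = vit_attention_map(attn, (8, 8))
fused, box = fuse_and_box(ms_cam, attn_map)
print(box)  # (2, 2, 5, 5) as (x_min, y_min, x_max, y_max)
```

Averaging normalized maps keeps either cue from dominating purely because of its raw magnitude; in practice the relative weighting and threshold would need to be tuned on validation data.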

Acknowledgement

This work was supported by the Technological Innovation R&D Program (S3294129) funded by the Ministry of SMEs and Startups (MSS, Korea).