Visual Explanation of Black-box Models Using Layer-wise Class Activation Maps from Approximating Neural Networks

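A Visual Explanation Technique for Black-box Models Using Multi-layer Class Activation Maps via Neural Network Approximation (translated from the Korean title)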

  • Received: 2021.06.22
  • Accepted: 2021.07.22
  • Published: 2021.08.31


In this paper, we propose a novel visualization technique for explaining the predictions of deep neural networks. First, we use knowledge distillation (KD) to approximate the interior of a black-box model, of which only the inputs and outputs are known. Through KD, the information in the black-box model is transferred to a white-box model that we construct, so that the white-box model learns the representation of the black-box model. Second, the white-box model generates an attention map for each of its layers using Grad-CAM. Finally, we combine the attention maps of the different layers by pixel-wise summation to produce a final saliency map that contains information from all layers of the model. Experiments show that the proposed technique identifies the important layers and explains which parts of the input are important. In the deletion game, saliency maps generated by the proposed technique performed better than those generated by Grad-CAM.
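The pipeline summarized above (distill, compute Grad-CAM per layer, sum the layer maps pixel-wise, evaluate with the deletion game) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the black box exposes logits for the Hinton-style KD loss, and that per-layer activations and gradients have already been extracted from the trained white-box model (for example, with a framework's autograd). It also uses nearest-neighbour upsampling in place of bilinear interpolation, min-max normalizes each layer map before the pixel-wise sum, and plays the deletion game by zeroing pixels of a single-channel image. All function names are hypothetical.

```python
import numpy as np


def softmax(z, t=1.0):
    # Temperature-scaled softmax over the last axis (numerically stable).
    z = np.asarray(z, dtype=float) / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_loss(student_logits, teacher_logits, t=4.0):
    # Hinton-style distillation: KL(teacher || student) on softened outputs,
    # scaled by t**2. Assumption: the black box (teacher) exposes logits.
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl) * t * t)


def grad_cam(acts, grads):
    # acts, grads: (K, h, w) activations of one layer and d(class score)/d(acts).
    weights = grads.mean(axis=(1, 2))          # alpha_k: global-average-pooled gradients
    cam = np.tensordot(weights, acts, axes=1)  # sum_k alpha_k * A_k -> (h, w)
    return np.maximum(cam, 0.0)                # ReLU keeps only positive evidence


def upsample(cam, size):
    # Nearest-neighbour upsampling (assumption); size must be an integer multiple.
    fy, fx = size[0] // cam.shape[0], size[1] // cam.shape[1]
    return np.repeat(np.repeat(cam, fy, axis=0), fx, axis=1)


def minmax(x):
    # Rescale to [0, 1]; a constant map becomes all zeros.
    rng = x.max() - x.min()
    return (x - x.min()) / rng if rng > 0 else np.zeros_like(x, dtype=float)


def layerwise_saliency(layers, size):
    # layers: list of (acts, grads) pairs, one per white-box layer.
    # Per-layer normalization before the pixel-wise sum is an assumption.
    total = np.zeros(size, dtype=float)
    for acts, grads in layers:
        total += minmax(upsample(grad_cam(acts, grads), size))
    return minmax(total)


def deletion_auc(model, image, saliency, steps=10):
    # Deletion game: remove (zero) the most salient pixels first and track the
    # model's target-class score; a lower area under the curve is better.
    order = np.argsort(saliency.ravel())[::-1]
    flat = image.astype(float).ravel().copy()
    chunk = int(np.ceil(flat.size / steps))
    scores = [model(flat.reshape(image.shape))]
    for i in range(steps):
        flat[order[i * chunk:(i + 1) * chunk]] = 0.0
        scores.append(model(flat.reshape(image.shape)))
    s = np.asarray(scores, dtype=float)
    return float(np.mean((s[:-1] + s[1:]) / 2.0))  # trapezoid rule on x in [0, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for two white-box layers at different resolutions.
    layers = [(rng.random((8, 4, 4)), rng.standard_normal((8, 4, 4))),
              (rng.random((16, 2, 2)), rng.standard_normal((16, 2, 2)))]
    sal = layerwise_saliency(layers, (8, 8))
    image = rng.random((8, 8))

    def toy_model(x):
        # Hypothetical classifier score: mean intensity of the top-left quadrant.
        return float(x[:4, :4].mean())

    print(sal.shape, deletion_auc(toy_model, image, sal))
```

Summing maps from shallow and deep layers mixes fine spatial detail with coarse semantic localization. Normalizing each layer first keeps any single layer's gradient scale from dominating the sum, but whether the paper does this is not stated in the abstract.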



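This research was supported by the National Research Foundation of Korea (NRF) with funding from the Korean government (Ministry of Science and ICT) (No. 2019R1F1A1061941) and by the Artificial Intelligence Application Technology Research Center at Jeonbuk National University.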


