
Adversarial Attacks on Reinforcement Learning Models and Countermeasures Using an Image Filtering Method


  • Seungyeol Lee (Hoseo University)
  • Jaecheol Ha (Hoseo University)
  • Received : 2024.09.03
  • Accepted : 2024.10.14
  • Published : 2024.10.31

Abstract

Recently, deep neural network-based reinforcement learning models have been applied in various advanced industrial fields such as autonomous driving, smart factories, and home networks, but they have been shown to be vulnerable to malicious adversarial attacks. In this paper, we applied the deep reinforcement learning models DQN and PPO to the autonomous driving simulation environment HighwayEnv and conducted four adversarial attacks: FGSM (Fast Gradient Sign Method), BIM (Basic Iterative Method), PGD (Projected Gradient Descent), and CW (Carlini and Wagner). To counter these attacks, we propose a method that removes the noise from adversarial images with a bilateral filter so that reinforcement learning-based deep learning models continue to operate normally. We analyzed the performance of the adversarial attacks using two common metrics: the average episode duration and the average reward obtained by the agent. Our experiments confirm that a model whose adversarial images are denoised with the bilateral filter maintains performance comparable to that observed before any adversarial attack.
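
To make the attack side concrete: all four methods perturb the pixels of the observation image so that the agent's policy network selects a degraded action. The following is a minimal PyTorch sketch of FGSM, assuming a hypothetical `q_network` that maps an image observation to Q-values; the function name, loss choice, and epsilon value are illustrative, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(q_network, obs, epsilon=0.01):
    """Craft an FGSM adversarial observation for a DQN-style agent.

    obs: float tensor in [0, 1] with shape (C, H, W).
    The agent's own greedy action is used as the label, so the signed
    gradient step pushes the observation away from that action.
    """
    obs = obs.clone().detach().requires_grad_(True)
    q_values = q_network(obs.unsqueeze(0))   # shape (1, num_actions)
    target = q_values.argmax(dim=1)          # action the agent would take
    loss = F.cross_entropy(q_values, target)
    loss.backward()
    adv = obs + epsilon * obs.grad.sign()    # one signed-gradient step
    return adv.clamp(0.0, 1.0).detach()      # stay in the valid pixel range
```

BIM and PGD iterate this signed-gradient step with a projection back into an epsilon-ball around the original image, while CW instead solves an optimization problem for a minimal-norm perturbation.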

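The denoising step on the defense side can be sketched with OpenCV's bilateral filter, which smooths pixel-level noise while preserving edges. Below is a minimal wrapper, assuming image observations stored as uint8 arrays; the filter parameters are illustrative defaults, not the values tuned in the paper:

```python
import cv2
import numpy as np

def denoise_observation(obs: np.ndarray, d: int = 5,
                        sigma_color: float = 75.0,
                        sigma_space: float = 75.0) -> np.ndarray:
    """Remove adversarial noise from an image observation before it
    is fed to the policy network.

    obs: uint8 array of shape (H, W) or (H, W, 3).
    d: diameter of the pixel neighborhood; sigma_color and sigma_space
    control how strongly intensity and spatial differences are averaged.
    """
    return cv2.bilateralFilter(obs, d, sigma_color, sigma_space)
```

In use, the agent would act on `denoise_observation(adv_obs)` rather than on the raw, possibly adversarial, observation, so a successful defense restores near-clean behavior.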

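The two evaluation metrics can be computed with a standard rollout loop. The sketch below assumes a gymnasium-style highway-env installation and a trained agent exposing a stable-baselines3-style `predict` method; the environment id and episode count are illustrative:

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-v0 environment)

def evaluate(model, n_episodes: int = 50):
    """Return the average episode duration and average episode reward."""
    env = gym.make("highway-v0")
    durations, rewards = [], []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done, steps, total_reward = False, 0, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            steps += 1
            total_reward += reward
        durations.append(steps)
        rewards.append(total_reward)
    env.close()
    return sum(durations) / n_episodes, sum(rewards) / n_episodes
```

A drop in either average under attack, and its recovery once the bilateral filter is applied, is how the before/after comparison described in the abstract is read.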

Acknowledgement

This paper is a result of the Regional Innovation Strategy (RIS) project based on local government-university cooperation, supported by the National Research Foundation of Korea with funding from the Ministry of Education in 2024 (No. 2021RIS-004).
