Time-varying Proportional Navigation Guidance using Deep Reinforcement Learning


  • Chae, Hyeok-Joo (Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology) ;
  • Lee, Daniel (Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology) ;
  • Park, Su-Jeong (Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology) ;
  • Choi, Han-Lim (Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology) ;
  • Park, Han-Sol (Avionics R&D Center, Hanwha Systems) ;
  • An, Kyeong-Soo (Avionics R&D Center, Hanwha Systems)
  • Received : 2020.04.11
  • Accepted : 2020.06.26
  • Published : 2020.08.05

Abstract

In this paper, we propose a time-varying proportional navigation guidance law that determines the proportional navigation gain in real time according to the engagement situation. When intercepting a target, an unidentified evasion strategy causes a loss of optimality. To compensate for this, a proper proportional navigation gain is derived at every time step by solving an optimal control problem that incorporates the inferred evasion strategy. Recently, deep reinforcement learning algorithms have been introduced to handle complex optimal control problems efficiently. We adopt the actor-critic method to build a proportional navigation gain network, and the network is trained with the Proximal Policy Optimization (PPO) algorithm to learn the target's evasion strategy. Numerical experiments demonstrate the effectiveness and optimality of the proposed method.
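The abstract describes an actor network that outputs the proportional navigation gain from the current engagement state, after which the classical PN command a_c = N·V_c·λ̇ is applied. The sketch below illustrates such a gain network in PyTorch under stated assumptions: the state variables (range, closing velocity, line-of-sight rate, time-to-go), the gain bounds, and the layer sizes are illustrative choices, not the paper's exact design, and the critic network and PPO training loop are omitted.

```python
# Illustrative sketch (not the authors' implementation): an actor network that maps
# the engagement state to a bounded, time-varying PN gain N, used in the classical
# proportional navigation command a_c = N * V_c * lambda_dot.
import torch
import torch.nn as nn


class PNGainActor(nn.Module):
    """Outputs a proportional navigation gain N given the engagement state."""

    def __init__(self, state_dim: int = 4, n_min: float = 2.0, n_max: float = 6.0):
        super().__init__()
        # Small MLP; the architecture and gain bounds are assumptions for illustration.
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )
        self.n_min, self.n_max = n_min, n_max

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Squash the raw output into the bounded gain range [n_min, n_max].
        raw = torch.sigmoid(self.net(state))
        return self.n_min + (self.n_max - self.n_min) * raw


def pn_command(actor: PNGainActor, r: float, v_c: float,
               los_rate: float, t_go: float) -> float:
    """Commanded lateral acceleration a_c = N(state) * V_c * lambda_dot."""
    # Assumed state vector: [range, closing velocity, LOS rate, time-to-go].
    state = torch.tensor([[r, v_c, los_rate, t_go]], dtype=torch.float32)
    with torch.no_grad():
        n_gain = actor(state).item()
    return n_gain * v_c * los_rate


# Example: query the (untrained) actor for one guidance update.
actor = PNGainActor()
a_c = pn_command(actor, r=5000.0, v_c=600.0, los_rate=0.02, t_go=8.3)
print(f"commanded acceleration: {a_c:.2f} m/s^2")
```

In the paper's setting, this gain network would be trained with the clipped PPO objective against simulated target evasions, with a separate critic estimating the value of the engagement state; those components are not shown here.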

