Time-varying Proportional Navigation Guidance using Deep Reinforcement Learning


  • Chae, Hyeok-Joo (Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology) ;
  • Lee, Daniel (Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology) ;
  • Park, Su-Jeong (Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology) ;
  • Choi, Han-Lim (Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology) ;
  • Park, Han-Sol (Avionics R&D Center, Hanwha Systems) ;
  • An, Kyeong-Soo (Avionics R&D Center, Hanwha Systems)
  • Received : 2020.04.11
  • Accepted : 2020.06.26
  • Published : 2020.08.05

Abstract

In this paper, we propose a time-varying proportional navigation guidance law that determines the proportional navigation gain in real time according to the engagement situation. When intercepting a target, an unidentified evasion strategy causes a loss of optimality. To compensate for this, a proper proportional navigation gain is derived at every time step by solving an optimal control problem that incorporates the inferred evasion strategy. Recently, deep reinforcement learning algorithms have been introduced to handle complex optimal control problems efficiently. We adopt the actor-critic method to build a proportional navigation gain network, and the network is trained with the Proximal Policy Optimization (PPO) algorithm to learn the target's evasion strategy. Numerical experiments demonstrate the effectiveness and optimality of the proposed method.
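The abstract describes an actor network that outputs the proportional navigation gain from the current engagement state, after which the classical PN command a_c = N·V_c·λ̇ is applied. The sketch below illustrates such a gain network in PyTorch under stated assumptions: the state variables (range, closing velocity, line-of-sight rate, time-to-go), the gain bounds, and the layer sizes are illustrative choices, not the paper's exact design, and the critic network and PPO training loop are omitted.

```python
# Illustrative sketch (not the authors' implementation): an actor network that maps
# the engagement state to a bounded, time-varying PN gain N, used in the classical
# proportional navigation command a_c = N * V_c * lambda_dot.
import torch
import torch.nn as nn


class PNGainActor(nn.Module):
    """Outputs a proportional navigation gain N given the engagement state."""

    def __init__(self, state_dim: int = 4, n_min: float = 2.0, n_max: float = 6.0):
        super().__init__()
        # Small MLP; the architecture and gain bounds are assumptions for illustration.
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )
        self.n_min, self.n_max = n_min, n_max

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Squash the raw output into the bounded gain range [n_min, n_max].
        raw = torch.sigmoid(self.net(state))
        return self.n_min + (self.n_max - self.n_min) * raw


def pn_command(actor: PNGainActor, r: float, v_c: float,
               los_rate: float, t_go: float) -> float:
    """Commanded lateral acceleration a_c = N(state) * V_c * lambda_dot."""
    # Assumed state vector: [range, closing velocity, LOS rate, time-to-go].
    state = torch.tensor([[r, v_c, los_rate, t_go]], dtype=torch.float32)
    with torch.no_grad():
        n_gain = actor(state).item()
    return n_gain * v_c * los_rate


# Example: query the (untrained) actor for one guidance update.
actor = PNGainActor()
a_c = pn_command(actor, r=5000.0, v_c=600.0, los_rate=0.02, t_go=8.3)
print(f"commanded acceleration: {a_c:.2f} m/s^2")
```

In the paper's setting, this gain network would be trained with the clipped PPO objective against simulated target evasions, with a separate critic estimating the value of the engagement state; those components are not shown here.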

