Time-varying Proportional Navigation Guidance using Deep Reinforcement Learning

Chae, Hyeok-Joo; Lee, Daniel; Park, Su-Jeong; Choi, Han-Lim; Park, Han-Sol; An, Kyeong-Soo

doi:10.9766/KIMST.2020.23.4.399

Journal of the Korea Institute of Military Science and Technology

Vol. 23, No. 4 / pp. 399-406 / 2020 / 1598-9127 (pISSN) / 2636-0640 (eISSN)

The Korea Institute of Military Science and Technology

Chae, Hyeok-Joo (Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology)
Lee, Daniel (Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology)
Park, Su-Jeong (Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology)
Choi, Han-Lim (Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology)
Park, Han-Sol (Avionics R&D Center, Hanwha Systems)
An, Kyeong-Soo (Avionics R&D Center, Hanwha Systems)

Received: 2020.04.11
Reviewed: 2020.06.26
Published: 2020.08.05

https://doi.org/10.9766/KIMST.2020.23.4.399

Abstract

In this paper, we propose a time-varying proportional navigation guidance law that determines the proportional navigation gain in real time according to the operating situation. When intercepting a target, an unidentified evasion strategy causes a loss of optimality. To compensate for this problem, a proper proportional navigation gain is derived at every time step by solving an optimal control problem with the evader's inferred strategy. Deep reinforcement learning algorithms have recently been introduced to deal with complex optimal control problems efficiently. We adapt the actor-critic method to build a proportional navigation gain network, and the network is trained by the Proximal Policy Optimization (PPO) algorithm to learn the evasion strategy of the target. Numerical experiments show the effectiveness and optimality of the proposed method.
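
To make the idea of a gain chosen at every time step concrete, the following is a minimal planar sketch in Python. It is not the authors' implementation: the abstract gives no equations, so the classical command a = N * Vc * (LOS rate), the initial conditions, the constant-turn target, the acceleration limit, and the function names `pn_accel`, `simulate`, and `scheduled_gain` are all assumptions made for illustration. In the proposed method the gain would come from the trained gain network; here a hand-written schedule stands in for it.

```python
import math


def pn_accel(gain, closing_speed, los_rate):
    """Classical PN lateral acceleration command: a = N * Vc * lambda_dot."""
    return gain * closing_speed * los_rate


def simulate(gain_fn, dt=0.01, t_max=60.0, hit_radius=5.0, a_max=100.0):
    """Planar pursuit where gain_fn(range, los_rate, closing_speed) gives N each step.

    Returns (hit, elapsed_time, minimum_sampled_range).
    """
    # Hypothetical initial conditions (not taken from the paper).
    px, py, p_speed, p_heading = 0.0, 0.0, 300.0, 0.0
    tx, ty, t_speed, t_heading = 5000.0, 1000.0, 200.0, math.pi
    t_turn_rate = 0.05  # rad/s; a constant turn stands in for an evasion strategy

    t, min_range = 0.0, float("inf")
    while t < t_max:
        rx, ry = tx - px, ty - py
        rng = math.hypot(rx, ry)
        min_range = min(min_range, rng)
        if rng < hit_radius:
            return True, t, min_range

        # Velocity of the target relative to the pursuer.
        vx = t_speed * math.cos(t_heading) - p_speed * math.cos(p_heading)
        vy = t_speed * math.sin(t_heading) - p_speed * math.sin(p_heading)
        los_rate = (rx * vy - ry * vx) / (rng * rng)  # d/dt of atan2(ry, rx)
        closing_speed = -(rx * vx + ry * vy) / rng    # positive while closing

        # Time-varying gain: queried again at every step.
        gain = gain_fn(rng, los_rate, closing_speed)
        accel = pn_accel(gain, closing_speed, los_rate)
        accel = max(-a_max, min(a_max, accel))        # assumed actuator limit

        # Explicit Euler integration of both vehicles.
        p_heading += accel / p_speed * dt
        t_heading += t_turn_rate * dt
        px += p_speed * math.cos(p_heading) * dt
        py += p_speed * math.sin(p_heading) * dt
        tx += t_speed * math.cos(t_heading) * dt
        ty += t_speed * math.sin(t_heading) * dt
        t += dt
    return False, t, min_range


def constant_gain(rng, los_rate, closing_speed):
    """Fixed-gain PN baseline (N = 3 is a common textbook choice)."""
    return 3.0


def scheduled_gain(rng, los_rate, closing_speed):
    """Hypothetical stand-in for a learned policy: gain grows as range shrinks."""
    return 3.0 + min(1.0, 1000.0 / max(rng, 1.0))


if __name__ == "__main__":
    print("constant :", simulate(constant_gain))
    print("scheduled:", simulate(scheduled_gain))
```

Passing the gain as a function keeps the guidance loop unchanged when the schedule is replaced by a policy network's output, which is the role the abstract assigns to the PPO-trained actor.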
