• Title/Summary/Keyword: Proximal Policy Optimization


Optimal deployment of sonobuoy for unmanned aerial vehicles using reinforcement learning considering the target movement

  • Geunyoung Bae;Juhwan Kang;Jungpyo Hong
    • The Journal of the Acoustical Society of Korea, v.43 no.2, pp.214-224, 2024
  • Sonobuoys are disposable devices that use sound waves for information gathering, detecting engine noise, and capturing various acoustic signatures. They play a crucial role in accurately detecting underwater targets, making them effective detection systems in anti-submarine warfare. Existing sonobuoy deployment methods in multistatic systems often rely on fixed patterns or heuristic rules and are inefficient in the number of sonobuoys deployed and in operational time, given the unpredictable mobility of underwater targets. This paper therefore proposes an optimal sonobuoy placement strategy for Unmanned Aerial Vehicles (UAVs) that overcomes the limitations of conventional deployment methods. The approach applies reinforcement learning in a simulation-based experimental environment that accounts for the movement of underwater targets. The Unity ML-Agents framework is employed, and the Proximal Policy Optimization (PPO) algorithm trains the UAV in a virtual operational environment with real-time interaction. The reward function considers the number of sonobuoys deployed and the cost associated with sound sources and receivers, enabling effective learning. Compared with conventional deployment methods in the same experimental environment, the proposed reinforcement learning-based strategy demonstrates superior detection success rate, deployed sonobuoy count, and operational time.
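
A reward design of the kind the abstract describes (rewarding detection while penalizing deployed sonobuoys and elapsed time, with separate costs for sources and receivers) might look roughly like the following minimal Python sketch. All constants and names here are illustrative assumptions, not the authors' actual formulation.

```python
# Minimal sketch of a reward shaping consistent with the abstract:
# reward detection, penalize each newly deployed sonobuoy (sources assumed
# costlier than receivers), and penalize elapsed time. Constants are assumed.

DETECTION_BONUS = 100.0  # one-time bonus when the target is detected (assumed)
SOURCE_COST = 2.0        # cost per newly deployed sound source (assumed)
RECEIVER_COST = 1.0      # cost per newly deployed receiver (assumed)
TIME_PENALTY = 0.1       # per-step penalty to shorten operational time (assumed)

def step_reward(detected: bool, new_sources: int, new_receivers: int) -> float:
    """Reward for a single environment step in the sonobuoy-placement task."""
    reward = -TIME_PENALTY
    reward -= SOURCE_COST * new_sources + RECEIVER_COST * new_receivers
    if detected:
        reward += DETECTION_BONUS
    return reward
```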

Time-varying Proportional Navigation Guidance using Deep Reinforcement Learning

  • Hyeok-Joo Chae;Daniel Lee;Su-Jeong Park;Han-Lim Choi;Han-Sol Park;Kyeong-Soo An
    • Journal of the Korea Institute of Military Science and Technology, v.23 no.4, pp.399-406, 2020
  • In this paper, we propose a time-varying proportional navigation guidance law that determines the proportional navigation gain in real time according to the operating situation. When intercepting a target, an unidentified evasion strategy causes a loss of optimality. To compensate for this, a proper proportional navigation gain is derived at every time step by solving an optimal control problem with the inferred evader's strategy. Recently, deep reinforcement learning algorithms have been introduced to handle complex optimal control problems efficiently. We adapt the actor-critic method to build a proportional navigation gain network, and the network is trained with the Proximal Policy Optimization (PPO) algorithm to learn the evasion strategy of the target. Numerical experiments show the effectiveness and optimality of the proposed method.
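
For context, classical proportional navigation commands a lateral acceleration a_c = N · V_c · (LOS rate) with a fixed gain N; per the abstract, the paper instead outputs N at every step from a PPO-trained actor-critic network. A minimal sketch of that structure follows; the observation layout and function names are assumptions for illustration.

```python
import numpy as np

def pn_acceleration(gain: float, closing_speed: float, los_rate: float) -> float:
    """Classical proportional navigation command: a_c = N * V_c * (LOS rate)."""
    return gain * closing_speed * los_rate

def guidance_step(policy, rel_range: float, closing_speed: float, los_rate: float) -> float:
    """One guidance step with a time-varying gain from a learned policy.

    `policy` maps an observation vector to the PN gain N; the observation
    layout below (range, closing speed, LOS rate) is an assumption.
    """
    obs = np.array([rel_range, closing_speed, los_rate], dtype=np.float32)
    gain = float(policy(obs))
    return pn_acceleration(gain, closing_speed, los_rate)

# Example with a stand-in policy that always returns N = 3.
print(guidance_step(lambda obs: 3.0, rel_range=5000.0,
                    closing_speed=300.0, los_rate=0.01))
```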

Adaptive Fast Calibration Method for Active Phased Array Antennas using PPO Algorithm

  • Sunge Lee;Kisik Byun;Hong-Jib Yoon
    • Journal of IKEEE, v.27 no.4, pp.636-643, 2023
  • In this paper, a high-speed calibration method for phased array antennas in the far field is presented. Two techniques are proposed: a max calibration, a simplification of the rotating-element electric-field vector (REV) method that calibrates each antenna element using only received power, and a method of grouping calibrations by subarray rather than by individual antenna element. Using the Proximal Policy Optimization (PPO) algorithm, we find a partitioning optimized for the element distribution of the phased array antenna and calibrate on a subarray basis. The resulting adaptive max calibration method allows faster calibration than the conventional method and is verified through simulation. Not only is the gain of the phased array antenna higher while calibration toward the target is in progress, but the resulting beam pattern is also closer to the ideal beam pattern than with the conventional method.
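
As a rough illustration of the max-calibration idea described above (choosing, from received-power measurements alone, the phase state that maximizes power), here is a minimal sketch. The measurement layout is an assumption, and the paper's REV-based procedure may differ in detail.

```python
import numpy as np

def max_calibration(received_power: np.ndarray, phase_states: np.ndarray) -> np.ndarray:
    """Select, per element (or subarray), the phase that maximizes received power.

    received_power[k, p] is the power measured with element/subarray k set to
    phase_states[p] while the rest of the array is held fixed (assumed layout).
    Returns the calibrated phase for each element/subarray.
    """
    best_state = np.argmax(received_power, axis=1)  # index of max power per row
    return phase_states[best_state]

# Example: 4 subarrays swept over the 8 states of a 3-bit phase shifter.
phases = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
powers = np.random.rand(4, 8)  # stand-in for measured powers
print(max_calibration(powers, phases))
```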

Comparison of Learning Performance by Reinforcement Learning Agent Visibility Information Difference

  • Chan Sub Kim;Si-Hwan Jang;Seong-Il Yang;Shin Jin Kang
    • Journal of Korea Game Society, v.21 no.5, pp.17-28, 2021
  • Reinforcement learning, in which an artificial intelligence improves itself to find the best solution to a problem, is a technology of high value in many fields. Games in particular can provide a virtual problem-solving environment for reinforcement learning, in which agents solve problems by gathering information about their situation and surroundings through observations. In this experiment, a simplified instant-dungeon environment from an RPG game was built, and various observation variables related to the agent's field of view were configured. The results show how much each of these variables affects learning speed, and they can serve as a reference for reinforcement learning research on RPG games.
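
The field-of-view observation variables the experiment varies might be realized as a cropped window around the agent on the dungeon grid, as in this minimal sketch; the grid encoding, padding value, and function name are illustrative assumptions.

```python
import numpy as np

def local_observation(grid: np.ndarray, agent_pos: tuple, fov_radius: int) -> np.ndarray:
    """Crop a (2r+1) x (2r+1) window of the dungeon grid around the agent.

    `fov_radius` plays the role of a visibility variable; cells outside the
    map are padded with -1 (both choices are assumptions for illustration).
    """
    r = fov_radius
    padded = np.pad(grid, r, constant_values=-1)  # pad so edge crops stay square
    y, x = agent_pos[0] + r, agent_pos[1] + r
    return padded[y - r : y + r + 1, x - r : x + r + 1]

# Example: a 6x6 dungeon with the agent at (2, 3) and a view radius of 1.
dungeon = np.zeros((6, 6), dtype=int)
print(local_observation(dungeon, (2, 3), fov_radius=1))
```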