• Title/Summary/Keyword: Proximal Policy Optimization

Experimental Analysis of A3C and PPO in the OpenAI Gym Environment (OpenAI Gym 환경에서 A3C와 PPO의 실험적 분석)

  • Hwang, Gyu-Young;Lim, Hyun-Kyo;Heo, Joo-Seong;Han, Youn-Hee
    • Proceedings of the Korea Information Processing Society Conference / 2019.05a / pp.545-547 / 2019
  • Policy-gradient learning is a heavily studied topic in recent reinforcement learning research. This paper presents a comparative analysis of the learning performance of two policy-gradient algorithms, Asynchronous Advantage Actor-Critic (A3C) and Proximal Policy Optimization (PPO), in the 'CartPole-v0' and 'Pendulum-v0' environments of OpenAI Gym. With the conditions the two algorithms can share, such as the deep learning model, matched as closely as possible, the change in score over the course of the episodes was measured. The experiments confirm that PPO outperforms A3C in both environments.
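PPO's key departure from A3C is its clipped surrogate objective, which keeps each policy update close to the data-collecting policy. Below is a minimal sketch of that loss in its standard formulation, assuming PyTorch; it is an illustration, not the paper's code.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective of PPO (Schulman et al., 2017)."""
    ratio = torch.exp(log_probs_new - log_probs_old)   # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic bound and negate it, since optimizers minimize.
    return -torch.min(unclipped, clipped).mean()
```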

Design and Implementation of Reinforcement Learning Agent Using PPO Algorithim for Match 3 Gameplay (매치 3 게임 플레이를 위한 PPO 알고리즘을 이용한 강화학습 에이전트의 설계 및 구현)

  • Park, Dae-Geun;Lee, Wan-Bok
    • Journal of Convergence for Information Technology / v.11 no.3 / pp.1-6 / 2021
  • Most match-3 puzzle games support automatic play using the MCTS algorithm. Implementing a reinforcement learning agent, however, is not easy, since it requires both machine learning knowledge and complex interaction with the development environment. This study proposes a method for easily designing reinforcement learning agents and implementing gameplay agents by applying the PPO (Proximal Policy Optimization) algorithm, and performance improved by about 44% over the conventional method. The tools used are the Unity 3D game engine and the Unity ML SDK. The experimental results show that the agents learned the game rules and made better strategic decisions as the experiments progressed. On average, the puzzle gameplay agents implemented in this study played the puzzle game better than ordinary people. The designed agent is expected to speed up the game level design process.
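For context, a hedged sketch of what such a training setup can look like. The paper trains inside Unity with the Unity ML SDK; this sketch substitutes stable-baselines3's PPO and a hypothetical Gym-registered `Match3Env-v0` purely to show the shape of the loop. The environment id and hyperparameters are assumptions.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Hypothetical match-3 environment: observations encode the board state,
# actions select a tile swap. Not a real registered environment.
env = gym.make("Match3Env-v0")

model = PPO("MlpPolicy", env, n_steps=2048, batch_size=64, verbose=1)
model.learn(total_timesteps=1_000_000)   # learn swap policies from board states
model.save("match3_ppo")
```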

Reinforcement learning portfolio optimization based on portfolio theory (강화학습을 이용한 포트폴리오 투자 프로세스 최적화에 대한 연구)

  • Hyeong-Jin Son;Lim Donhui;Young-Woo Han
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.961-962 / 2023
  • The portfolio construction problem has long been studied, and much current research builds portfolios using reinforcement learning. In constructing a portfolio, both selecting the assets and deciding how much to invest in each are important problems. This study combines a long-established portfolio approach with reinforcement learning in an effort to build a model with high explanatory power. The reinforcement learning model is based on PPO (Proximal Policy Optimization), with an LSTM as the neural network. Over the experimental period (108 trading days from March 30, 2023), the KOSPI returned 5%, whereas the model proposed in this study returned about 9% on average.
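A minimal sketch of the kind of PPO policy network the abstract describes: an LSTM over price-history features with a softmax head, so the long-only portfolio weights sum to one. Layer sizes and the PyTorch framing are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class LSTMPortfolioPolicy(nn.Module):
    """LSTM policy: price-history features in, portfolio weights out."""
    def __init__(self, n_features, n_assets, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_assets)

    def forward(self, x):                      # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        logits = self.head(out[:, -1])         # last time step summarizes history
        return torch.softmax(logits, dim=-1)   # weights sum to 1 (long-only)
```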

Application of reinforcement learning to fire suppression system of an autonomous ship in irregular waves

  • Lee, Eun-Joo;Ruy, Won-Sun;Seo, Jeonghwa
    • International Journal of Naval Architecture and Ocean Engineering / v.12 no.1 / pp.910-917 / 2020
  • In fire suppression, continuous delivery of water or foam to the fire source is essential. The present study concerns fire suppression in a ship at sea, applying a reinforcement learning technique to the aiming of a fire extinguishing nozzle operating in a ship compartment that undergoes six-degree-of-freedom motion in irregular waves. The physical modeling of the water jet and the compartment motion was implemented in the Unity 3D engine. In the reinforcement learning setup, the change of the nozzle angle during the scenario is the action, while the reward is proportional to the ratio of water particles delivered to the fire source area. The optimal nozzle-aiming control for continuous delivery of the water jet could thus be derived. Several reinforcement learning algorithms were tested, and proximal policy optimization was selected as the best performing.
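The reward the abstract describes, proportional to the ratio of water particles delivered to the fire source area, could be sketched as follows. The particle and region representations (including `fire_area.contains`) are hypothetical stand-ins for the Unity-side physics, not the paper's implementation.

```python
def nozzle_reward(particle_positions, fire_area):
    """Fraction of simulated water particles that land inside the fire region."""
    hits = sum(1 for p in particle_positions if fire_area.contains(p))  # hypothetical API
    return hits / max(len(particle_positions), 1)   # in [0, 1]; avoids divide-by-zero
```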

Exploring reward efficacy in traffic management using deep reinforcement learning in intelligent transportation system

  • Paul, Ananya;Mitra, Sulata
    • ETRI Journal / v.44 no.2 / pp.194-207 / 2022
  • In the last decade, substantial progress has been achieved in intelligent traffic control technologies to overcome the persistent difficulties of traffic congestion and its adverse effects on smart cities. Edge computing is one such advance, facilitating real-time data transmission among vehicles and roadside units to mitigate congestion. This study demonstrates an edge computing-based deep reinforcement learning system that designs a multiobjective reward function to optimize several objectives at once, seeking to overcome the challenge of evaluating actions with a single simple numerical reward. The selection of the reward function has a significant impact on the agent's ability to acquire the ideal behavior for managing multiple traffic signals in a large-scale road network. To ascertain effective reward functions, the agent is trained using the proximal policy optimization method with several deep neural network models, including the state-of-the-art transformer network. The system is verified using both hypothetical scenarios and real-world traffic maps, and the comprehensive simulation outcomes demonstrate the effectiveness of the suggested reward functions.
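A multiobjective reward of the kind the study evaluates can be sketched as a weighted combination of congestion terms; the specific terms and weights below are illustrative assumptions, not the paper's reward functions.

```python
def multiobjective_reward(queue_len, waiting_time, throughput,
                          w_queue=0.4, w_wait=0.4, w_through=0.2):
    """Weighted multiobjective signal: penalize queues and waiting,
    reward vehicles cleared during the last control step."""
    return -w_queue * queue_len - w_wait * waiting_time + w_through * throughput
```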

A study on application of reinforcement learning to autonomous navigation of unmanned surface vehicle (소형무인선의 자율운행을 위한 강화학습기법 적용에 관한 연구)

  • Hee-Yong Lee
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2023.11a / pp.232-235 / 2023
  • This study suggests how to build a training environment for applying reinforcement learning techniques to an unmanned surface vehicle (USV), and how to transfer the training results to a real USV. The purpose of the reinforcement learning is to move the USV autonomously from a departure point to a destination point using the rudder.
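A hedged sketch of a Gym-style training environment of the sort the study builds: the observation is the USV's pose relative to the destination, the action is a rudder command, and the episode ends on arrival. The toy kinematics and all constants below are assumptions, not the study's simulator.

```python
import math
import numpy as np
import gymnasium as gym

class USVEnv(gym.Env):
    """Toy USV environment: steer with the rudder to reach the destination."""
    def __init__(self, dest=(100.0, 0.0), speed=1.0):
        self.dest = np.array(dest)
        self.speed = speed
        self.action_space = gym.spaces.Box(-0.2, 0.2, shape=(1,))       # rudder (rad/step)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(3,))

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos, self.heading = np.zeros(2), 0.0
        return self._obs(), {}

    def step(self, action):
        self.heading += float(action[0])                  # rudder changes heading
        self.pos = self.pos + self.speed * np.array([math.cos(self.heading),
                                                     math.sin(self.heading)])
        dist = float(np.linalg.norm(self.dest - self.pos))
        terminated = dist < 1.0                           # reached the destination
        return self._obs(), -dist, terminated, False, {}  # reward: shrink distance

    def _obs(self):
        rel = self.dest - self.pos
        return np.array([rel[0], rel[1], self.heading], dtype=np.float32)
```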

Reinforcement learning-based control with application to the once-through steam generator system

  • Cheng Li;Ren Yu;Wenmin Yu;Tianshu Wang
    • Nuclear Engineering and Technology / v.55 no.10 / pp.3515-3524 / 2023
  • A reinforcement learning framework is proposed for controlling the outlet steam pressure of the once-through steam generator (OTSG). A double-layer controller using the Proximal Policy Optimization (PPO) algorithm is applied in the control structure of the OTSG; the PPO algorithm trains the neural networks continuously through interaction with the environment, and the trained controller then achieves better control of the OTSG. Reinforcement learning is, however, difficult to apply to real-world plants: the optimal strategy at each step is found through trial and error, so the training cost is very high. To solve this problem, the paper proposes a pretraining method in which an LSTM model serves as the training environment, saving training time and improving efficiency. The experimental results show that the method achieves self-adjustment of the control parameters under various working conditions, with small overshoot, fast stabilization, and strong adaptability.
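The pretraining idea described above can be sketched as follows: an LSTM fitted to plant data stands in for the real OTSG during early training, so PPO's trial-and-error runs against a cheap surrogate. Shapes and names are assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class OTSGSurrogate(nn.Module):
    """LSTM surrogate environment: predicts the next outlet steam pressure
    from a history of (plant state, control action) vectors."""
    def __init__(self, n_inputs, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_inputs, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, history):        # history: (batch, time, n_inputs)
        h, _ = self.lstm(history)
        return self.out(h[:, -1])      # predicted next outlet pressure
```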

Evaluation of Human Demonstration Augmented Deep Reinforcement Learning Policy Optimization Methods Using Object Manipulation with an Anthropomorphic Robot Hand (휴먼형 로봇 손의 사물 조작 수행을 이용한 인간 행동 복제 강화학습 정책 최적화 방법 성능 평가)

  • Park, Na Hyeon;Oh, Ji Heon;Ryu, Ga Hyun;Anazco, Edwin Valarezo;Lopez, Patricio Rivera;Won, Da Seul;Jeong, Jin Gyun;Chang, Yun Jung;Kim, Tae-Seong
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.858-861 / 2020
  • For a robot to manipulate objects as diversely and dexterously as a human, grasping with an anthropomorphic robot hand is essential. To train an anthropomorphic robot hand with a high number of degrees of freedom (DoF), reinforcement learning optimization methods combined with human demonstrations have been proposed. This study verifies the effectiveness of behavior cloning by comparing Demonstration Augmented Natural Policy Gradient (DA-NPG) against plain NPG, and evaluates three demonstration-augmented optimization methods, DA-NPG, DA-Trust Region Policy Optimization (DA-TRPO), and DA-Proximal Policy Optimization (DA-PPO), on object manipulation tasks with an anthropomorphic robot hand across six objects. The comparison of DA-NPG and NPG shows that behavior cloning is effective for this reinforcement learning task. DA-NPG performed similarly to DA-TRPO and was the most stable, succeeding in grasping all six objects, whereas DA-TRPO and DA-PPO each failed on some objects and showed unstable performance. The proposed approach is expected to be useful for developing object manipulation intelligence for real anthropomorphic robot hands.
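Demonstration augmentation of the kind compared above is often implemented by mixing a behavior-cloning term on the human demonstration data into the policy-optimization loss. This is a minimal sketch of that mixing, with `policy.log_prob` a hypothetical interface and the weight an assumption, not the evaluated methods' exact formulation.

```python
import torch

def demo_augmented_loss(pg_loss, policy, demo_states, demo_actions, bc_weight=0.1):
    """Add a behavior-cloning term to a policy-gradient loss (NPG/TRPO/PPO)."""
    log_probs = policy.log_prob(demo_states, demo_actions)  # hypothetical interface
    bc_loss = -log_probs.mean()            # maximize likelihood of demo actions
    return pg_loss + bc_weight * bc_loss
```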

Drone Obstacle Avoidance Algorithm using Camera-based Reinforcement Learning (카메라 기반 강화학습을 이용한 드론 장애물 회피 알고리즘)

  • Jo, Si-hun;Kim, Tae-Young
    • Journal of the Korea Computer Graphics Society / v.27 no.5 / pp.63-71 / 2021
  • Among autonomous drone flight technologies, obstacle avoidance is essential for preventing damage to the drone and its surroundings and for avoiding danger. LiDAR-based obstacle avoidance offers relatively high accuracy and is widely used in recent studies, but it suffers from high unit cost and limited capacity for processing visual information. This paper therefore proposes a drone obstacle avoidance algorithm using camera-based PPO (Proximal Policy Optimization) reinforcement learning, which is comparatively inexpensive and highly scalable because it exploits visual information. The drone, obstacles, and target points are placed randomly in a three-dimensional learning environment; stereo images are obtained with a Unity camera, and YOLOv4-Tiny object detection is performed on them. The distance between the drone and each detected object is then measured by stereo-camera triangulation, and the presence of an obstacle is determined from this distance. Penalties are given for obstacles and rewards for reaching target points. Experiments show that the camera-based algorithm achieves accuracy and average target-arrival time comparable to a LiDAR-based obstacle avoidance algorithm, indicating strong potential for practical use.
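The triangulation step the abstract relies on reduces, for a calibrated parallel stereo pair, to depth = focal length × baseline / disparity. A minimal sketch, with the focal length and baseline values illustrative assumptions:

```python
def stereo_depth(x_left, x_right, focal_px=800.0, baseline_m=0.1):
    """Depth (m) of a matched point from its horizontal pixel disparity."""
    disparity = x_left - x_right           # same row in rectified images
    if disparity <= 0:
        return float("inf")                # degenerate match: treat as far away
    return focal_px * baseline_m / disparity
```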

Optimal deployment of sonobuoy for unmanned aerial vehicles using reinforcement learning considering the target movement (표적의 이동을 고려한 강화학습 기반 무인항공기의 소노부이 최적 배치)

  • Geunyoung Bae;Juhwan Kang;Jungpyo Hong
    • The Journal of the Acoustical Society of Korea / v.43 no.2 / pp.214-224 / 2024
  • Sonobuoys are disposable devices that gather information using sound waves, detecting engine noise and capturing various acoustic characteristics. They play a crucial role in accurately detecting underwater targets, making them effective detection systems in anti-submarine warfare. Existing sonobuoy deployment methods in multistatic systems often rely on fixed patterns or heuristic rules, and the unpredictable mobility of underwater targets makes them inefficient in the number of sonobuoys deployed and in operational time. This paper therefore proposes an optimal sonobuoy placement strategy for Unmanned Aerial Vehicles (UAVs) that overcomes the limitations of conventional deployment methods. The approach applies reinforcement learning in a simulation-based experimental environment that models the movement of underwater targets. The Unity ML-Agents framework is employed, and the Proximal Policy Optimization (PPO) algorithm trains the UAV in a virtual operational environment with real-time interaction. The reward function considers the number of sonobuoys deployed and the costs associated with sound sources and receivers, enabling effective learning. Compared with conventional sonobuoy deployment methods in the same experimental environment, the proposed reinforcement learning-based strategy demonstrates superior detection success rate, deployed sonobuoy count, and operational time.
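A hedged sketch of the reward structure described above, rewarding detection while charging for each deployed sonobuoy and for the sound sources and receivers; every coefficient here is an illustrative assumption, not the paper's values.

```python
def sonobuoy_reward(detected, n_deployed, n_sources, n_receivers,
                    detect_bonus=10.0, buoy_cost=0.5,
                    source_cost=0.2, receiver_cost=0.1):
    """Detection bonus minus deployment and source/receiver costs."""
    reward = detect_bonus if detected else 0.0
    reward -= buoy_cost * n_deployed
    reward -= source_cost * n_sources + receiver_cost * n_receivers
    return reward
```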