• Title/Abstract/Keyword: Actor-Critic Deep Reinforcement Learning

17 search results

Improved Deep Q-Network Algorithm Using Self-Imitation Learning

  • 선우영민;이원창, 전기전자학회논문지 / Vol. 25, No. 4 / pp. 644-649 / 2021
  • Self-Imitation Learning (SIL) is a simple off-policy actor-critic algorithm that lets an agent exploit its past good experiences to find an optimal policy. Combined with actor-critic reinforcement learning algorithms, it has shown substantial improvements across a variety of environments. However, although SIL greatly benefits reinforcement learning, its application has been limited to algorithms with an actor-critic architecture. This paper proposes a method for applying SIL to DQN, a value-based reinforcement learning algorithm, and trains the SIL-augmented DQN in various environments. By comparing the results with the originals, we show that SIL can indeed be applied to DQN and improves its performance. A minimal sketch of the adapted loss follows.
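
A minimal PyTorch sketch of the adapted loss, under assumed network sizes and batch shapes (not the authors' implementation): the SIL value loss is re-targeted at a Q-network, pushing Q(s, a) upward only when the stored Monte Carlo return exceeded the current estimate.

```python
import torch
import torch.nn as nn

# toy CartPole-sized Q-network (assumed architecture, not the paper's)
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

def sil_dqn_loss(states, actions, returns):
    """SIL loss for DQN: mean of (max(R - Q(s, a), 0))^2 over the batch."""
    q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # imitate only experiences that turned out better than the current estimate
    advantage = (returns - q).clamp(min=0.0)
    return (advantage ** 2).mean()

# toy batch: 8 transitions with stored Monte Carlo returns
states = torch.randn(8, 4)
actions = torch.randint(0, 2, (8,))
returns = torch.randn(8)
sil_dqn_loss(states, actions, returns).backward()
```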

Development of an Actor-Critic Deep Reinforcement Learning Platform for Robotic Grasping in Real World

  • 김태원;박예성;김종복;박영빈;서일홍, 로봇학회논문지 / Vol. 15, No. 2 / pp. 197-204 / 2020
  • In this paper, we present a learning platform for robotic grasping in the real world, in which actor-critic deep reinforcement learning is employed to learn the grasping skill directly from raw image pixels and rarely observed rewards. This is a challenging task because existing deep reinforcement learning algorithms require extensive training data or massive computational cost, making them unaffordable in real-world settings. To address these problems, the proposed learning platform consists of two training phases: a learning phase in a simulator and subsequent learning in the real world. The main processing blocks in the platform are extraction of a latent vector based on state representation learning and disentanglement of a raw image, generation of adapted synthetic images using generative adversarial networks, and object detection and arm segmentation for the disentanglement. We demonstrate the effectiveness of this approach in a real environment; the state-representation step is sketched below.
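
A minimal sketch of the platform's state-representation step, with an assumed image size and latent dimension (not the authors' exact architecture): a convolutional encoder compresses a raw camera image into a latent vector that the actor-critic learner consumes instead of raw pixels.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Compress a 64x64 RGB frame into a compact latent state vector."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),   # 64x64 -> 31x31
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),  # 31x31 -> 14x14
            nn.Flatten(),
        )
        self.fc = nn.Linear(32 * 14 * 14, latent_dim)

    def forward(self, img):
        return self.fc(self.conv(img))

z = Encoder()(torch.randn(1, 3, 64, 64))  # latent state for the policy
print(z.shape)  # torch.Size([1, 32])
```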

Mapless Navigation with Distributional Reinforcement Learning

  • 짠 반 마잉;김곤우, 로봇학회논문지 / Vol. 19, No. 1 / pp. 92-97 / 2024
  • This paper studies the distributional perspective on reinforcement learning for application to mobile robot navigation. Mapless navigation algorithms based on deep reinforcement learning have shown promising performance and high applicability. Trial-and-error training is typically run in virtual environments because real-world interactions are expensive. Nevertheless, deploying a deep reinforcement learning model on real tasks is challenging because the data encountered in the physical world differ from those collected in simulation, leading to risky behavior and high collision rates. In this paper, we present a distributional reinforcement learning architecture for mapless navigation of a mobile robot that adapts to the uncertainty of environmental change. The experimental results indicate the superior performance of the distributional soft actor-critic compared to conventional methods; the quantile-critic idea is sketched below.
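
A minimal sketch of the distributional idea (assumed input sizes, not the paper's network): the critic outputs N quantiles of the return distribution instead of a single expected value, so the learned policy can account for uncertainty in the environment.

```python
import torch
import torch.nn as nn

N_QUANTILES = 32

# assumed state: 24 laser beams plus a 2-D velocity action
critic = nn.Sequential(nn.Linear(24 + 2, 128), nn.ReLU(), nn.Linear(128, N_QUANTILES))

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile-regression Huber loss between predicted quantiles and target returns."""
    taus = (torch.arange(N_QUANTILES) + 0.5) / N_QUANTILES          # quantile midpoints
    u = target_samples.unsqueeze(1) - pred_quantiles.unsqueeze(2)   # pairwise TD errors
    huber = torch.where(u.abs() <= kappa, 0.5 * u ** 2, kappa * (u.abs() - 0.5 * kappa))
    weight = (taus.unsqueeze(0).unsqueeze(2) - (u < 0).float()).abs()
    return (weight * huber).mean()

x = torch.randn(8, 26)  # toy batch of state-action pairs
quantile_huber_loss(critic(x), torch.randn(8, N_QUANTILES)).backward()
```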

Multi-Agent Deep Reinforcement Learning for Fighting Game: A Comparative Study of PPO and A2C

  • Yoshua Kaleb Purwanto;Dae-Ki Kang, International Journal of Internet, Broadcasting and Communication / Vol. 16, No. 3 / pp. 192-198 / 2024
  • This paper investigates the application of multi-agent deep reinforcement learning to the fighting game Samurai Shodown using the Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C) algorithms. Initially, agents are trained separately for 200,000 timesteps using a Convolutional Neural Network (CNN) and a Multi-Layer Perceptron (MLP) with LSTM networks. PPO demonstrates superior performance early on with stable policy updates, while A2C shows better adaptation and higher rewards over extended training periods, culminating in A2C outperforming PPO after 1,000,000 timesteps. These findings highlight PPO's effectiveness for short-term training and A2C's advantages in long-term learning scenarios, emphasizing the importance of algorithm selection based on training duration and task complexity. The code can be found at https://github.com/Lexer04/Samurai-Shodown-with-Reinforcement-Learning-PPO; a sketch of the training setup follows.
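
A minimal sketch of the comparison setup using Stable-Baselines3 (the linked repository suggests this stack; the environment here is a stand-in, since the actual Samurai Shodown environment comes from a separate retro-game integration).

```python
import gymnasium as gym
from stable_baselines3 import A2C, PPO

# placeholder environment; the paper uses a Samurai Shodown retro-game wrapper
env = gym.make("CartPole-v1")

for algo, name in [(PPO, "ppo"), (A2C, "a2c")]:
    model = algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=200_000)  # the paper's initial training budget
    model.save(f"{name}_agent")
```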

Deep reinforcement learning for a multi-objective operation in a nuclear power plant

  • Junyong Bae;Jae Min Kim;Seung Jun Lee, Nuclear Engineering and Technology / Vol. 55, No. 9 / pp. 3277-3290 / 2023
  • Nuclear power plant (NPP) operations with multiple objectives and devices are still performed manually by operators despite the potential for human error. These operations could be automated to reduce the burden on operators; however, classical approaches may not be suitable for such multi-objective tasks. An alternative is deep reinforcement learning (DRL), which has successfully automated various complex tasks, including certain NPP operations. Despite this progress, previous DRL studies of NPP operations have had limited ability to handle complex multi-objective operations with multiple devices efficiently. This study proposes a novel DRL-based approach that addresses these limitations by employing a continuous action space and straightforward binary rewards, supported by the adoption of a soft actor-critic and hindsight experience replay. The feasibility of the proposed approach was evaluated on controlling the pressure and volume of the reactor coolant while heating the coolant during NPP startup. The results show that the approach can train the agent with a proper strategy for effectively achieving multiple objectives through the control of multiple devices. Moreover, hands-on testing demonstrates that the trained agent can handle untrained objectives, such as cooldown, with substantial success. The reward-relabeling idea is sketched below.
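
A minimal sketch of binary rewards combined with hindsight experience replay (HER), under assumed state and goal encodings (not the authors' plant model): HER relabels a failed trajectory with a goal it actually achieved, so the straightforward binary reward still yields a useful learning signal.

```python
import numpy as np

def binary_reward(achieved, goal, tol=0.05):
    """1 if every controlled variable (e.g., pressure, volume) is within tolerance."""
    return float(np.all(np.abs(achieved - goal) <= tol))

def her_relabel(trajectory):
    """Replay each step as if the episode's final achieved state had been the goal."""
    final_goal = trajectory[-1]["achieved"]
    return [
        {**step, "goal": final_goal, "reward": binary_reward(step["achieved"], final_goal)}
        for step in trajectory
    ]

# toy 3-step trajectory over two controlled variables
traj = [{"achieved": np.array([0.2, 0.5]), "goal": np.array([0.9, 0.9]), "reward": 0.0},
        {"achieved": np.array([0.5, 0.7]), "goal": np.array([0.9, 0.9]), "reward": 0.0},
        {"achieved": np.array([0.6, 0.8]), "goal": np.array([0.9, 0.9]), "reward": 0.0}]
print([s["reward"] for s in her_relabel(traj)])  # [0.0, 0.0, 1.0]: last step now rewarded
```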

Comparison of Reinforcement Learning Algorithms for a 2D Racing Game Learning Agent

  • 이동철, 한국인터넷방송통신학회논문지 / Vol. 20, No. 1 / pp. 171-176 / 2020
  • Reinforcement learning is among the most effective methods for training an artificial-intelligence agent to play video games. Many reinforcement learning algorithms have been proposed, but each performs differently depending on the application domain. This paper compares how the algorithms most commonly used in recent reinforcement learning perform on a 2D racing game. To this end, we define the performance metrics used in the evaluation and compare the metric values of each algorithm graphically. The results show that ACER (Actor Critic with Experience Replay) yields the highest average game reward, with a 157% gap over the lowest-scoring algorithm; the gap computation is sketched below.
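
A minimal sketch of the final comparison step, with placeholder reward values chosen only so the gap matches the paper's reported 157% (the paper's actual episode logs and metric values are not reproduced here).

```python
# placeholder mean episode rewards per algorithm (illustrative, not measured)
mean_rewards = {"ACER": 540.0, "PPO": 380.0, "A2C": 210.0}

best = max(mean_rewards, key=mean_rewards.get)
worst = min(mean_rewards, key=mean_rewards.get)
gap = 100.0 * (mean_rewards[best] - mean_rewards[worst]) / mean_rewards[worst]
print(f"{best} beats {worst} by {gap:.0f}%")  # ACER beats A2C by 157%
```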

On the Reward Function of Latent SAC Reinforcement Learning to Improve Longitudinal Driving Performance

  • 조성빈;정한유, 전기전자학회논문지 / Vol. 25, No. 4 / pp. 728-734 / 2021
  • Interest in end-to-end autonomous driving using deep reinforcement learning has grown rapidly in recent years. This paper presents a reward function for latent-SAC deep reinforcement learning that improves the longitudinal driving performance of a vehicle. We show that whereas existing reinforcement learning reward functions significantly degrade driving safety and efficiency, the proposed reward function maintains an appropriate inter-vehicle distance while avoiding the risk of collision with the vehicle ahead. A reward of this kind is sketched below.
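
A minimal sketch of a longitudinal-driving reward of the kind the paper designs; the terms and weights here are assumptions, not the authors' formula: penalize collision risk with the lead vehicle while rewarding an appropriate headway.

```python
def longitudinal_reward(gap_m, ego_speed, lead_speed, desired_gap_m=30.0):
    """Reward keeping the desired inter-vehicle gap and penalize closing-in risk."""
    if gap_m <= 0.0:
        return -10.0  # collision: large fixed penalty (assumed value)
    gap_term = -abs(gap_m - desired_gap_m) / desired_gap_m  # stay near desired headway
    closing = max(ego_speed - lead_speed, 0.0)
    risk_term = -closing / max(gap_m, 1e-3)  # penalize approaching fast at short range
    return gap_term + risk_term

print(longitudinal_reward(gap_m=28.0, ego_speed=20.0, lead_speed=19.0))
```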

Reinforcement Learning based on Deep Deterministic Policy Gradient for Roll Control of Underwater Vehicle

  • 김수용;황연걸;문성웅, 한국군사과학기술학회지 / Vol. 24, No. 5 / pp. 558-568 / 2021
  • Conventional underwater vehicle controllers are designed by linearizing the nonlinear dynamics model around a specific operating region. Because such linear controllers show unstable performance in transient states, various studies have sought to overcome this problem, recently by using reinforcement learning to improve transient-state control performance. Reinforcement learning can be broadly divided into value-based and policy-based approaches. In this paper, we propose a roll controller for an underwater vehicle based on the Deep Deterministic Policy Gradient (DDPG), which learns the control policy and shows stable control performance in various situations and environments. The performance of the proposed DDPG-based roll controller was verified through simulation and compared with existing roll controllers based on PID and on DQN with Normalized Advantage Functions. The core DDPG update is sketched below.
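
A minimal PyTorch sketch of the DDPG update for roll control, with an assumed state of (roll angle, roll rate) and a single actuator command; target networks and the replay buffer are omitted for brevity.

```python
import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Tanh())
critic = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

s = torch.randn(16, 2)         # batch of (roll, roll-rate) states from the buffer
a = torch.rand(16, 1) * 2 - 1  # past actions from the buffer
y = torch.randn(16, 1)         # TD targets r + gamma * Q'(s', mu'(s')) (assumed precomputed)

# critic regression toward the TD targets
q_loss = ((critic(torch.cat([s, a], 1)) - y) ** 2).mean()
opt_c.zero_grad(); q_loss.backward(); opt_c.step()

# deterministic policy gradient: push actions toward higher Q
pi_loss = -critic(torch.cat([s, actor(s)], 1)).mean()
opt_a.zero_grad(); pi_loss.backward(); opt_a.step()
```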

A DASH System Using the A3C-based Deep Reinforcement Learning

  • 최민제;임경식, 대한임베디드공학회논문지 / Vol. 17, No. 5 / pp. 297-307 / 2022
  • The simple procedural segment-selection algorithm commonly used in Dynamic Adaptive Streaming over HTTP (DASH) shows severe weaknesses in providing high-quality streaming services over integrated mobile networks of various wired and wireless links. A major issue is how to cope properly with dynamically changing underlying network conditions; the key is to make the segment-selection algorithm far more adaptive to fluctuating network traffic. This paper presents a system architecture that replaces the existing procedural segment-selection algorithm with a deep reinforcement learning algorithm based on the Asynchronous Advantage Actor-Critic (A3C). The distributed A3C-based deep learning server is designed and implemented to let multiple clients in different network conditions stream videos simultaneously, collect learning data quickly, and learn asynchronously, so that learning speed improves greatly as the number of video clients increases. The performance analysis shows that the proposed algorithm outperforms both the conventional DASH algorithm and the Deep Q-Network algorithm in terms of the user's quality of experience and the speed of deep learning. A sketch of the segment-selection network follows.
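
A minimal sketch of an A3C-style segment selector for DASH under assumed inputs (the paper's exact feature set and network are not reproduced): a shared body maps throughput and buffer features to a bitrate distribution (actor) and a state value (critic), with each worker accumulating gradients asynchronously.

```python
import torch
import torch.nn as nn

N_BITRATES = 6  # assumed number of selectable representations

class A3CNet(nn.Module):
    def __init__(self):
        super().__init__()
        # assumed 5-D input: e.g., throughput history and buffer level
        self.body = nn.Sequential(nn.Linear(5, 64), nn.ReLU())
        self.actor = nn.Linear(64, N_BITRATES)  # logits over next-segment bitrates
        self.critic = nn.Linear(64, 1)          # state-value estimate

    def forward(self, x):
        h = self.body(x)
        return torch.distributions.Categorical(logits=self.actor(h)), self.critic(h)

dist, value = A3CNet()(torch.randn(1, 5))
print(dist.sample().item(), value.item())  # chosen bitrate index and its value estimate
```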

Performance Evaluation of Reinforcement Learning Algorithm for Control of Smart TMD

  • 강주원;김현수, 한국공간구조학회논문집 / Vol. 21, No. 2 / pp. 41-48 / 2021
  • Smart tuned mass dampers (TMDs) are widely studied for seismic response reduction of various structures, and the control algorithm is the most important factor in a smart TMD's control performance. This study used the Deep Deterministic Policy Gradient (DDPG), a reinforcement learning technique, to develop a control algorithm for a smart TMD built with a magnetorheological (MR) damper. A single-mass model with the smart TMD was employed to construct the reinforcement learning environment, and time-history analysis simulations of the example structure under artificial seismic load were performed during the learning process. An actor (policy network) and a critic (value network) were constructed for the DDPG agent; the agent's action was the command voltage sent to the MR damper, and the reward was computed from the displacement and velocity responses of the main mass. A groundhook control algorithm was used for comparison. After training the DDPG agent for 10,000 episodes with proper hyper-parameters, a semi-active control algorithm for controlling the seismic responses of the example structure with the smart TMD was developed. The simulation results show that the developed DDPG model provides an effective control algorithm for the smart TMD to reduce seismic responses. The reward and action mapping are sketched below.
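
A minimal sketch of the reward and action interface described in the abstract; the weights and voltage range are assumptions, since the paper's exact values are not reproduced here.

```python
def tmd_reward(displacement, velocity, w_d=1.0, w_v=0.1):
    """Smaller main-mass responses yield larger (less negative) reward."""
    return -(w_d * displacement ** 2 + w_v * velocity ** 2)

def voltage_from_action(action, v_max=5.0):
    """Map an agent action in [-1, 1] to an MR-damper command voltage in [0, v_max]."""
    return 0.5 * (action + 1.0) * v_max

print(tmd_reward(0.02, 0.15), voltage_from_action(0.2))
```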