• Title/Summary/Keyword: soft actor-critic

Mapless Navigation with Distributional Reinforcement Learning (분포형 강화학습을 활용한 맵리스 네비게이션)

  • Van Manh Tran;Gon-Woo Kim
    • The Journal of Korea Robotics Society, v.19 no.1, pp.92-97, 2024
  • This paper presents a study of the distributional perspective on reinforcement learning for application in mobile robot navigation. Mapless navigation algorithms based on deep reinforcement learning have shown promising performance and high applicability. Because real-world interactions are expensive, trial-and-error training for autonomous navigation is usually carried out in virtual environments. Nevertheless, applying a deep reinforcement learning model to real tasks is challenging because the data collected in virtual simulation differ from those of the physical world, leading to high-risk behavior and high collision rates. In this paper, we present a distributional reinforcement learning architecture for mapless navigation of a mobile robot that adapts to the uncertainty of environmental change. The experimental results indicate the superior performance of distributional soft actor-critic compared to conventional methods.
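
The distributional approach named above replaces SAC's scalar Q-value with an estimate of the full return distribution, which is what lets the policy account for uncertainty. Below is a minimal PyTorch sketch of a quantile-based critic and its regression loss in the spirit of distributional soft actor-critic; the network sizes, quantile count, and loss form are illustrative assumptions, not the paper's exact design.

    import torch
    import torch.nn as nn

    class QuantileCritic(nn.Module):
        """Predicts n quantiles of the return Z(s, a) instead of a scalar Q-value."""
        def __init__(self, state_dim, action_dim, n_quantiles=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, n_quantiles),
            )

        def forward(self, state, action):
            # (batch, n_quantiles) estimates of the return distribution
            return self.net(torch.cat([state, action], dim=-1))

    def quantile_huber_loss(pred, target, taus, kappa=1.0):
        """Quantile regression loss between predicted and target return quantiles."""
        td = target.unsqueeze(-1) - pred.unsqueeze(-2)       # pairwise TD errors
        huber = torch.where(td.abs() <= kappa,
                            0.5 * td.pow(2),
                            kappa * (td.abs() - 0.5 * kappa))
        # asymmetric weighting by the quantile fractions of the predictions
        weight = (taus.view(1, 1, -1) - (td.detach() < 0).float()).abs()
        return (weight * huber / kappa).sum(-1).mean()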

Deep reinforcement learning for a multi-objective operation in a nuclear power plant

  • Junyong Bae;Jae Min Kim;Seung Jun Lee
    • Nuclear Engineering and Technology, v.55 no.9, pp.3277-3290, 2023
  • Nuclear power plant (NPP) operations with multiple objectives and devices are still performed manually by operators despite the potential for human error. These operations could be automated to reduce the burden on operators; however, classical approaches may not be suitable for such multi-objective tasks. An alternative approach is deep reinforcement learning (DRL), which has been successful in automating various complex tasks and has been applied to the automation of certain operations in NPPs. Despite this recent progress, however, previous studies using DRL for NPP operations have been limited in their ability to handle complex multi-objective operations with multiple devices efficiently. This study proposes a novel DRL-based approach that addresses these limitations by employing a continuous action space and straightforward binary rewards, supported by the adoption of soft actor-critic and hindsight experience replay. The feasibility of the proposed approach was evaluated by controlling the pressure and volume of the reactor coolant while heating the coolant during NPP startup. The results show that the proposed approach can train the agent with a proper strategy for effectively achieving multiple objectives through the control of multiple devices. Moreover, hands-on testing demonstrates that the trained agent is capable of handling untrained objectives, such as cooldown, with substantial success.
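
The pairing of straightforward binary rewards with hindsight experience replay works because failed transitions are relabeled with goals the agent actually reached, so the sparse signal remains informative. The following is a minimal sketch of "future"-strategy goal relabeling under assumed conventions (NumPy-array states, a distance threshold for success); it is not the paper's implementation.

    import numpy as np

    def her_relabel(episode, k=4, threshold=0.05, rng=np.random.default_rng(0)):
        """Augment an episode of (state, action, next_state, goal) tuples
        with hindsight goals drawn from later achieved states."""
        out = []
        for t, (state, action, next_state, goal) in enumerate(episode):
            # binary reward: 1 if the achieved state is close enough to the goal
            def reward(g):
                return float(np.linalg.norm(next_state - g) < threshold)
            out.append((state, action, reward(goal), next_state, goal))
            # 'future' strategy: reuse states actually reached later on
            future = rng.integers(t, len(episode), size=min(k, len(episode) - t))
            for idx in future:
                new_goal = episode[idx][2]  # achieved next_state at a later step
                out.append((state, action, reward(new_goal), next_state, new_goal))
        return out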

Strategy to coordinate actions through a plant parameter prediction model during startup operation of a nuclear power plant

  • Jae Min Kim;Junyong Bae;Seung Jun Lee
    • Nuclear Engineering and Technology, v.55 no.3, pp.839-849, 2023
  • The development of automation technology that reduces human error by minimizing human intervention is accelerating with artificial intelligence and big data processing technology, even in the nuclear field. Among nuclear power plant operation modes, the startup and shutdown operations are still performed manually and thus have the potential for human error. As part of the development of an autonomous operation system for startup operation, this paper proposes an action coordination strategy to obtain the optimal actions. The lower level of the system consists of operating blocks, created by analyzing the operation tasks, that achieve local goals through soft actor-critic algorithms. However, when multiple agents try to perform conflicting actions, a method is needed to coordinate them; for this purpose, an action coordination strategy was developed in this work as the upper level of the system. Three quantification methods were compared and evaluated based on the future plant state predicted by plant parameter prediction models using long short-term memory networks. The results confirm that the optimal action satisfying the limiting conditions for operation can be selected by coordinating the action sets. It is expected that this methodology can be generalized through future research.
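
One way to realize the upper-level coordination described above is to roll each candidate action set through the plant parameter prediction model and score the predicted trajectory against the limiting conditions for operation. The sketch below assumes a simple LSTM predictor interface and a band-violation penalty as the quantification method; both are illustrative stand-ins rather than the three methods compared in the paper.

    import torch
    import torch.nn as nn

    class PlantPredictor(nn.Module):
        """LSTM predicting future plant parameters from past parameters and actions."""
        def __init__(self, n_params, n_actions, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(n_params + n_actions, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_params)

        def forward(self, params, actions):
            # params: (batch, T, n_params), actions: (batch, T, n_actions)
            out, _ = self.lstm(torch.cat([params, actions], dim=-1))
            return self.head(out)

    def coordinate(predictor, params, candidate_action_sets, low, high):
        """Pick the action set whose predicted trajectory violates limits least."""
        best, best_penalty = None, float("inf")
        for actions in candidate_action_sets:
            pred = predictor(params, actions)
            # total predicted excursion outside the allowed band [low, high]
            penalty = (torch.relu(low - pred) + torch.relu(pred - high)).sum().item()
            if penalty < best_penalty:
                best, best_penalty = actions, penalty
        return best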

Visual Object Manipulation Based on Exploration Guided by Demonstration (시연에 의해 유도된 탐험을 통한 시각 기반의 물체 조작)

  • Kim, Doo-Jun;Jo, HyunJun;Song, Jae-Bok
    • The Journal of Korea Robotics Society, v.17 no.1, pp.40-47, 2022
  • A reward function suitable for the task is required to manipulate objects through reinforcement learning. However, designing such a reward function is difficult when ample information about the objects cannot be obtained. In this study, a demonstration-based object manipulation algorithm called stochastic exploration guided by demonstration (SEGD) is proposed to solve this reward-design problem. SEGD is a reinforcement learning algorithm in which a sparse reward explorer (SRE) and an interpolated policy using demonstration (IPD) are added to soft actor-critic (SAC). SRE supports the training of the SAC critic by collecting prior data, and IPD limits the exploration space by keeping SEGD's actions similar to the expert's actions. Through these two components, SEGD can learn from only the sparse task reward, without a hand-designed reward function. To verify SEGD, experiments were conducted on three tasks, in which it demonstrated its effectiveness with success rates above 96.5%.
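
The core of IPD is to keep early actions close to the demonstration and hand control over to the learned policy as training progresses. A minimal sketch of that interpolation follows; the convex mix and the linear annealing schedule are assumptions for illustration, not SEGD's published schedule.

    import numpy as np

    def interpolated_action(policy_action, demo_action, step, anneal_steps=50_000):
        """Convex mix of demonstration and policy actions, annealed toward the policy."""
        beta = max(0.0, 1.0 - step / anneal_steps)  # demonstration weight decays to 0
        return beta * np.asarray(demo_action) + (1.0 - beta) * np.asarray(policy_action)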

On the Reward Function of Latent SAC Reinforcement Learning to Improve Longitudinal Driving Performance (종방향 주행성능향상을 위한 Latent SAC 강화학습 보상함수 설계)

  • Jo, Sung-Bean;Jeong, Han-You
    • Journal of IKEEE, v.25 no.4, pp.728-734, 2021
  • In recent years, there has been strong interest in end-to-end autonomous driving based on deep reinforcement learning. In this paper, we present a reward function for latent SAC deep reinforcement learning to improve the longitudinal driving performance of an agent vehicle. Whereas the existing reward function significantly degrades driving safety and efficiency, the proposed reward function is shown to maintain an appropriate headway distance while avoiding collisions with the front vehicle.
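
A headway-based reward of the kind described can be sketched as a function that peaks at a desired time headway and penalizes collisions. The target headway, scaling, and penalty below are illustrative assumptions, not the paper's tuned values.

    def longitudinal_reward(gap_m, ego_speed_mps, collided,
                            target_headway_s=2.0, collision_penalty=-10.0):
        """Reward peaking at the desired time headway, with a collision penalty."""
        if collided:
            return collision_penalty
        desired_gap = target_headway_s * max(ego_speed_mps, 0.1)  # avoid a zero target
        gap_error = abs(gap_m - desired_gap) / desired_gap
        return 1.0 - min(gap_error, 1.0)  # 1.0 at ideal headway, 0.0 when far off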

Enhancing Smart Grid Efficiency through SAC Reinforcement Learning: Renewable Energy Integration and Optimal Demand Response in the CityLearn Environment (SAC 강화 학습을 통한 스마트 그리드 효율성 향상: CityLearn 환경에서 재생 에너지 통합 및 최적 수요 반응)

  • Esanov Alibek Rustamovich;Seung Je Seong;Chang-Gyoon Lim
    • The Journal of the Korea institute of electronic communication sciences, v.19 no.1, pp.93-104, 2024
  • Demand response is a strategy that encourages customers to adjust their consumption patterns at times of peak demand, with the aim of improving the reliability of the power grid and minimizing expenses. The integration of renewable energy sources into smart grids poses significant challenges due to their intermittent and unpredictable nature. Demand response strategies coupled with reinforcement learning techniques have emerged as promising approaches to address these challenges and optimize grid operations where traditional methods fail to meet such complex requirements. This research investigates the application of reinforcement learning algorithms to demand response for renewable energy integration. The objectives include optimizing demand-side flexibility, improving renewable energy utilization, and enhancing grid stability. The results emphasize the effectiveness of reinforcement-learning-based demand response strategies in enhancing grid flexibility and facilitating the integration of renewable energy.
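
At the implementation level, applying SAC to demand response follows the standard pattern of a continuous-action agent interacting with a gym-style environment. The sketch below uses stable-baselines3 with a stand-in environment; the library choice and the stand-in are assumptions about tooling, and a CityLearn setup would plug into the same train/predict loop with building observations and storage-dispatch actions.

    import gymnasium as gym
    from stable_baselines3 import SAC

    # Pendulum-v1 stands in for the CityLearn environment in this sketch.
    env = gym.make("Pendulum-v1")
    model = SAC("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=10_000)  # short demo run

    obs, _ = env.reset()
    # deterministic action, e.g. a storage-dispatch set-point in CityLearn
    action, _ = model.predict(obs, deterministic=True)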