• Title/Summary/Keyword: DDPG Algorithm

Search Result 7, Processing Time 0.025 seconds

Autonomous control of bicycle using Deep Deterministic Policy Gradient Algorithm (Deep Deterministic Policy Gradient 알고리즘을 응용한 자전거의 자율 주행 제어)

  • Choi, Seung Yoon;Le, Pham Tuyen;Chung, Tae Choong
    • Convergence Security Journal
    • /
    • v.18 no.3
    • /
    • pp.3-9
    • /
    • 2018
  • The Deep Deterministic Policy Gradient (DDPG) algorithm is an algorithm that learns by using artificial neural network s and reinforcement learning. Among the studies related to reinforcement learning, which has been recently studied, the D DPG algorithm has an advantage of preventing the cases where the wrong actions are accumulated and affecting the learn ing because it is learned by the off-policy. In this study, we experimented to control the bicycle autonomously by applyin g the DDPG algorithm. Simulation was carried out by setting various environments and it was shown that the method us ed in the experiment works stably on the simulation.

  • PDF

Performance Evaluation of Reinforcement Learning Algorithm for Control of Smart TMD (스마트 TMD 제어를 위한 강화학습 알고리즘 성능 검토)

  • Kang, Joo-Won;Kim, Hyun-Su
    • Journal of Korean Association for Spatial Structures
    • /
    • v.21 no.2
    • /
    • pp.41-48
    • /
    • 2021
  • A smart tuned mass damper (TMD) is widely studied for seismic response reduction of various structures. Control algorithm is the most important factor for control performance of a smart TMD. This study used a Deep Deterministic Policy Gradient (DDPG) among reinforcement learning techniques to develop a control algorithm for a smart TMD. A magnetorheological (MR) damper was used to make the smart TMD. A single mass model with the smart TMD was employed to make a reinforcement learning environment. Time history analysis simulations of the example structure subject to artificial seismic load were performed in the reinforcement learning process. Critic of policy network and actor of value network for DDPG agent were constructed. The action of DDPG agent was selected as the command voltage sent to the MR damper. Reward for the DDPG action was calculated by using displacement and velocity responses of the main mass. Groundhook control algorithm was used as a comparative control algorithm. After 10,000 episode training of the DDPG agent model with proper hyper-parameters, the semi-active control algorithm for control of seismic responses of the example structure with the smart TMD was developed. The simulation results presented that the developed DDPG model can provide effective control algorithms for smart TMD for reduction of seismic responses.

Learning Optimal Trajectory Generation for Low-Cost Redundant Manipulator using Deep Deterministic Policy Gradient(DDPG) (저가 Redundant Manipulator의 최적 경로 생성을 위한 Deep Deterministic Policy Gradient(DDPG) 학습)

  • Lee, Seunghyeon;Jin, Seongho;Hwang, Seonghyeon;Lee, Inho
    • The Journal of Korea Robotics Society
    • /
    • v.17 no.1
    • /
    • pp.58-67
    • /
    • 2022
  • In this paper, we propose an approach resolving inaccuracy of the low-cost redundant manipulator workspace with low encoder and low stiffness. When the manipulators are manufactured with low-cost encoders and low-cost links, the robots can run into workspace inaccuracy issues. Furthermore, trajectory generation based on conventional forward/inverse kinematics without taking into account inaccuracy issues will introduce the risk of end-effector fluctuations. Hence, we propose an optimization for the trajectory generation method based on the DDPG (Deep Deterministic Policy Gradient) algorithm for the low-cost redundant manipulators reaching the target position in Euclidean space. We designed the DDPG algorithm minimizing the distance along with the jacobian condition number. The training environment is selected with an error rate of randomly generated joint spaces in a simulator that implemented real-world physics, the test environment is a real robotic experiment and demonstrated our approach.

Singularity Avoidance Path Planning on Cooperative Task of Dual Manipulator Using DDPG Algorithm (DDPG 알고리즘을 이용한 양팔 매니퓰레이터의 협동작업 경로상의 특이점 회피 경로 계획)

  • Lee, Jonghak;Kim, Kyeongsoo;Kim, Yunjae;Lee, Jangmyung
    • The Journal of Korea Robotics Society
    • /
    • v.16 no.2
    • /
    • pp.137-146
    • /
    • 2021
  • When controlling manipulator, degree of freedom is lost in singularity so specific joint velocity does not propagate to the end effector. In addition, control problem occurs because jacobian inverse matrix can not be calculated. To avoid singularity, we apply Deep Deterministic Policy Gradient(DDPG), algorithm of reinforcement learning that rewards behavior according to actions then determines high-reward actions in simulation. DDPG uses off-policy that uses 𝝐-greedy policy for selecting action of current time step and greed policy for the next step. In the simulation, learning is given by negative reward when moving near singulairty, and positive reward when moving away from the singularity and moving to target point. The reward equation consists of distance to target point and singularity, manipulability, and arrival flag. Dual arm manipulators hold long rod at the same time and conduct experiments to avoid singularity by simulated path. In the learning process, if object to be avoided is set as a space rather than point, it is expected that avoidance of obstacles will be possible in future research.

Research on Optimal Deployment of Sonobuoy for Autonomous Aerial Vehicles Using Virtual Environment and DDPG Algorithm (가상환경과 DDPG 알고리즘을 이용한 자율 비행체의 소노부이 최적 배치 연구)

  • Kim, Jong-In;Han, Min-Seok
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.15 no.2
    • /
    • pp.152-163
    • /
    • 2022
  • In this paper, we present a method to enable an unmanned aerial vehicle to drop the sonobuoy, an essential element of anti-submarine warfare, in an optimal deployment. To this end, an environment simulating the distribution of sound detection performance was configured through the Unity game engine, and the environment directly configured using Unity ML-Agents and the reinforcement learning algorithm written in Python from the outside communicated with each other and learned. In particular, reinforcement learning is introduced to prevent the accumulation of wrong actions and affect learning, and to secure the maximum detection area for the sonobuoy while the vehicle flies to the target point in the shortest time. The optimal placement of the sonobuoy was achieved by applying the Deep Deterministic Policy Gradient (DDPG) algorithm. As a result of the learning, the agent flew through the sea area and passed only the points to achieve the optimal placement among the 70 target candidates. This means that an autonomous aerial vehicle that deploys a sonobuoy in the shortest time and maximum detection area, which is the requirement for optimal placement, has been implemented.

Controller Learning Method of Self-driving Bicycle Using State-of-the-art Deep Reinforcement Learning Algorithms

  • Choi, Seung-Yoon;Le, Tuyen Pham;Chung, Tae-Choong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.10
    • /
    • pp.23-31
    • /
    • 2018
  • Recently, there have been many studies on machine learning. Among them, studies on reinforcement learning are actively worked. In this study, we propose a controller to control bicycle using DDPG (Deep Deterministic Policy Gradient) algorithm which is the latest deep reinforcement learning method. In this paper, we redefine the compensation function of bicycle dynamics and neural network to learn agents. When using the proposed method for data learning and control, it is possible to perform the function of not allowing the bicycle to fall over and reach the further given destination unlike the existing method. For the performance evaluation, we have experimented that the proposed algorithm works in various environments such as fixed speed, random, target point, and not determined. Finally, as a result, it is confirmed that the proposed algorithm shows better performance than the conventional neural network algorithms NAF and PPO.

Determination of Ship Collision Avoidance Path using Deep Deterministic Policy Gradient Algorithm (심층 결정론적 정책 경사법을 이용한 선박 충돌 회피 경로 결정)

  • Kim, Dong-Ham;Lee, Sung-Uk;Nam, Jong-Ho;Furukawa, Yoshitaka
    • Journal of the Society of Naval Architects of Korea
    • /
    • v.56 no.1
    • /
    • pp.58-65
    • /
    • 2019
  • The stability, reliability and efficiency of a smart ship are important issues as the interest in an autonomous ship has recently been high. An automatic collision avoidance system is an essential function of an autonomous ship. This system detects the possibility of collision and automatically takes avoidance actions in consideration of economy and safety. In order to construct an automatic collision avoidance system using reinforcement learning, in this work, the sequential decision problem of ship collision is mathematically formulated through a Markov Decision Process (MDP). A reinforcement learning environment is constructed based on the ship maneuvering equations, and then the three key components (state, action, and reward) of MDP are defined. The state uses parameters of the relationship between own-ship and target-ship, the action is the vertical distance away from the target course, and the reward is defined as a function considering safety and economics. In order to solve the sequential decision problem, the Deep Deterministic Policy Gradient (DDPG) algorithm which can express continuous action space and search an optimal action policy is utilized. The collision avoidance system is then tested assuming the $90^{\circ}$intersection encounter situation and yields a satisfactory result.