• Title/Summary/Keyword: Deep Deterministic Policy Gradient (DDPG)

Search results: 12

Autonomous control of bicycle using Deep Deterministic Policy Gradient Algorithm

  • Choi, Seung Yoon; Le, Pham Tuyen; Chung, Tae Choong
    • Convergence Security Journal / v.18 no.3 / pp.3-9 / 2018
  • The Deep Deterministic Policy Gradient (DDPG) algorithm learns by combining artificial neural networks with reinforcement learning. Among recently studied reinforcement learning methods, DDPG has the advantage that, because it learns off-policy, wrong actions do not accumulate and degrade learning. In this study, we experimented with controlling a bicycle autonomously by applying the DDPG algorithm. Simulations were carried out in various environments, and the method used in the experiments was shown to work stably in simulation.

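The off-policy learning the abstract credits DDPG with can be sketched as follows: transitions from a replay buffer are evaluated with slowly updated target networks, decoupling the behavior that collected the data from the policy being learned. The toy linear actor/critic below is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_update(target, online, tau=0.005):
    """Polyak-average online weights into the slow-moving target copy."""
    return {k: (1 - tau) * target[k] + tau * online[k] for k in target}

# Toy linear critic Q(s, a) and actor mu(s); real DDPG uses deep networks.
critic = {"w_s": rng.normal(size=3), "w_a": np.array([0.5])}
actor = {"W": rng.normal(size=3)}
critic_tgt = {k: v.copy() for k, v in critic.items()}
actor_tgt = {k: v.copy() for k, v in actor.items()}

def q(p, s, a):
    return float(p["w_s"] @ s + p["w_a"][0] * a)

def mu(p, s):
    return float(p["W"] @ s)

# One replay-buffer transition (s, a, r, s') evaluated off-policy:
s, a, r, s2, gamma = rng.normal(size=3), 0.3, 1.0, rng.normal(size=3), 0.99
y = r + gamma * q(critic_tgt, s2, mu(actor_tgt, s2))  # bootstrap target
td_error = y - q(critic, s, a)  # drives the critic update

critic_tgt = soft_update(critic_tgt, critic)
actor_tgt = soft_update(actor_tgt, actor)
```

Because the bootstrap target uses the target networks rather than the action actually taken at `s2`, exploratory mistakes stored in the buffer do not propagate into the learning target.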

Learning Optimal Trajectory Generation for Low-Cost Redundant Manipulator using Deep Deterministic Policy Gradient (DDPG)

  • Lee, Seunghyeon; Jin, Seongho; Hwang, Seonghyeon; Lee, Inho
    • The Journal of Korea Robotics Society / v.17 no.1 / pp.58-67 / 2022
  • In this paper, we propose an approach for resolving workspace inaccuracy in a low-cost redundant manipulator built with low-resolution encoders and low-stiffness links. Manipulators manufactured with such components can run into workspace inaccuracy issues, and trajectory generation based on conventional forward/inverse kinematics that ignores these issues risks end-effector fluctuations. Hence, we propose a trajectory-generation optimization method based on the DDPG (Deep Deterministic Policy Gradient) algorithm that drives low-cost redundant manipulators to a target position in Euclidean space. We designed the DDPG algorithm to minimize the distance to the target along with the Jacobian condition number. The training environment is a simulator implementing real-world physics, with randomly generated joint-space error rates; the test environment is a real robot experiment that demonstrates our approach.
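A reward that trades off distance to the target against the Jacobian condition number, as the abstract describes, might look like the sketch below. The planar 3-link Jacobian, link lengths, and weights are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def planar_jacobian(q, link_lengths=(0.3, 0.25, 0.15)):
    """Position Jacobian (rows x, y) of a planar 3-link arm at joint angles q."""
    l1, l2, l3 = link_lengths
    a1, a2, a3 = q[0], q[0] + q[1], q[0] + q[1] + q[2]
    s = np.sin([a1, a2, a3])
    c = np.cos([a1, a2, a3])
    return np.array([
        [-l1 * s[0] - l2 * s[1] - l3 * s[2], -l2 * s[1] - l3 * s[2], -l3 * s[2]],
        [ l1 * c[0] + l2 * c[1] + l3 * c[2],  l2 * c[1] + l3 * c[2],  l3 * c[2]],
    ])

def reward(ee_pos, target, q, w_dist=1.0, w_cond=0.001):
    """Negative cost: distance to target plus Jacobian condition number."""
    dist = np.linalg.norm(np.asarray(ee_pos) - np.asarray(target))
    # Condition number blows up near singularities; cap it to keep rewards finite.
    cond = min(np.linalg.cond(planar_jacobian(q)), 1e6)
    return -(w_dist * dist + w_cond * cond)
```

The condition-number term steers the learned trajectories away from ill-conditioned joint configurations, where small encoder errors would be amplified into large end-effector errors.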

Reinforcement Learning based on Deep Deterministic Policy Gradient for Roll Control of Underwater Vehicle

  • Kim, Su Yong; Hwang, Yeon Geol; Moon, Sung Woong
    • Journal of the Korea Institute of Military Science and Technology / v.24 no.5 / pp.558-568 / 2021
  • Existing underwater vehicle controllers are designed by linearizing the nonlinear dynamics model around a specific operating region. Because such linear controllers show unstable performance in transient states, various studies have been conducted to overcome this limitation, including recent work that uses reinforcement learning to improve transient performance. Reinforcement learning can broadly be divided into value-based and policy-based methods. In this paper, we propose a roll controller for an underwater vehicle based on the Deep Deterministic Policy Gradient (DDPG), which learns the control policy and can deliver stable control performance in various situations and environments. The performance of the proposed DDPG-based roll controller was verified through simulation and compared with existing roll controllers based on PID and on DQN with Normalized Advantage Functions.

Performance Evaluation of Reinforcement Learning Algorithm for Control of Smart TMD (스마트 TMD 제어를 위한 강화학습 알고리즘 성능 검토)

  • Kang, Joo-Won; Kim, Hyun-Su
    • Journal of Korean Association for Spatial Structures / v.21 no.2 / pp.41-48 / 2021
  • A smart tuned mass damper (TMD) is widely studied for seismic response reduction of various structures, and the control algorithm is the most important factor in its performance. This study used Deep Deterministic Policy Gradient (DDPG), a reinforcement learning technique, to develop a control algorithm for a smart TMD built with a magnetorheological (MR) damper. A single-mass model with the smart TMD served as the reinforcement learning environment, and time-history analysis simulations of the example structure under artificial seismic load were performed during training. The actor (policy network) and critic (value network) of the DDPG agent were constructed; the agent's action is the command voltage sent to the MR damper, and its reward is computed from the displacement and velocity responses of the main mass. A groundhook control algorithm was used for comparison. After training the DDPG agent for 10,000 episodes with appropriate hyper-parameters, a semi-active control algorithm for controlling the seismic responses of the example structure with the smart TMD was obtained. The simulation results show that the developed DDPG model can provide an effective control algorithm for reducing seismic responses with a smart TMD.
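The reward described above, computed from the displacement and velocity of the main mass with the MR damper command voltage as the action, might be shaped as in the sketch below. The quadratic form, weights, and voltage range are assumptions, not the paper's exact functions.

```python
import numpy as np

def tmd_reward(displacement, velocity, w_d=1.0, w_v=0.1):
    """Negative weighted quadratic cost: smaller structural responses => higher reward."""
    return -(w_d * displacement**2 + w_v * velocity**2)

def select_voltage(action, v_max=5.0):
    """Map a [-1, 1] actor output to an MR damper command voltage in [0, v_max]."""
    return float(np.clip((action + 1.0) * 0.5 * v_max, 0.0, v_max))
```

Driving the reward from the main-mass responses (rather than the TMD's own motion) makes the agent learn voltages that damp the structure itself, which is the quantity the groundhook baseline also targets.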

A Study on the Portfolio Performance Evaluation using Actor-Critic Reinforcement Learning Algorithms

  • Lee, Woo Sik
    • Journal of the Korean Society of Industry Convergence / v.25 no.3 / pp.467-476 / 2022
  • The Bank of Korea raised the benchmark interest rate by a quarter percentage point to 1.75 percent per year, and analysts predict that South Korea's policy rate will reach 2.00 percent by the end of calendar year 2022. Furthermore, because market volatility has been significantly increased by a variety of factors, including rising rates and inflation, many investors have struggled to meet their financial objectives or deliver returns. In this situation, banks and financial institutions are attempting to provide robo-advisors that manage client portfolios without human intervention, so determining the best hyper-parameter combination is becoming increasingly important. This study compares activation functions of the Deep Deterministic Policy Gradient (DDPG) and Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithms for choosing a sequence of actions that maximizes long-term reward. According to the results, DDPG and TD3 outperformed their benchmark index. One reason for this is that the agent must understand the action values in order to choose an action and receive a reward, which is then compared with the state value to determine an advantage. As interest in machine learning has grown and research into deep reinforcement learning has become more active, finding an optimal hyper-parameter combination for DDPG and TD3 has become increasingly important.

Task offloading scheme based on the DRL of Connected Home using MEC

  • Ducsun Lim; Kyu-Seek Sohn
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.6 / pp.61-67 / 2023
  • The rise of 5G and the proliferation of smart devices have underscored the significance of multi-access edge computing (MEC), and interest in effectively processing computation-intensive, latency-sensitive applications has increased accordingly. This study investigates a novel task offloading strategy for a probabilistic MEC environment to address these challenges. First, we consider the frequency of dynamic task requests and unstable wireless channel conditions to propose a method for minimizing vehicle power consumption and latency. We then develop a deep reinforcement learning (DRL) based offloading technique that balances local computation against offloading transmission power. We analyze the power consumption and queuing latency of vehicles using the deep deterministic policy gradient (DDPG) and deep Q-network (DQN) techniques, and finally derive and validate the optimal performance-enhancement strategy in a vehicle-based MEC environment.

Determination of Ship Collision Avoidance Path using Deep Deterministic Policy Gradient Algorithm

  • Kim, Dong-Ham; Lee, Sung-Uk; Nam, Jong-Ho; Furukawa, Yoshitaka
    • Journal of the Society of Naval Architects of Korea / v.56 no.1 / pp.58-65 / 2019
  • As interest in autonomous ships has recently grown, the stability, reliability, and efficiency of a smart ship have become important issues. An automatic collision avoidance system is an essential function of an autonomous ship: it detects the possibility of collision and automatically takes avoidance actions with both economy and safety in mind. To construct an automatic collision avoidance system using reinforcement learning, this work mathematically formulates the sequential decision problem of ship collision as a Markov Decision Process (MDP). A reinforcement learning environment is constructed from the ship maneuvering equations, and the three key components of the MDP (state, action, and reward) are defined: the state uses parameters of the relationship between own-ship and target-ship, the action is the perpendicular distance away from the target course, and the reward is a function that considers both safety and economics. To solve the sequential decision problem, the Deep Deterministic Policy Gradient (DDPG) algorithm, which can express a continuous action space and search for an optimal action policy, is utilized. The collision avoidance system is then tested in an assumed $90^{\circ}$ intersection encounter situation and yields satisfactory results.
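The safety-versus-economics reward trade-off the abstract describes could be expressed along these lines. The closest-point-of-approach formulation, thresholds, and weights are illustrative assumptions, not the paper's definitions.

```python
def avoidance_reward(d_cpa, deviation, d_safe=0.5, w_safety=5.0, w_econ=0.2):
    """Reward for one avoidance maneuver.

    d_cpa:     predicted distance at the closest point of approach
               to the target ship (nautical miles) -- safety term.
    deviation: perpendicular offset from the original course
               (nautical miles) -- economy term.
    """
    safety = -w_safety * max(0.0, d_safe - d_cpa)  # penalize unsafe passing
    economy = -w_econ * abs(deviation)             # penalize long detours
    return safety + economy
```

Weighting safety much more heavily than economy ensures the learned policy first secures a safe passing distance and only then minimizes the detour, matching the stated design intent.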

Singularity Avoidance Path Planning on Cooperative Task of Dual Manipulator Using DDPG Algorithm

  • Lee, Jonghak; Kim, Kyeongsoo; Kim, Yunjae; Lee, Jangmyung
    • The Journal of Korea Robotics Society / v.16 no.2 / pp.137-146 / 2021
  • When controlling a manipulator, a degree of freedom is lost at a singularity, so certain joint velocities do not propagate to the end effector; in addition, control problems occur because the Jacobian inverse matrix cannot be calculated. To avoid singularities, we apply Deep Deterministic Policy Gradient (DDPG), a reinforcement learning algorithm that rewards behavior in simulation and learns to select high-reward actions. DDPG is off-policy: a noise-perturbed exploratory policy selects the action at the current time step, while the greedy deterministic policy is used for the next step's bootstrap target. In the simulation, a negative reward is given when moving near a singularity, and a positive reward when moving away from the singularity and toward the target point. The reward consists of the distance to the target point and to the singularity, the manipulability, and an arrival flag. The dual-arm manipulators hold a long rod at the same time and follow the simulated path in experiments to avoid singularities. If, in future research, the object to be avoided is set as a region rather than a point during learning, obstacle avoidance is expected to become possible as well.
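The exploratory behavior policy that DDPG's off-policy scheme relies on is typically implemented by adding temporally correlated noise to the deterministic action. A minimal Ornstein-Uhlenbeck sketch follows; the θ and σ values are the ones commonly used with DDPG, while the action dimension and actor output here are assumptions.

```python
import numpy as np

class OUNoise:
    """Mean-reverting noise added to the deterministic action for exploration."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.x = np.full(dim, mu, dtype=float)
        self.rng = np.random.default_rng(seed)

    def sample(self):
        # dx = theta * (mu - x) + sigma * N(0, 1): drifts back toward mu
        self.x = self.x + self.theta * (self.mu - self.x) \
                 + self.sigma * self.rng.normal(size=self.x.shape)
        return self.x.copy()

noise = OUNoise(dim=2)
greedy_action = np.array([0.1, -0.4])             # deterministic actor output
behavior_action = greedy_action + noise.sample()  # exploratory action executed
```

The noisy `behavior_action` is what gets executed and stored in the replay buffer, while the greedy action is used when computing bootstrap targets.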

Controller Learning Method of Self-driving Bicycle Using State-of-the-art Deep Reinforcement Learning Algorithms

  • Choi, Seung-Yoon; Le, Tuyen Pham; Chung, Tae-Choong
    • Journal of the Korea Society of Computer and Information / v.23 no.10 / pp.23-31 / 2018
  • Recently, there have been many studies on machine learning, among which reinforcement learning is actively researched. In this study, we propose a controller that controls a bicycle using the DDPG (Deep Deterministic Policy Gradient) algorithm, a state-of-the-art deep reinforcement learning method. In this paper, we redefine the reward function of the bicycle dynamics and the neural networks with which the agent learns. When the proposed method is used for learning and control, the bicycle can both stay upright and reach a farther given destination, unlike the existing method. For performance evaluation, we experimented with the proposed algorithm in various environments, such as fixed speed, random speed, with a target point, and without one. As a result, the proposed algorithm is confirmed to show better performance than the conventional neural network algorithms NAF and PPO.

Computation Offloading with Resource Allocation Based on DDPG in MEC

  • Sungwon Moon; Yujin Lim
    • Journal of Information Processing Systems / v.20 no.2 / pp.226-238 / 2024
  • Recently, multi-access edge computing (MEC) has emerged as a promising technology to alleviate the computing burden of vehicular terminals and efficiently facilitate vehicular applications. Vehicles can improve the quality of experience of their applications by offloading tasks to MEC servers. However, channel conditions are time-varying due to channel interference among vehicles, path loss is time-varying due to vehicle mobility, and the task arrivals of vehicles are stochastic. It is therefore difficult to determine an optimal offloading and resource-allocation decision in a dynamic MEC system, because offloading is affected by wireless data transmission. In this paper, we study computation offloading with resource allocation in a dynamic MEC system. The objective is to minimize power consumption and maximize throughput while meeting the delay constraints of tasks; to this end, resources are allocated for local execution and transmission power for offloading. We define the problem as a Markov decision process and propose an offloading method using the deep reinforcement learning algorithm named deep deterministic policy gradient. Simulation shows that, compared with existing methods, the proposed method outperforms in terms of throughput and satisfaction of delay constraints.
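The stated objective (minimize power, maximize throughput, satisfy delay constraints) suggests a per-step reward of roughly the following shape for the DDPG agent. The units, weights, and penalty form are assumptions for illustration, not the paper's formulation.

```python
def offload_reward(power_w, throughput_mbps, delay_ms,
                   delay_max_ms=100.0, w_p=0.01, w_t=0.001, penalty=1.0):
    """Reward for one offloading/resource-allocation decision.

    Rewards throughput, penalizes power consumption, and applies a fixed
    penalty when the task's delay constraint is violated.
    """
    r = w_t * throughput_mbps - w_p * power_w
    if delay_ms > delay_max_ms:
        r -= penalty  # delay-constraint violation
    return r
```

Folding the delay constraint into the reward as a penalty lets the continuous-action DDPG agent handle the constraint without explicit constrained optimization.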