• Title/Summary/Keyword: Actor-Critic Method

Kernel-based actor-critic approach with applications

  • Chu, Baek-Suk;Jung, Keun-Woo;Park, Joo-Young
    • International Journal of Fuzzy Logic and Intelligent Systems / v.11 no.4 / pp.267-274 / 2011
  • Recently, actor-critic methods have drawn significant interest in the area of reinforcement learning, and several algorithms have been studied along the line of the actor-critic strategy. In this paper, we consider a new type of actor-critic algorithm employing kernel methods, which have recently been shown to be very effective tools in various fields of machine learning, and investigate combining the actor-critic strategy with kernel methods. More specifically, this paper studies actor-critic algorithms utilizing kernel-based least-squares estimation and policy gradients; in the critic part, the study uses a sliding-window-based kernel least-squares method, which leads to fast and efficient value-function estimation in a nonparametric setting. The applicability of the considered algorithms is illustrated via a robot locomotion problem and a tunnel ventilation control problem.

Actor-Critic Algorithm with Transition Cost Estimation

  • Sergey, Denisov;Lee, Jee-Hyong
    • International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.4 / pp.270-275 / 2016
  • We present an approach for accelerating the actor-critic algorithm for reinforcement learning with continuous action spaces. The actor-critic algorithm has already proved its robustness to infinitely large action spaces in various high-dimensional environments. Despite that success, the main problem of the actor-critic algorithm remains the same: the speed of convergence to the optimal policy. In high-dimensional state and action spaces, searching for the correct action in each state takes an enormously long time. Therefore, in this paper we suggest a search-accelerating function that speeds up the convergence of the algorithm so that it reaches the optimal policy faster. In our method, we assume that actions may have their own distribution of preference that is independent of the state. Since the agent acts randomly in the environment at the beginning of learning, it would be more efficient if actions were taken according to some heuristic function. We demonstrate that the heuristically accelerated actor-critic algorithm learns the optimal policy faster, using the Educational Process Mining dataset with records of students' course learning processes and their grades.

Control of Crawling Robot using Actor-Critic Fuzzy Reinforcement Learning (액터-크리틱 퍼지 강화학습을 이용한 기는 로봇의 제어)

  • Moon, Young-Joon;Lee, Jae-Hoon;Park, Joo-Young
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.4 / pp.519-524 / 2009
  • Recently, reinforcement learning methods have drawn much interest in the area of machine learning. Dominant approaches in reinforcement learning research include the value-function approach, the policy search approach, and the actor-critic approach. Pertinent to this paper are the actor-critic algorithms studied for problems with continuous states and continuous actions. In particular, this paper focuses on presenting a method that combines the so-called ACFRL (actor-critic fuzzy reinforcement learning), which is an actor-critic type of reinforcement learning based on fuzzy theory, with the RLS-NAC, which is based on RLS filters and natural actor-critic methods. The presented method is applied to a control problem for crawling robots, and some results are reported from a comparison of learning performance.

Improved Deep Q-Network Algorithm Using Self-Imitation Learning (Self-Imitation Learning을 이용한 개선된 Deep Q-Network 알고리즘)

  • Sunwoo, Yung-Min;Lee, Won-Chang
    • Journal of IKEEE / v.25 no.4 / pp.644-649 / 2021
  • Self-Imitation Learning is a simple off-policy actor-critic algorithm that makes an agent find an optimal policy by using past good experiences. When Self-Imitation Learning is combined with reinforcement learning algorithms that have an actor-critic architecture, it shows performance improvement in various game environments. However, its applications are limited to reinforcement learning algorithms that have an actor-critic architecture. In this paper, we propose a method of applying Self-Imitation Learning to Deep Q-Network, which is a value-based deep reinforcement learning algorithm, and train it in various game environments. By comparing the training results of the proposed algorithm with those of the ordinary Deep Q-Network, we also show that Self-Imitation Learning can improve the performance of Deep Q-Network.

Robot Locomotion via RLS-based Actor-Critic Learning (RLS 기반 Actor-Critic 학습을 이용한 로봇이동)

  • Kim, Jong-Ho;Kang, Dae-Sung;Park, Joo-Young
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.7 / pp.893-898 / 2005
  • Because it needs only a small amount of computation to obtain solutions and can handle stochastic policies explicitly, the actor-critic algorithm, which is a class of reinforcement learning methods, has recently attracted a lot of interest in the area of artificial intelligence. The actor-critic network is composed of the actor network for selecting control inputs and the critic network for estimating value functions. In the training stage, the actor and critic networks change their parameters adaptively in order to select excellent control inputs and yield accurate approximations of value functions as fast as possible. In this paper, we consider a new actor-critic algorithm employing an RLS (Recursive Least Squares) method for critic learning and policy gradients for actor learning. The applicability of the considered algorithm is illustrated with experiments on a two-link robot arm.
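
As a rough illustration of the critic side described in this abstract, the sketch below shows one common recursive least-squares recursion for a linear value function, RLS-TD(0). The abstract does not give the paper's exact update, so this is an assumption; the function names, the forgetting factor, the initial covariance of 1e4, and the two-state toy chain are all hypothetical choices made for the example.

```python
def rls_td_update(theta, P, phi, reward, phi_next, gamma=0.95, forgetting=1.0):
    """One RLS-TD(0) step for a linear critic V(s) = theta . phi(s).

    Illustrative only: this is a textbook-style recursion, not necessarily
    the update used in the paper listed above.
    """
    n = len(theta)
    # d = phi(s) - gamma * phi(s'), the regressor of the TD least-squares problem
    d = [phi[i] - gamma * phi_next[i] for i in range(n)]
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    d_P = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    denom = forgetting + sum(d[i] * P_phi[i] for i in range(n))
    gain = [x / denom for x in P_phi]
    # TD error: r + gamma * V(s') - V(s)
    td_error = reward - sum(d[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * td_error for i in range(n)]
    new_P = [[(P[i][j] - gain[i] * d_P[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return new_theta, new_P, td_error


def fit_two_state_chain(steps=200, gamma=0.5):
    """Hypothetical toy problem: state 0 -> 1 pays reward 1, state 1 -> 0 pays 0."""
    features = [[1.0, 0.0], [0.0, 1.0]]   # one-hot features, one per state
    theta = [0.0, 0.0]
    P = [[1e4, 0.0], [0.0, 1e4]]          # large initial covariance (assumed value)
    state = 0
    for _ in range(steps):
        next_state = 1 - state
        reward = 1.0 if state == 0 else 0.0
        theta, P, _ = rls_td_update(
            theta, P, features[state], reward, features[next_state], gamma
        )
        state = next_state
    return theta
```

On this toy chain the true values satisfy V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), so `fit_two_state_chain()` should approach V(0) = 4/3 and V(1) = 2/3. In an actor-critic setting, the returned `td_error` is the quantity that a policy-gradient actor would typically use to scale its parameter update.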

Adaptive Actor-Critic Learning of Mobile Robots Using Actual and Simulated Experiences

  • Rafiuddin Syam;Keigo Watanabe;Kiyotaka Izumi;Kazuo Kiguchi;Jin, Sang-Ho
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2001.10a / pp.43.6-43 / 2001
  • In this paper, we describe an actor-critic method as a kind of temporal difference (TD) algorithm. The value function is regarded as a current estimator, in which two value functions have different inputs: one is an actual experience; the other is a simulated experience obtained through a predictive model. Thus, the parameter updating for the actor and critic parts is based on actual and simulated experiences, where the critic is constructed by a radial-basis function neural network (RBFNN) and the actor is composed of a kinematic-based controller. As an example application of the present method, a tracking control problem for the position coordinates and azimuth of a nonholonomic mobile robot is considered. The effectiveness is illustrated by a simulation.

Analysis of Reinforcement Learning Methods for BS Switching Operation (기지국 상태 조정을 위한 강화 학습 기법 분석)

  • Park, Hyebin;Lim, Yujin
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.8 no.2 / pp.351-358 / 2018
  • Reinforcement learning is a machine learning method that aims to determine a policy for selecting optimal actions in dynamic and stochastic environments. However, reinforcement learning has high computational complexity and needs a lot of time to reach a solution, so it is not easily applicable to uncertain and continuous environments. To tackle the complexity problem, the AC (actor-critic) method is used, which separates an action-value function into a value function and an action decision policy. In addition, in the transfer learning method, the knowledge constructed in one environment is adapted to another environment, which reduces the learning time of a reinforcement learning method. In this paper, we present the AC method and the transfer learning method as ways to solve these problems of reinforcement learning. Finally, we analyze a case study in which a transfer learning method is used to solve the BS (base station) switching problem in wireless access networks.

Suspension Control using Reinforcement Learning (강화학습에 의한 현가장치의 제어)

  • Jeong, Gyu-Baek;Mun, Yeong-Jun;Park, Ju-Yeong
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.163-166 / 2007
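  • Recently, research on reinforcement learning has been actively pursued in the field of artificial intelligence, both in Korea and abroad. In this paper, we apply a reinforcement learning technique that uses the RLS-based NAC (natural actor-critic) to the control of an active suspension, and we verify its performance through simulation.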

Tunnel Ventilation Controller Design Employing RLS-Based Natural Actor-Critic Algorithm (RLS 기반의 Natural Actor-Critic 알고리즘을 이용한 터널 환기제어기 설계)

  • Chu B.;Kim D.;Hong D.;Park J.;Chung J.T.;Kim T.H.
    • Proceedings of the Korean Society of Precision Engineering Conference / 2006.05a / pp.53-54 / 2006
  • The main purpose of a tunnel ventilation system is to keep the CO pollutant level and the VI (visibility index) at adequate levels to provide drivers with safe driving conditions. Moreover, it is necessary to minimize the power consumed in operating the ventilation system. To achieve these objectives, the control algorithm used in this research is the reinforcement learning (RL) method. RL is goal-directed learning of a mapping from situations to actions. The goal of RL is to maximize a reward, which is an evaluative feedback from the environment. The two objectives listed above are included in constructing the reward of the tunnel ventilation system. An RL algorithm based on the actor-critic architecture and the natural gradient method is adopted for the system. Also, recursive least-squares (RLS) is employed in the learning process to improve the efficiency of data use. Simulation results obtained with real data collected from an existing tunnel are provided in this paper. It is confirmed that, with the suggested controller, the pollutant level inside the tunnel was well maintained under the allowable limit and energy consumption was improved compared to the conventional control scheme.

Robot locomotion via IRPO based Actor-Critic Learning Method (IRPO 기반 Actor-Critic 학습 기법을 이용한 로봇이동)

  • Kim, Jong-Ho;Kang, Dae-Sung;Park, Joo-Young
    • Proceedings of the KIEE Conference / 2005.07d / pp.2933-2935 / 2005
  • The IRPO (Intensive Randomized Policy Optimizer) algorithm is a recently developed tool in the area of reinforcement learning, and it has been shown to be very successful in several application problems. Compared with a general RL method, IRPO differs in that its policy utilizes the entire history of agent-environment interaction. The policy is derived from the history directly, not through any kind of model of the environment. In this paper, we consider a robot-control problem utilizing an IRPO algorithm. We also developed a MATLAB-based animation program, by which the effectiveness of the training algorithms was observed.
