Robot Locomotion via RLS-based Actor-Critic Learning

Kim, Jong-Ho; Kang, Dae-Sung; Park, Joo-Young

doi:10.5391/JKIIS.2005.15.7.893

Journal of the Korean Institute of Intelligent Systems (한국지능시스템학회논문지)

Volume 15 Issue 7 / Pages 893-898 / 2005 / 1976-9172 (pISSN) / 2288-2324 (eISSN)

Korean Institute of Intelligent Systems (한국지능시스템학회)

RLS 기반 Actor-Critic 학습을 이용한 로봇이동

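Jong-Ho Kim (Department of Control and Instrumentation Engineering, Korea University);
Dae-Sung Kang (Department of Control and Instrumentation Engineering, Korea University);
Joo-Young Park (Department of Control and Instrumentation Engineering, Korea University)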

Published: 2005.12.01

https://doi.org/10.5391/JKIIS.2005.15.7.893

Abstract

The actor-critic algorithm, a class of reinforcement learning methods, has recently attracted considerable interest in artificial intelligence because it needs only a small amount of computation to obtain solutions and can handle stochastic policies explicitly. An actor-critic network consists of an actor network for selecting control inputs and a critic network for estimating value functions. During training, both networks adapt their parameters so as to select good control inputs and to approximate the value functions accurately as quickly as possible. In this paper, we consider a new actor-critic algorithm that employs an RLS (recursive least squares) method for critic learning and policy gradients for actor learning. The applicability of the considered algorithm is illustrated with experiments on a two-link robot arm.

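The actor-critic algorithm, a class of reinforcement learning methods, has recently attracted much interest in artificial intelligence because it requires only minimal computation for the control-input selection problem and can handle stochastic policies explicitly. An actor-critic network consists of an actor network for the control-input selection strategy and a critic network for value-function approximation. During learning, the actor and the critic adaptively change their parameter vectors so as to select good control inputs and achieve an accurate value-function approximation as quickly as possible. This paper considers a new kind of algorithm that uses RLS (recursive least squares), which guarantees fast convergence, for critic learning and the policy gradient for actor learning. The applicability of the considered algorithm is illustrated through experiments on a robot with two links.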
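
The abstract names the two learning rules but does not give their update equations. The following is a minimal sketch of how such a pair of updates might look, not the authors' implementation. It assumes a linear critic V(s) = w . phi(s) trained with one common RLS-TD(0) recursion (no forgetting factor), and a Gaussian policy with mean theta . phi(s) and fixed standard deviation, updated along the TD-error-weighted score function. The function names, the feature vectors, and all parameter values are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def rls_td_update(w, P, phi, phi_next, r, gamma):
    """One RLS-TD(0) critic step (assumed form); returns the new (w, P).

    w        : critic weights, V(s) is approximated by dot(w, phi(s))
    P        : running estimate of the inverse of sum(phi * d^T)
    phi      : features of the current state
    phi_next : features of the next state
    r, gamma : reward and discount factor
    """
    n = len(w)
    d = [f - gamma * fn for f, fn in zip(phi, phi_next)]  # phi - gamma * phi'
    P_phi = [dot(row, phi) for row in P]                  # P @ phi
    dT_P = [dot(d, col) for col in zip(*P)]               # d^T @ P
    denom = 1.0 + dot(d, P_phi)
    k = [x / denom for x in P_phi]                        # gain vector
    err = r - dot(d, w)                                   # TD error under w
    w_new = [wi + ki * err for wi, ki in zip(w, k)]
    P_new = [[P[i][j] - k[i] * dT_P[j] for j in range(n)] for i in range(n)]
    return w_new, P_new


def actor_update(theta, phi, phi_next, a, r, w, gamma, sigma, alpha):
    """One policy-gradient step for a Gaussian policy (assumed form).

    The policy mean is dot(theta, phi); sigma is a fixed standard deviation.
    The critic's TD error scales the score function grad_theta log pi(a | s).
    """
    td_error = r + gamma * dot(w, phi_next) - dot(w, phi)
    mean = dot(theta, phi)
    score = [(a - mean) / sigma ** 2 * f for f in phi]
    return [t + alpha * td_error * g for t, g in zip(theta, score)]
```

In this sketch P would typically be initialised to a large multiple of the identity matrix, which acts as a weak regulariser on the least-squares solution. The critic then reuses every observed transition through P, which is the usual argument for faster convergence than a plain gradient TD update.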
