
Control of Crawling Robot using Actor-Critic Fuzzy Reinforcement Learning

  • 문영준 (Future Research Institute, Daewoo Shipbuilding & Marine Engineering) ;
  • 이재훈 (Department of Control and Instrumentation Engineering, Korea University) ;
  • 박주영 (Department of Control and Instrumentation Engineering, Korea University)
  • Received : 2009.04.06
  • Accepted : 2009.06.26
  • Published : 2009.08.25

Abstract


Recently, reinforcement learning methods have drawn much interest in the area of machine learning. The dominant approaches in reinforcement learning research include the value-function approach, the policy-search approach, and the actor-critic approach; of these, this paper is concerned with algorithms developed within the actor-critic framework for problems with continuous states and continuous actions. In particular, the paper focuses on a method combining ACFRL (actor-critic fuzzy reinforcement learning), an actor-critic-type reinforcement learning algorithm based on fuzzy theory, with RLS-NAC, which is based on RLS filters and the natural actor-critic method. The presented method is applied to a control problem for crawling robots, and some results obtained from a comparison of learning performance are reported.
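The actor-critic scheme the abstract refers to can be illustrated with a minimal sketch: an actor (a parameterized Gaussian policy) is updated by a likelihood-ratio policy gradient scaled by the TD error, while a critic (a linear value function) is updated by plain TD learning. Everything here is illustrative: the toy 1-D tracking task, the polynomial features, and the learning rates are assumptions, standing in for the paper's fuzzy-rule basis functions and RLS-NAC updates.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(s):
    # Simple polynomial features standing in for fuzzy basis functions.
    return np.array([1.0, s, s * s])

theta = np.zeros(3)   # actor weights (policy mean)
w = np.zeros(3)       # critic weights (state value)
sigma = 0.5           # fixed exploration noise
alpha_a, alpha_c, gamma = 0.01, 0.05, 0.95

def step(s, a):
    # Hypothetical dynamics/reward: reward peaks when the action tracks the state.
    r = -(a - s) ** 2
    s_next = float(np.clip(s + 0.1 * a, -1.0, 1.0))
    return r, s_next

s = 0.5
for _ in range(5000):
    phi = features(s)
    mu = phi @ theta                         # policy mean at current state
    a = mu + sigma * rng.standard_normal()   # sample exploratory action
    r, s_next = step(s, a)
    # TD error from the linear critic.
    delta = r + gamma * (features(s_next) @ w) - phi @ w
    w += alpha_c * delta * phi               # critic: TD(0) update
    # Actor: likelihood-ratio gradient of the Gaussian policy, scaled by delta.
    theta += alpha_a * delta * (a - mu) / sigma**2 * phi
    s = s_next
```

In the paper's actual setting, the actor is a fuzzy controller and the critic employs RLS-filter-based natural-gradient estimates (RLS-NAC); this sketch replaces both with plain stochastic-gradient updates for brevity.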

Keywords

References

  1. Q. Yang, J. B. Vance, and S. Jagannathan, 'Control of nonaffine nonlinear discrete-time systems using reinforcement-learning-based linearly parameterized neural networks,' IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, vol. 38, no. 4, pp. 994-1001, 2008 https://doi.org/10.1109/TSMCB.2008.926607
  2. J. Valasek, J. Doebbler, M. D. Tandale, and A. J. Meade, 'Improved adaptive-reinforcement learning control for morphing unmanned air vehicles,' IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 4, pp. 1014-1020, 2008 https://doi.org/10.1109/TSMCB.2008.922018
  3. K.-H. Park, Y.-J. Kim, and J.-H. Kim, 'Modular Q-learning based multi-agent cooperation for robot soccer,' Robotics and Autonomous Systems, vol. 35, no. 2, pp. 109-122, 2001
  4. J. Moody and M. Saffell, 'Learning to trade via direct reinforcement,' IEEE Transactions on Neural Networks, vol. 12, no. 4, pp. 875-889, 2001 https://doi.org/10.1109/72.935097
  5. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998
  6. H. R. Berenji and D. Vengerov, 'A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters,' IEEE Transactions on Fuzzy Systems, vol. 11, no. 4, August 2003
  7. X. Xu, H. He, and D. Hu, 'Efficient reinforcement learning using recursive least-squares methods,' Journal of Artificial Intelligence Research, vol. 16, pp. 259-292, 2002
  8. R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, 'Policy gradient methods for reinforcement learning with function approximation', Advances in Neural Information Processing Systems, vol. 12, pp. 1057-1063, 2000
  9. V. Konda and J. N. Tsitsiklis, 'Actor-critic algorithms,' SIAM Journal on Control and Optimization, vol. 42, no. 4, pp. 1143-1166, 2003 https://doi.org/10.1137/S0363012901385691
  10. J. Peters, S. Vijayakumar, and S. Schaal, 'Reinforcement learning for humanoid robotics', In Proceedings of the Third IEEE-RAS International Conference on Humanoid Robots, 2003
  11. J. Park, J. Kim, and D. Kang, 'An RLS-based natural actor-critic algorithm for locomotion of a two-linked robot arm,' Lecture Notes in Artificial Intelligence, vol. 3801, pp. 65-72, December 2005
  12. H. Kimura, K. Miyazaki, and S. Kobayashi, 'Reinforcement learning in POMDPs with function approximation,' In Proceedings of the 14th International Conference on Machine Learning (ICML 1997), pp. 152-160, 1997
  13. 김종호, A Study on System Control Using Reinforcement Learning Algorithms, Master's thesis, Department of Control and Instrumentation Engineering, Korea University, 2005
  14. L. X. Wang, Adaptive Fuzzy Systems and Control: Design and Stability Analysis, Prentice-Hall, 1994
  15. 박종진, 최규석, Fuzzy Control Systems, Kyowoosa, 2001
  16. T. Takagi and M. Sugeno, 'Fuzzy identification of systems and its applications to modeling and control,' IEEE Transactions on Systems, Man, and Cybernetics, vol. 15, pp. 116-132, 1985
  17. 박주영, 정규백, 문영준, 'Performance comparison of crawling robots trained by reinforcement learning,' Journal of the Korean Institute of Fuzzy and Intelligent Systems, vol. 17, no. 1, pp. 33-36, 2007