Kernel-based actor-critic approach with applications

  • Chu, Baek-Suk (Department of Intelligent Mechanical Engineering, Kumoh National Institute of Technology);
  • Jung, Keun-Woo (Department of Control and Instrumentation Engineering, Korea University);
  • Park, Joo-Young (Department of Control and Instrumentation Engineering, Korea University)
  • Received : 2011.08.25
  • Accepted : 2011.10.22
  • Published : 2011.12.25

Abstract

Recently, actor-critic methods have drawn significant interest in the area of reinforcement learning, and several algorithms have been studied along the lines of the actor-critic strategy. In this paper, we consider a new class of actor-critic algorithms that employ kernel methods, which have recently been shown to be effective tools in various fields of machine learning, and we investigate how the actor-critic strategy can be combined with them. More specifically, this paper studies actor-critic algorithms built on kernel-based least-squares estimation and the policy gradient; in the critic, a sliding-window kernel least-squares method yields fast and efficient value-function estimation in a nonparametric setting. The applicability of the considered algorithms is illustrated on a robot locomotion problem and a tunnel ventilation control problem.
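The abstract names the two components but gives no pseudocode, so below is a minimal Python sketch of the general scheme, not the paper's exact algorithm: a critic that refits a regularized kernel least-squares value estimate over a sliding window of recent transitions, paired with a linear-Gaussian policy-gradient actor driven by the critic's TD error. The Gaussian RBF kernel, one-step bootstrapped targets, ridge regularization, and all names and hyperparameters (`rbf_kernel`, `window`, `reg`, the learning rate) are illustrative assumptions; an efficient implementation would use the recursive sliding-window kernel RLS update of reference 12 below rather than re-solving the linear system at every step.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two state vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

class SlidingWindowKLSCritic:
    """Nonparametric value estimator: regularized kernel least squares
    refitted over a sliding window of the most recent transitions."""
    def __init__(self, window=50, sigma=1.0, reg=1e-3, gamma=0.95):
        self.window, self.sigma, self.reg, self.gamma = window, sigma, reg, gamma
        self.states, self.targets = [], []   # windowed dictionary and targets
        self.alpha = None                    # dual (kernel) coefficients

    def value(self, s):
        """V(s) = sum_i alpha_i * k(s, s_i) over the window dictionary."""
        if self.alpha is None:
            return 0.0
        k = np.array([rbf_kernel(s, x, self.sigma) for x in self.states])
        return float(k @ self.alpha)

    def update(self, s, r, s_next):
        """Record a transition, refit the critic, and return the TD error."""
        target = r + self.gamma * self.value(s_next)  # one-step bootstrapped target
        td_error = target - self.value(s)
        self.states.append(np.asarray(s, float))
        self.targets.append(target)
        if len(self.states) > self.window:            # slide the window
            self.states.pop(0)
            self.targets.pop(0)
        # Regularized kernel least squares: solve (K + reg*I) alpha = y.
        K = np.array([[rbf_kernel(a, b, self.sigma) for b in self.states]
                      for a in self.states])
        self.alpha = np.linalg.solve(K + self.reg * np.eye(len(K)),
                                     np.array(self.targets))
        return td_error

class GaussianPolicyActor:
    """Linear-Gaussian policy updated by a TD-error-weighted
    score-function (policy-gradient) step."""
    def __init__(self, state_dim, lr=0.01, std=0.5):
        self.w = np.zeros(state_dim)
        self.lr, self.std = lr, std

    def act(self, s, rng):
        # Sample an action from N(w @ s, std^2).
        return rng.normal(self.w @ np.asarray(s, float), self.std)

    def update(self, s, a, td_error):
        s = np.asarray(s, float)
        mu = self.w @ s
        # grad_w log pi(a|s) = (a - mu) * s / std^2 for a Gaussian mean w @ s
        self.w += self.lr * td_error * (a - mu) * s / self.std ** 2
```

Fixing the window size bounds the kernel matrix at w × w, so each refit here costs at most O(w³); a sliding-window kernel RLS recursion replaces this full solve with rank-one updates and downdates as samples enter and leave the window, which is what makes the nonparametric critic fast in practice.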

References

  1. A. G. Barto, R. S. Sutton, C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Transactions on Systems, Man, and Cybernetics, vol. 13, pp. 835-846, 1983.
  2. H. R. Berenji, D. Vengerov, "A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters," IEEE Transactions on Fuzzy Systems, vol. 11, pp. 478-485, 2003. https://doi.org/10.1109/TFUZZ.2003.814834
  3. H. Kimura, S. Kobayashi, "An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function," In Proceedings of the Fifteenth International Conference on Machine Learning, pp. 111-116, 1998.
  4. J. Park, J. Kim, D. Kang, "An RLS-based natural actor-critic algorithm for locomotion of a two-linked robot arm," Lecture Notes in Artificial Intelligence, vol. 3801, pp. 65-72, 2005.
  5. J. Peters, S. Vijayakumar, S. Schaal, "Reinforcement learning for humanoid robotics," In Proceedings of the Third IEEE-RAS International Conference on Humanoid Robots (Humanoids 2003), 2003.
  6. P. Thomas, M. Branicky, N. Kobori, K. Suzuki, P. Hartono, S. Hashimoto, "Learning to control a joint driven double inverted pendulum using nested actor/critic algorithm," In Proceedings of the 9th International Conference on Neural Information Processing, 2002.
  7. J. Park, D. Kang, J. Lee, D. Nam, "An actor-critic algorithm using kernel-based least-squares estimation: An application to robot locomotion," In Proceedings of 2009 CACS International Automatic Control Conference, 2009.
  8. R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, 1998.
  9. B. Schölkopf, A. J. Smola, Learning with Kernels, MIT Press, Cambridge, 2002.
  10. J. Park, D. Nam, J. Lee, "Some observations on kernel-based function approximation steps for actor-critic methods," In Proceedings of KIIS Fall Conference, vol. 19, no. 2, pp. 79-82, 2009.
  11. J. A. Boyan, "Technical update: Least-squares temporal difference learning," Machine Learning, vol. 49, pp. 233-246, 2002. https://doi.org/10.1023/A:1017936530646
  12. S. Van Vaerenbergh, J. Vía, I. Santamaría, "Nonlinear system identification using a new sliding-window kernel RLS algorithm," Journal of Communications, vol. 2, no. 3, pp. 1-8, 2007.
  13. R. S. Sutton, D. McAllester, S. Singh, Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems, vol. 12, pp. 1057-1063, 1999.
  14. H. Kimura, K. Miyazaki, S. Kobayashi, "Reinforcement learning in POMDPs with function approximation," In Proceedings of the Fourteenth International Conference on Machine Learning, pp. 152-160, 1997.
  15. B. Chu, D. Kim, D. Hong, J. Park, J. T. Chung, T.-H. Kim, "Tunnel ventilation control using reinforcement learning methodology," JSME International Journal Series C, vol. 47, no. 4, pp. 939-945, 2006.
  16. D. Hong, B. Chu, W. D. Kim, J. T. Chung, T.-H. Kim, "Pollution level estimation for tunnel ventilation," JSME International Journal Series B, vol. 46, no. 2, pp. 278-286, 2003. https://doi.org/10.1299/jsmeb.46.278
  17. D. Kim, B. Chu, D. Hong, J. T. Chung, T.-H. Kim, "Design of alternating operation algorithm for tunnel ventilation systems," In Proceedings of the Society of Air-Conditioning and Refrigerating Engineers of Korea 2005 Summer Conference, pp. 872-877, 2005.