[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5391/IJFIS.2011.11.4.267

Kernel-based actor-critic approach with applications

Chu, Baek-Suk (Department of Intelligent Mechanical Engineering, Kumoh National Institute of Technology)
Jung, Keun-Woo (Department of Control and Instrumentation Engineering, Korea University)
Park, Joo-Young (Department of Control and Instrumentation Engineering, Korea University)

Publication Information

International Journal of Fuzzy Logic and Intelligent Systems, vol. 11, no. 4, pp. 267-274, 2011

Abstract

Actor-critic methods have recently drawn significant interest in reinforcement learning, and several algorithms have been developed along the lines of the actor-critic strategy. In this paper, we consider a new type of actor-critic algorithm that employs kernel methods, which have recently been shown to be very effective tools in various fields of machine learning, and we investigate how to combine the actor-critic strategy with them. More specifically, this paper studies actor-critic algorithms that use kernel-based least-squares estimation and the policy gradient. The critic uses a sliding-window-based kernel least-squares method, which leads to fast and efficient value-function estimation in a nonparametric setting. The applicability of the considered algorithms is illustrated via a robot locomotion problem and a tunnel ventilation control problem.
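To make the sliding-window idea in the abstract concrete, the following is a minimal sketch of regularized kernel least squares fitted over only the most recent samples. It is not the paper's algorithm: the class name `SlidingWindowKernelLS`, the Gaussian kernel, and the parameters `window`, `reg`, and `width` are all assumptions chosen for illustration, the targets are generic regression values rather than the critic's value-function targets, and the coefficients are re-solved from scratch at every step instead of being updated recursively.

```python
import math
from collections import deque


def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel between two equal-length sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


class SlidingWindowKernelLS:
    """Hypothetical helper: kernel least squares over the last `window` samples."""

    def __init__(self, window=20, reg=1e-3, width=1.0):
        # deque(maxlen=...) discards the oldest sample once the window is full
        self.inputs = deque(maxlen=window)
        self.targets = deque(maxlen=window)
        self.reg = reg
        self.width = width
        self.alpha = []

    def update(self, x, target):
        """Add one (input, target) pair and refit on the current window."""
        self.inputs.append(tuple(x))
        self.targets.append(float(target))
        pts = list(self.inputs)
        n = len(pts)
        # Regularized Gram matrix K + reg * I
        K = [[gaussian_kernel(pts[i], pts[j], self.width)
              + (self.reg if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        self.alpha = solve_linear(K, list(self.targets))

    def predict(self, x):
        """Evaluate the fitted function: sum_i alpha_i * k(x_i, x)."""
        return sum(a * gaussian_kernel(p, x, self.width)
                   for a, p in zip(self.alpha, self.inputs))


# Usage: stream samples of sin(x); only the last 10 influence the fit.
model = SlidingWindowKernelLS(window=10, reg=1e-6, width=0.5)
for i in range(30):
    x = 0.5 * i
    model.update((x,), math.sin(x))
print(model.predict((12.5,)), math.sin(12.5))
```

Because the window has a fixed size, each refit costs the same regardless of how many samples have been seen, which is the property that makes a sliding-window estimator attractive for an online critic. A recursive update of the inverse Gram matrix would avoid the full re-solve used in this sketch.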

Keywords

reinforcement learning; actor-critic algorithm; kernel methods; least-squares; sliding-windows

Citations & Related Records

References

1	B. Chu, D. Kim, D. Hong, J. Park, J. T. Chung, T.-H. Kim, "Tunnel ventilation control using reinforcement learning methodology," JSME International Series C, vol. 47, no. 4, pp. 939-945, 2006.
2	D. Hong, B. Chu, W. D. Kim, J. T. Chung, T.-H. Kim, "Pollution level estimation for tunnel ventilation," JSME International Series B, vol. 46, no. 2, pp. 278-286, 2003.
3	D. Kim, B. Chu, D. Hong, J. T. Chung, T.-H. Kim, "Design of alternating operation algorithm for tunnel ventilation systems," In Proceedings of the Society of Airconditioning and Refrigerating Engineering of Korea 2005 Summer Conference, pp. 872-877, 2005.
4	R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, 1998.
5	B. Schölkopf, A. J. Smola, Learning with Kernels, MIT Press, Cambridge, 2002.
6	J. Park, D. Nam, J. Lee, "Some observations on kernel-based function approximation steps for actor-critic methods," In Proceedings of KIIS Fall Conference, vol. 19, no. 2, pp. 79-82, 2009.
7	J. A. Boyan, "Technical update: Least-squares temporal difference learning," Machine Learning, vol. 49, pp. 233-246, 2002.
8	S. Van Vaerenbergh, J. Vía, I. Santamaría, "Nonlinear system identification using a new sliding-window kernel RLS algorithm," Journal of Communications, vol. 2, no. 3, pp. 1-8, 2007.
9	R. S. Sutton, D. McAllester, S. Singh, Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems, vol. 12, pp. 1057-1063, 1999.
10	H. Kimura, K. Miyazaki, S. Kobayashi, "Reinforcement learning in POMDPs with function approximation," In Proceedings of the Fourteenth International Conference on Machine Learning, pp. 152-160, 1997.
11	H. Kimura, S. Kobayashi, "An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function," In Proceedings of the Fifteenth International Conference on Machine Learning, pp. 111-116, 1998.
12	J. Peters, S. Vijayakumar, S. Schaal, "Reinforcement learning for humanoid robotics," In Proceedings of the Third IEEE-RAS International Conference on Humanoid Robots (Humanoids2003), 2003.
13	P. Thomas, M. Branicky, N. Kobori, K. Suzuki, P. Hartono, S. Hashimoto, "Learning to control a joint driven double inverted pendulum using nested actor/critic algorithm," In Proceedings of the 9th International Conference on Neural Information Processing, 2002.
14	J. Park, D. Kang, J. Lee, D. Nam, "An actor-critic algorithm using kernel-based least-squares estimation: An application to robot locomotion," In Proceedings of 2009 CACS International Automatic Control Conference, 2009.
15	A. G. Barto, R. S. Sutton, C. W. Anderson, "Neuronlike elements that can solve difficult learning control problems," IEEE Transactions on Systems Man and Cybernetics, vol. 13, pp. 835-846, 1983.
16	H. R. Berenji, D. Vengerov, "A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters," IEEE Transactions on Fuzzy Systems, vol. 11, pp. 478-485, 2003.
17	J. Park, J. Kim, D. Kang, "An RLS-based natural actor-critic algorithm for locomotion of a two-linked robot arm," Lecture Notes in Artificial Intelligence, vol. 3801, pp. 65-72, 2005.