http://dx.doi.org/10.5391/IJFIS.2016.16.4.270

Actor-Critic Algorithm with Transition Cost Estimation  

Denisov, Sergey (Department of Electrical and Computer Engineering, Sungkyunkwan University)
Lee, Jee-Hyong (Department of Electrical and Computer Engineering, Sungkyunkwan University)
Publication Information
International Journal of Fuzzy Logic and Intelligent Systems / v.16, no.4, 2016, pp. 270-275
Abstract
We present an approach for accelerating the actor-critic algorithm for reinforcement learning with continuous action spaces. The actor-critic algorithm has already proved its robustness to infinitely large action spaces in various high-dimensional environments. Despite that success, its main problem remains the speed of convergence to the optimal policy: in a high-dimensional state and action space, searching for the correct action in each state takes an enormously long time. In this paper, we therefore suggest a search-accelerating function that speeds up convergence and reaches the optimal policy faster. In our method, we assume that actions may have their own preference distribution, independent of the state. Since the agent acts randomly in the environment at the beginning of learning, it is more efficient to take actions according to some heuristic function. We demonstrate that the heuristically-accelerated actor-critic algorithm learns the optimal policy faster, using the Educational Process Mining dataset with records of students' course learning process and their grades.
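As a rough illustration of the idea of biasing early action selection with a state-independent preference over actions, a minimal sketch is given below. The class name, the blending rule, and the decay schedule are assumptions made only for illustration, not the authors' exact method, and a discrete action set is used for simplicity, whereas the paper targets continuous action spaces.

```python
import numpy as np

# Hypothetical sketch of heuristically-accelerated action selection for an
# actor-critic agent. All names and the blending/decay scheme are assumed.
class HeuristicActorCritic:
    def __init__(self, n_actions, eta=1.0, decay=0.995, seed=0):
        self.rng = np.random.default_rng(seed)
        # State-independent preference over actions (the heuristic),
        # e.g., derived from prior knowledge about which actions tend to help.
        self.action_preference = np.ones(n_actions) / n_actions
        self.eta = eta        # weight of the heuristic bias
        self.decay = decay    # heuristic influence fades as learning proceeds

    def policy_probs(self, state):
        # Placeholder for the actor's policy; uniform here for brevity.
        return np.ones(len(self.action_preference)) / len(self.action_preference)

    def select_action(self, state):
        # Blend the actor's policy with the state-independent preference,
        # then gradually reduce the heuristic's influence over time.
        probs = self.policy_probs(state) + self.eta * self.action_preference
        probs /= probs.sum()
        self.eta *= self.decay
        return self.rng.choice(len(probs), p=probs)
```

Because eta decays toward zero, the heuristic dominates only during the early, random phase of learning, after which action selection is governed by the learned actor policy.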
Keywords
Actor-critic algorithm; Reinforcement learning; Continuous action space; Heuristic function