
Barycentric Approximator for Reinforcement Learning Control  

Whang Cho (Department of Control and Instrumentation, Kwangwoon University)
Abstract
Recently, various attempts to apply reinforcement learning to the self-learning intelligent control of continuous dynamic systems have been reported in the machine learning research community. The results have been mixed, with some successes and some failures, and they indicate that the success of reinforcement learning in the intelligent control of continuous systems depends on the ability to combine a suitable function approximation method with temporal difference methods such as Q-learning and value iteration. One of the difficulties in using function approximation together with a temporal difference method is the absence of a convergence guarantee for the resulting algorithm. This paper provides a proof of convergence for a particular function approximation method based on the "barycentric interpolator", which is known to be computationally more efficient than multilinear interpolation.
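To illustrate the kind of approximator the abstract refers to, the following is a minimal sketch, not the paper's implementation, of a value function represented on a regular grid whose cells are split into simplices (Kuhn/Freudenthal triangulation). A query point is expressed as a convex combination of only d+1 simplex vertices rather than the 2^d corners used by multilinear interpolation, which is the source of the efficiency gain mentioned above. The class name, grid resolution, state bounds, and the weighted TD backup shown here are illustrative assumptions, not taken from the paper.

import numpy as np


class BarycentricValueFunction:
    def __init__(self, lows, highs, points_per_dim):
        self.lows = np.asarray(lows, dtype=float)
        self.highs = np.asarray(highs, dtype=float)
        self.n = np.asarray(points_per_dim, dtype=int)   # grid points per dimension
        self.values = np.zeros(self.n)                   # one value per grid vertex

    def _coordinates(self, state):
        """Return the d+1 simplex vertices and barycentric weights for `state`."""
        # Map the state into continuous grid coordinates.
        g = (np.asarray(state, dtype=float) - self.lows) / (self.highs - self.lows)
        g = np.clip(g, 0.0, 1.0) * (self.n - 1)
        base = np.minimum(g.astype(int), self.n - 2)     # lower corner of the enclosing cell
        frac = g - base                                  # local coordinates in [0, 1]^d

        # Kuhn triangulation: sort local coordinates in decreasing order.
        order = np.argsort(-frac)
        d = len(frac)
        vertices = [base.copy()]
        weights = [1.0 - frac[order[0]]]
        corner = base
        for k in range(d):
            corner = corner.copy()
            corner[order[k]] += 1                        # step to the next simplex vertex
            vertices.append(corner)
            nxt = frac[order[k + 1]] if k + 1 < d else 0.0
            weights.append(frac[order[k]] - nxt)
        return vertices, np.array(weights)               # d+1 weights, nonnegative, summing to 1

    def value(self, state):
        """Interpolated value: convex combination of d+1 vertex values."""
        vertices, weights = self._coordinates(state)
        return sum(w * self.values[tuple(v)] for v, w in zip(vertices, weights))

    def td_update(self, state, target, alpha=0.1):
        """Distribute a temporal-difference backup over the vertices by their weights."""
        vertices, weights = self._coordinates(state)
        error = target - self.value(state)
        for v, w in zip(vertices, weights):
            self.values[tuple(v)] += alpha * w * error


# Usage: in a 2-D state space each query touches only 3 vertices,
# versus the 4 (2^d) corners needed by multilinear interpolation.
vf = BarycentricValueFunction(lows=[0, 0], highs=[1, 1], points_per_dim=[11, 11])
vf.td_update(state=[0.42, 0.7], target=1.0)
print(vf.value([0.42, 0.7]))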
Keywords
Reinforcement Learning; Q-learning; Barycentric Interpolation; Multilinear Interpolation;