http://dx.doi.org/10.3745/KIPSTB.2002.9B.2.155

Online Reinforcement Learning to Search the Shortest Path in Maze Environments  

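Kim, Byeong-Cheon (Department of Computer Engineering, Hankyong National University)
Kim, Sam-Geun (Department of Computer Engineering, Hankyong National University)
Yun, Byeong-Ju (Department of Computer Engineering, Myongji University)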
Abstract
Reinforcement learning is a trial-and-error method in which an agent learns by interacting with a dynamic environment. It is classified into online reinforcement learning and delayed reinforcement learning. In this paper, we propose an online reinforcement learning system, ONRELS (ONline REinforcement Learning System). At the current state, before making a state transition, ONRELS updates the estimated values of all selectable (state, action) pairs. ONRELS first compresses the state space of the maze environment and then learns by trial-and-error interaction with the compressed environment. Experiments show that, in maze environments, ONRELS finds the shortest path faster than Q-learning using the TD error and $Q(\lambda)$-learning using $TD(\lambda)$.
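One of the baselines named above, one-step Q-learning driven by the TD error, can be sketched as follows. This is a minimal illustration under stated assumptions, not ONRELS and not the authors' experimental setup: the 5×5 maze, the reward of $-1$ per move, and the parameters $\alpha = 0.5$, $\gamma = 1$ and $\varepsilon = 0.1$ are hypothetical choices. Each step moves $Q(s, a)$ toward its target by $\alpha$ times the TD error $\delta = r + \gamma \max_b Q(s', b) - Q(s, a)$. State-space compression and the $Q(\lambda)$ variant are not shown.

```python
import random
from collections import deque

# Hypothetical 5x5 maze (an assumption for illustration, not from the paper):
# '#' = wall, 'S' = start, 'G' = goal, '.' = free cell.
MAZE = [
    "S..#.",
    ".#.#.",
    ".#...",
    ".##.#",
    "....G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(MAZE):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in maze")


START, GOAL = find("S"), find("G")


def step(state, a):
    """Deterministic move; hitting a wall or the border leaves the agent in place.
    Every move costs -1, so maximizing return means minimizing path length."""
    r, c = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
        return (r, c), -1.0
    return state, -1.0


def greedy(q, s):
    """Action with the highest estimate at s (ties go to the lowest index)."""
    return max(range(len(ACTIONS)), key=lambda a: q.get((s, a), 0.0))


def q_learning(episodes=1000, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular one-step Q-learning with epsilon-greedy exploration.
    Unvisited pairs default to 0.0, which is optimistic because all rewards are negative."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        s = START
        while s != GOAL:
            a = rng.randrange(len(ACTIONS)) if rng.random() < epsilon else greedy(q, s)
            s2, r = step(s, a)
            # The goal is terminal, so it contributes no future value.
            target = r if s2 == GOAL else r + gamma * q.get((s2, greedy(q, s2)), 0.0)
            td_error = target - q.get((s, a), 0.0)
            q[(s, a)] = q.get((s, a), 0.0) + alpha * td_error
            s = s2
    return q


def greedy_path(q, max_steps=100):
    """Follow the learned greedy policy from START; stop at GOAL or after max_steps moves."""
    s, path = START, [START]
    while s != GOAL and len(path) <= max_steps:
        s, _ = step(s, greedy(q, s))
        path.append(s)
    return path


def shortest_length():
    """Breadth-first search over the same maze, used only to check the learned path."""
    dist = {START: 0}
    queue = deque([START])
    while queue:
        s = queue.popleft()
        for a in range(len(ACTIONS)):
            s2, _ = step(s, a)
            if s2 not in dist:
                dist[s2] = dist[s] + 1
                queue.append(s2)
    return dist[GOAL]


if __name__ == "__main__":
    q = q_learning()
    path = greedy_path(q)
    print(len(path) - 1, shortest_length())
```

Running the script prints the number of moves on the learned greedy path next to the breadth-first-search optimum. Because every move costs $-1$, the two should agree once the estimates have converged; the hypothetical maze above has two distinct shortest routes, and either one satisfies the check.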