[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTB.2002.9B.2.155

Online Reinforcement Learning to Search the Shortest Path in Maze Environments

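Kim, Byeong-Cheon (Department of Computer Engineering, Hankyong National University)
Kim, Sam-Geun (Department of Computer Engineering, Hankyong National University)
Yun, Byeong-Ju (Department of Computer Engineering, Myongji University)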

Publication Information

The KIPS Transactions: Part B, v.9B, no.2, 2002, pp. 155-162

Abstract

Reinforcement learning is a learning method in which an agent learns by trial and error through interaction with a dynamic environment. It is classified into online reinforcement learning and delayed reinforcement learning. In this paper, we propose an online reinforcement learning system, ONRELS (ONline REinforcement Learning System). Before making a state transition from the current state, ONRELS updates the estimated values of all selectable (state, action) pairs. ONRELS first compresses the state space of the maze environment and then learns by trial and error while interacting with the compressed environment. Experiments show that, in maze environments, ONRELS finds the shortest path faster than Q-learning using TD-error and $Q(\lambda)$-learning using $TD(\lambda)$.
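
The update rule described in the abstract (refresh the estimates of all selectable (state, action) pairs at the current state, then transition) can be illustrated with a minimal tabular sketch. This is not the authors' ONRELS implementation: the abstract does not give the update equations, so the sketch assumes a standard one-step TD-error update, a known deterministic maze model for looking ahead one step, and an epsilon-greedy action choice. The maze layout, the reward of 1 on reaching the goal, the function names (`selectable`, `train`, `greedy_path`), and all parameter values are hypothetical, and the state-space compression step is not modeled.

```python
import random

# Hypothetical 4x4 maze (an assumption, not from the paper):
# 'S' start, 'G' goal, '#' wall, '.' free cell.
MAZE = [
    "S.#.",
    ".##.",
    "....",
    "#.#G",
]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(MAZE):
        c = row.find(ch)
        if c >= 0:
            return (r, c)
    raise ValueError(ch)


def selectable(state):
    """Map each legal action index to the cell it leads to."""
    out = {}
    for a, (dr, dc) in enumerate(MOVES):
        r, c = state[0] + dr, state[1] + dc
        if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
            out[a] = (r, c)
    return out


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table; all hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    start, goal = find("S"), find("G")
    q = {}  # (state, action) -> estimated value, implicitly 0.0

    def value(s, a):
        return q.get((s, a), 0.0)

    for _ in range(episodes):
        state = start
        for _ in range(200):  # step cap per episode
            if state == goal:
                break
            options = selectable(state)
            # Update every selectable (state, action) pair BEFORE moving.
            for a, nxt in options.items():
                reward = 1.0 if nxt == goal else 0.0
                future = 0.0 if nxt == goal else max(
                    value(nxt, b) for b in selectable(nxt))
                td_error = reward + gamma * future - value(state, a)
                q[(state, a)] = value(state, a) + alpha * td_error
            # Epsilon-greedy transition, breaking ties at random.
            if rng.random() < epsilon:
                action = rng.choice(list(options))
            else:
                best = max(value(state, a) for a in options)
                action = rng.choice(
                    [a for a in options if value(state, a) == best])
            state = options[action]
    return q


def greedy_path(q, max_steps=50):
    """Follow the highest-valued action from start until goal or step cap."""
    goal = find("G")
    path = [find("S")]
    while path[-1] != goal and len(path) <= max_steps:
        options = selectable(path[-1])
        best = max(options, key=lambda a: q.get((path[-1], a), 0.0))
        path.append(options[best])
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Because the reward arrives only at the goal and is discounted by `gamma` per step, the learned values decrease with distance from the goal, so following the highest-valued action traces a shortest path in this toy maze. Plain Q-learning would update only the pair actually taken; updating all selectable pairs per step is the property the abstract attributes to ONRELS, here approximated under the assumptions listed above.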
