An Improved Reinforcement Learning Technique for Mission Completion

The Transactions of the Korean Institute of Electrical Engineers D (대한전기학회논문지:시스템및제어부문D)

Volume 52 Issue 9 / Pages 533-539 / 2003 / 1229-6287 (pISSN)

The Korean Institute of Electrical Engineers (대한전기학회)

임무수행을 위한 개선된 강화학습 방법

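Woo-Young Kwon (권우영), Graduate School of Information and Communications, Hanyang University;
Sang-Hoon Lee (이상훈), Department of Electrical, Electronic, Control and Instrumentation Engineering, Hanyang University;
Il-Hong Suh (서일홍), Graduate School of Information and Communications, Hanyang University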

Published : 2003.09.01

Abstract

Reinforcement learning (RL) has been widely used as a learning mechanism for artificial life systems. However, RL usually suffers from slow convergence to the optimum state-action sequence, or sequence of stimulus-response (SR) behaviors, and it may not work correctly in non-Markov processes. This paper addresses both problems. First, to cope with slow convergence, state-action pairs that are considered disturbances to the optimum sequence are eliminated from long-term memory (LTM); such disturbances are found by a shortest-path-finding algorithm. This process is shown to improve the learning speed of the system. Second, to partly solve the non-Markov problem, a stimulus that is frequently met during the search process is classified as a sequential percept for a non-Markov hidden state, so that a correct behavior for the hidden state can be learned as in a Markov environment. Several simulation results are presented to show the validity of the proposed learning techniques.
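
The first idea in the abstract, removing state-action pairs that act as detours from a remembered sequence, can be illustrated with a minimal sketch. The abstract does not specify the authors' data structures or which shortest-path algorithm they use, so everything below is an assumption for illustration only: the function name `prune_detours`, the representation of an episode as `(state, action, next_state)` tuples, and the use of breadth-first search over the transitions observed in that episode.

```python
from collections import deque


def prune_detours(episode):
    """Return the shortest chain of recorded transitions from start to goal.

    `episode` is a list of (state, action, next_state) tuples recorded
    during one search (a hypothetical format, not taken from the paper).
    Transitions that are not on the shortest path are treated as
    disturbances and dropped.
    """
    if not episode:
        return []
    start, goal = episode[0][0], episode[-1][2]

    # Build a graph that contains only transitions actually observed.
    graph = {}
    for state, action, next_state in episode:
        graph.setdefault(state, []).append((action, next_state))

    # Breadth-first search gives a shortest path when every step costs 1.
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        for action, next_state in graph.get(state, []):
            if next_state not in parent:
                parent[next_state] = (state, action)
                queue.append(next_state)

    if goal not in parent:
        # The recorded transitions do not chain together; keep them as-is.
        return list(episode)

    # Walk back from the goal to rebuild the pruned sequence.
    path = []
    node = goal
    while parent[node] is not None:
        prev_state, action = parent[node]
        path.append((prev_state, action, node))
        node = prev_state
    path.reverse()
    return path


episode = [
    ("A", "right", "B"),
    ("B", "up", "C"),    # detour
    ("C", "down", "B"),  # detour
    ("B", "right", "D"),
]
pruned = prune_detours(episode)
```

In this toy episode the B to C to B loop is dropped, so only the two transitions A to B and B to D would be kept in memory. A weighted shortest-path method would be the natural substitute if steps had unequal costs.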

