An Improved Reinforcement Learning Technique for Mission Completion

  • Woo Young Kwon (Graduate School of Information and Communications, Hanyang University);
  • Sang Hoon Lee (Department of Electrical, Electronic, Control and Instrumentation Engineering, Hanyang University);
  • Il Hong Suh (Graduate School of Information and Communications, Hanyang University)
  • Published: 2003.09.01

Abstract

Reinforcement learning (RL) has been widely used as a learning mechanism for artificial life systems. However, RL usually suffers from slow convergence to the optimal state-action sequence, i.e., a sequence of stimulus-response (SR) behaviors, and may not work correctly in non-Markov processes. In this paper, first, to cope with the slow-convergence problem, state-action pairs that are considered disturbances to the optimal sequence are eliminated from long-term memory (LTM), where such disturbances are found by a shortest-path-finding algorithm. This process is shown to give the system an enhanced learning speed. Second, to partly solve the non-Markov problem, a stimulus that is frequently met during the search process is classified as a sequential percept for a non-Markov hidden state. Thus, a correct behavior for a non-Markov hidden state can be learned as in a Markov environment. To show the validity of the proposed learning techniques, several simulation results are illustrated.
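
The two ideas in the abstract can be sketched concretely. The Python sketch below is an illustration only, not the authors' implementation: the function names (shortest_path, prune_disturbances, sequential_percept), the (state, action, next_state) episode format, and the dict-based Q-table standing in for long-term memory are all assumptions made for this sketch; the paper does not specify them.

    from collections import defaultdict, deque

    def shortest_path(graph, start, goal):
        # Breadth-first search over the visited-state graph; returns the
        # states on one shortest start-to-goal path (empty if unreachable).
        prev = {start: None}
        queue = deque([start])
        while queue:
            s = queue.popleft()
            if s == goal:
                path = []
                while s is not None:
                    path.append(s)
                    s = prev[s]
                return path[::-1]
            for t in graph[s]:
                if t not in prev:
                    prev[t] = s
                    queue.append(t)
        return []

    def prune_disturbances(q_table, episode, goal):
        # Idea 1: after a successful episode, transitions that stray off a
        # shortest path through the visited states are treated as
        # disturbances and removed from LTM (here, a tabular Q-function).
        graph = defaultdict(set)
        for s, _, s_next in episode:
            graph[s].add(s_next)
        on_path = set(shortest_path(graph, episode[0][0], goal))
        for s, a, s_next in episode:
            if s in on_path and s_next not in on_path:
                q_table.pop((s, a), None)

    def sequential_percept(prev_obs, obs, frequent_obs):
        # Idea 2: an observation met frequently during search is assumed
        # to alias a non-Markov hidden state, so it is paired with the
        # preceding observation; the pair then acts as a Markov state.
        return (prev_obs, obs) if obs in frequent_obs else obs

For example, after an episode that reaches goal, calling prune_disturbances(q_table, episode, goal) forgets the Q-entries of off-path excursions, so subsequent greedy rollouts converge faster toward the optimal SR sequence; sequential_percept likewise lets ordinary Q-learning update on (previous, current) observation pairs wherever a percept is ambiguous.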
