A Function Approximation Method for Q-learning of Reinforcement Learning

  • 이영아 (Dept. of Computer Engineering, Kyung Hee University)
  • 정태충 (Dept. of Computer Engineering, Kyung Hee University)
  • Published: 2004.11.01

Abstract

Reinforcement learning learns policies for accomplishing a task's goal through online interaction between an agent and its environment. Q-learning, the basic algorithm of reinforcement learning, suffers from the curse of dimensionality in large state spaces and from slow learning in the early stage. To accelerate Q-learning, a function approximation method is needed that can cope with large state spaces and suits the characteristics of reinforcement learning. In this paper, to address these problems, we propose Fuzzy Q-Map, a function approximation algorithm based on online fuzzy clustering. Fuzzy Q-Map is suitable for reinforcement learning because it supports online learning and can express the uncertainty of the environment. We applied Fuzzy Q-Map to the mountain car problem, and the results show that learning speed is accelerated in the early stage of learning.
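The abstract names the ingredients of Fuzzy Q-Map (online fuzzy clustering plus Q-learning) but gives no equations. The following is a minimal sketch of how such an approximator could look, assuming a fuzzy c-means-style inverse-distance membership function and a membership-weighted TD update; the class and method names (`FuzzyQMap`, `memberships`, `update`) and all hyperparameters are illustrative, not the authors' actual formulation.

```python
import numpy as np

class FuzzyQMap:
    """Hypothetical sketch: Q-values stored per cluster prototype,
    read out and updated via fuzzy membership weights."""

    def __init__(self, centers, n_actions, m=2.0, alpha=0.1, gamma=0.99):
        self.centers = np.asarray(centers, dtype=float)  # cluster prototypes
        self.q = np.zeros((len(self.centers), n_actions))  # per-cluster Q-values
        self.m, self.alpha, self.gamma = m, alpha, gamma

    def memberships(self, s):
        # Fuzzy c-means-style membership: closer prototypes get more weight.
        d = np.linalg.norm(self.centers - np.asarray(s, dtype=float), axis=1) + 1e-9
        w = d ** (-2.0 / (self.m - 1.0))
        return w / w.sum()  # normalized to sum to 1

    def q_values(self, s):
        # Approximate Q(s, ·) as a membership-weighted mix of cluster Q-rows.
        return self.memberships(s) @ self.q

    def update(self, s, a, r, s_next, done):
        # Standard Q-learning target, distributed over clusters by membership.
        target = r if done else r + self.gamma * self.q_values(s_next).max()
        td_error = target - self.q_values(s)[a]
        self.q[:, a] += self.alpha * self.memberships(s) * td_error
        return td_error
```

In an experiment like the paper's mountain car task, the state `s` would be the (position, velocity) pair and the prototypes could be spread over that 2-D state space; new prototypes could also be added online as novel states appear, which is where the online-clustering aspect would come in.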

References

  1. Richard S. Sutton, Andrew G. Barto, 'Reinforcement Learning: An Introduction,' MIT Press, 1998
  2. Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, 'Reinforcement Learning: A Survey,' Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996
  3. Pierre Yves Glorennec, 'Reinforcement Learning: an Overview,' Proceedings of the European Symposium on Intelligent Techniques, 2000
  4. William Donald Smart, 'Making Reinforcement Learning Work on Real Robots,' Ph. D. Thesis, Brown University, 2002
  5. A.K. Jain, M.N. Murty, P.J. Flynn, 'Data Clustering: A Review,' ACM Computing Surveys, vol. 31, no. 3, 1999 https://doi.org/10.1145/331499.331504
  6. A. Baraldi, P. Blonda, 'A Survey of Fuzzy Clustering Algorithms for Pattern Recognition - Part I,' IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 29, no. 6, pp. 778-786, 1999 https://doi.org/10.1109/3477.809032
  7. Aristidis Likas, 'A Reinforcement Learning Approach to On-line Clustering,' Neural computation 11 (8): 1915-1932, 1999 https://doi.org/10.1162/089976699300016025
  8. Nicolaos B. Karayiannis, James C. Bezdek, 'An Integrated Approach to Fuzzy Learning Vector Quantization and Fuzzy c-Means Clustering,' IEEE Transactions on Fuzzy Systems, vol. 5, no. 4, 1997 https://doi.org/10.1109/91.649915
  9. 전종원, 민준영, 'An Efficient Feature Extraction Method of Handwritten Digits for GLVQ Clustering,' Journal of the Korea Information Processing Society, vol. 2, no. 6, 1995 (in Korean)
  10. Barbara Hammer, Thomas Villmann, 'Generalized Relevance Learning Vector Quantization,' Neural Networks, vol. 15 no. 8-9, pp. 1059-1068, 2002 https://doi.org/10.1016/S0893-6080(02)00079-5
  11. Shyn Jong Hu, 'Pattern Recognition by LVQ and GLVQ Networks,' http://neuron.et.ntust.edu.tw/homework/87/NN/87Homework%232/M8702043
  12. Michael Herrmann, Ralf Der, 'Efficient Q-Learning by Division of Labor,' Proceedings of International Conference on Artificial Neural Networks, 1995
  13. K. Yamada, M. Svinin, K. Ueda, 'Reinforcement Learning with Autonomous State Space Construction using Unsupervised Clustering Method,' Proceedings of the 5th International Symposium on Artificial Life and Robotics, 2000
  14. Lionel Jouffe, 'Fuzzy Inference System Learning by Reinforcement Methods,' IEEE Transactions on Systems, Man and Cybernetics, pp. 338-355, 1998 https://doi.org/10.1109/5326.704563
  15. Andrea Bonarini, 'Delayed Reinforcement, Fuzzy Q-Learning and Fuzzy Logic Controllers,' In Herrera, F., Verdegay, J. L. (Eds.) Genetic Algorithms and Soft Computing, pp. 447-466, 1996
  16. Pierre Yves Glorennec, Lionel Jouffe, 'Fuzzy Q-Learning,' Proceedings of Sixth IEEE International Conference on Fuzzy Systems, pp. 719-724, 1997
  17. 정석일, 이연정, 'A Fuzzy Q-Learning Using Distributed Eligibility,' Journal of Fuzzy Logic and Intelligent Systems, vol. 11, no. 5, pp. 388-394, 2001 (in Korean)
  18. Richard S. Sutton, 'Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding,' Advances in Neural Information Processing Systems 8, pp. 1038-1044, MIT Press, 1996
  19. R. Matthew Kretchmar, Charles W. Anderson, 'Comparison of CMACs and Radial Basis Functions for Local Function Approximators in Reinforcement Learning,' Proceedings of International Conference on Neural Networks, 1997 https://doi.org/10.1109/ICNN.1997.616132
  20. Juan Carlos Santamaria, Richard S. Sutton, Ashwin Ram, 'Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces,' COINS Technical Report 96-88, 1996
  21. William D. Smart, Leslie Pack Kaelbling, 'Practical Reinforcement Learning in Continuous Spaces,' Proceedings of International Conference on Machine Learning, 2000
  22. William D. Smart, Leslie Pack Kaelbling, 'Reinforcement Learning for Robot Control,' In Mobile Robots XVI, 2001