Reinforcement Learning with Clustering for Function Approximation and Rule Extraction


  • 이영아 (Department of Computer Engineering, Kyung Hee University) ;
  • 홍석미 (Department of Computer Engineering, Kyung Hee University) ;
  • 정태충 (Department of Computer Engineering, Kyung Hee University)
  • Published : 2003.12.01

Abstract

Q-Learning, a representative reinforcement learning algorithm, obtains an optimal policy by repeated experience until the estimated values of all state-action pairs in the state space converge. When the state space is high-dimensional or continuous, the task involves a very large number of states, and storing every individual Q value in a single table becomes impractical. We introduce Q-Map, a new function approximation method that yields classified policies. As the agent learns on-line, Q-Map groups states that represent similar situations and repeatedly adapts the clusters to new experience. State-action pairs that require fine control are handled separately in the form of rules. Experiments on a maze environment and the mountain car problem show that Q-Map acquires classified knowledge from which rules can easily be extracted.
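The Q-Learning scheme the abstract refers to is the standard temporal-difference update over a table of state-action values. A minimal tabular sketch follows; the environment interface (reset, step, actions) and the hyperparameter values are illustrative assumptions, not taken from the paper:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-Learning: every state-action pair keeps its own Q entry."""
    Q = defaultdict(float)  # (state, action) -> estimated return

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration over the current estimates
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # temporal-difference update toward the greedy one-step backup
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

The table grows with the number of distinct states, which is exactly the memory and time problem the abstract points out for high-dimensional or continuous state spaces.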

Q-Learning, a representative reinforcement learning algorithm, obtains an optimal policy by repeatedly experiencing states until the evaluation values of all state-action pairs in the state space converge. When the state space has many features or the features are continuous, the state space grows exponentially; every state must then be visited repeatedly, and storing the Q values of all state-action pairs becomes a serious problem in both time and memory. This paper introduces Q-Map, a new function approximation method that clusters states of similar situations while learning on-line and repeatedly updates the clusters in response to new experience, thereby obtaining a classified optimal policy. States that require fine control, which clustering cannot represent precisely, are complemented by extracting them as rules. Experiments on a maze environment and the mountain car problem with the proposed Q-Map produced classified knowledge, which could also easily be converted into rules, an explicit form of knowledge.
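The abstract does not give the Q-Map update equations, so the following is only a minimal sketch of the general idea it describes: states close to one another share a single cluster of Q values, and clusters are adapted on-line as new experience arrives. The class name ClusteredQ, the fixed distance radius, and the incremental centroid update are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

class ClusteredQ:
    """Toy state-aggregation approximator: states within `radius` of a
    cluster centroid share that cluster's Q values (one per action).
    Illustrative sketch only, not the paper's Q-Map definition."""

    def __init__(self, n_actions, radius=0.5, alpha=0.1):
        self.n_actions = n_actions
        self.radius = radius
        self.alpha = alpha
        self.centroids = []   # cluster centers in state space
        self.q_values = []    # per-cluster array of Q values, one per action
        self.counts = []      # number of states each cluster has absorbed

    def _find(self, state):
        state = np.asarray(state, dtype=float)
        for i, c in enumerate(self.centroids):
            if np.linalg.norm(state - c) <= self.radius:
                return i
        # no nearby cluster: start a new one around this state
        self.centroids.append(state.copy())
        self.q_values.append(np.zeros(self.n_actions))
        self.counts.append(0)
        return len(self.centroids) - 1

    def value(self, state, action):
        return self.q_values[self._find(state)][action]

    def update(self, state, action, target):
        i = self._find(state)
        self.counts[i] += 1
        # drift the centroid toward the new state and the Q value toward the target
        self.centroids[i] += (np.asarray(state, dtype=float) - self.centroids[i]) / self.counts[i]
        self.q_values[i][action] += self.alpha * (target - self.q_values[i][action])
```

Because each cluster carries one Q value per action, the greedy action of a cluster can be read off directly as a rule of the form "if the state is near this centroid, take this action", which is consistent with the abstract's claim that the classified knowledge converts easily into explicit rules; states that such coarse rules handle poorly would need the finer, per-state rules the paper mentions.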
