Potential-based Reinforcement Learning Combined with Case-based Decision Theory

  • 김은선 (Department of Computer Engineering, Sogang University) ;
  • 장형수 (Department of Computer Engineering, Sogang University)
  • Published: 2009.12.15


This paper proposes a potential-based reinforcement learning (RL) method, called "RLs-CBDT", which combines multiple RL agents with case-based decision theory (CBDT), a framework designed for decision making in uncertain environments, used here as the source of expert knowledge in RL. We show empirically, through a Tetris experiment, that RLs-CBDT converges to an optimal policy faster than pre-existing RL algorithms.

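This paper proposes "RLs-CBDT", which applies Case-based Decision Theory (CBDT), a decision-making algorithm for uncertain environments, to "potential-based" reinforcement learning (RL), an RL technique that fuses the learning results of multiple RL agents and an expert's knowledge into a single learning algorithm. A Tetris experiment shows that RLs-CBDT converges to the optimal policy faster than existing RL algorithms.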
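As a rough illustration of the two ingredients named in the abstract, the sketch below scores actions with a CBDT-style similarity-weighted sum over remembered cases and feeds the resulting state value into a potential-based shaping term of the form gamma * Phi(s') - Phi(s). It is not the RLs-CBDT algorithm itself, whose details the abstract does not give. The names `Case`, `similarity`, `cbdt_value`, `potential`, and `shaped_reward`, the one-dimensional state encoding, and the choice of similarity function are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One remembered case: (problem, action, result). Hypothetical 1-D encoding."""
    problem: float  # feature describing the past situation
    action: str     # action that was taken
    result: float   # utility that was observed


def similarity(p: float, q: float) -> float:
    """Assumed similarity function: 1 for identical problems, decaying with distance."""
    return 1.0 / (1.0 + abs(p - q))


def cbdt_value(problem: float, action: str, memory: list[Case]) -> float:
    """CBDT-style score: similarity-weighted sum of results of past cases using `action`."""
    return sum(
        similarity(problem, c.problem) * c.result
        for c in memory
        if c.action == action
    )


def potential(state: float, actions: list[str], memory: list[Case]) -> float:
    """Assumed potential Phi(s): the best CBDT score available in the state."""
    return max(cbdt_value(state, a, memory) for a in actions)


def shaped_reward(reward: float, phi_s: float, phi_next: float, gamma: float) -> float:
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * phi_next - phi_s


memory = [
    Case(problem=0.0, action="left", result=1.0),
    Case(problem=2.0, action="left", result=3.0),
    Case(problem=1.0, action="right", result=2.0),
]
actions = ["left", "right"]
phi_now = potential(1.0, actions, memory)
phi_next = potential(2.0, actions, memory)
bonus_adjusted = shaped_reward(1.0, phi_now, phi_next, gamma=0.5)
```

The shaping term only adds the discounted difference of potentials to the environment reward, which is the form shown in reference 5 to leave the optimal policy unchanged; how the paper actually derives its potential from CBDT and from multiple agents may differ from this sketch.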



  1. M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, New York, 1994
  2. R. Sutton and A. Barto, Reinforcement Learning, MIT Press, 2000
  3. L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” Journal of Artificial Intelligence Research, vol.4, pp.237-285, 1996 https://doi.org/10.1613/jair.301
  4. H. S. Chang, “Reinforcement Learning with Supervision by Combining Multiple Learnings and Expert Advices,” in Proc. of the 2006 American Control Conference, pp.4159-4164, June, 2006 https://doi.org/10.1109/ACC.2006.1657371
  5. A. Y. Ng, D. Harada, and S. Russell, “Policy invariance under reward transformations: theory and application to reward shaping,” in Proc. of the 16th Int. Conf. on Machine Learning, pp.278-287, 1999
  6. I. Gilboa and D. Schmeidler, “Case-based decision theory,” Quart. J. Economics, vol.110, no.4, pp.605-639, 1995 https://doi.org/10.2307/2946694
  7. E. Hüllermeier, “Experience-based decision making: a satisficing decision tree approach,” IEEE Transactions on Systems, Man, and Cybernetics, vol.35, no.5, pp.641-653, 2005 https://doi.org/10.1109/TSMCA.2005.851145
  8. S. Singh, T. Jaakkola, M. Littman, and C. Szepesvári, “Convergence results for single-step on-policy reinforcement learning algorithms,” Machine Learning, vol.38, pp.287-308, 2000 https://doi.org/10.1023/A:1007678930559
  9. S. Melax, “Reinforcement learning tetris example,” 1998. URL http://www.melax.com/tetris/