References
- M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, New York, 1994
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998
- L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” Journal of Artificial Intelligence Research, vol.4, pp.237-285, 1996 https://doi.org/10.1613/jair.301
- H. S. Chang, “Reinforcement Learning with Supervision by Combining Multiple Learnings and Expert Advices,” in Proc. of the 2006 American Control Conference, pp.4159-4164, June, 2006 https://doi.org/10.1109/ACC.2006.1657371
- A. Y. Ng, D. Harada, and S. Russell, “Policy invariance under reward transformations: theory and application to reward shaping,” in Proc. of the 16th Int. Conf. on Machine Learning, pp.278-287, 1999
- I. Gilboa and D. Schmeidler, “Case-based decision theory,” Quart. J. Economics, vol.110, no.4, pp.605-639, 1995 https://doi.org/10.2307/2946694
- E. Hüllermeier, “Experience-based decision making: a satisficing decision tree approach,” IEEE Transactions on Systems, Man, and Cybernetics, Part A, vol.35, no.5, pp.641-653, 2005 https://doi.org/10.1109/TSMCA.2005.851145
- S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári, “Convergence results for single-step on-policy reinforcement-learning algorithms,” Machine Learning, vol.38, pp.287-308, 2000 https://doi.org/10.1023/A:1007678930559
- S. Melax, “Reinforcement learning tetris example,” 1998. http://www.melax.com/tetris/