1 |
G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym", arXiv preprint arXiv:1606.01540, Jun. 2016.
|
2 |
Clifton, J., and Laber, E., "Q-Learning: Theory and Applications", Annual Review of Statistics and Its Application, Vol. 7, No. 1, pp. 279-301, Mar. 2020.
|
3 |
Watkins, C.J.C.H., and Dayan, P., "Q-learning", Machine Learning, Vol. 8, No. 3-4, pp. 279-292, May 1992.
|
4 |
Watkins, C.J.C.H., "Learning from Delayed Rewards", Ph.D. thesis, King's College, University of Cambridge, May 1989.
|
5 |
V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with Deep Reinforcement Learning", arXiv preprint arXiv:1312.5602, Dec. 2013.
|