Q-Learning Policy and Reward Design for Efficient Path Selection
Yong, Sung-Jung (Department of Computer Science and Engineering, Korea University of Technology and Education)
Park, Hyo-Gyeong (Department of Computer Science and Engineering, Korea University of Technology and Education)
You, Yeon-Hwi (Department of Computer Science and Engineering, Korea University of Technology and Education)
Moon, Il-Young (Department of Computer Science and Engineering, Korea University of Technology and Education)
1 | G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym", arXiv preprint arXiv:1606.01540, Jun. 2016.
2 | J. Clifton and E. Laber, "Q-Learning: Theory and Applications", Annual Review of Statistics and Its Application, Vol. 7, No. 1, pp. 279-301, Mar. 2020.
3 | C. J. C. H. Watkins and P. Dayan, "Q-learning", Machine Learning, Vol. 8, No. 3-4, pp. 279-292, May 1992.
4 | C. J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. thesis, King's College, University of Cambridge, May 1989.
5 | V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with Deep Reinforcement Learning", arXiv preprint arXiv:1312.5602, Dec. 2013.