http://dx.doi.org/10.12673/jant.2022.26.2.72

Q-Learning Policy and Reward Design for Efficient Path Selection  

Yong, Sung-Jung (Department of Computer Science and Engineering, Korea University of Technology and Education)
Park, Hyo-Gyeong (Department of Computer Science and Engineering, Korea University of Technology and Education)
You, Yeon-Hwi (Department of Computer Science and Engineering, Korea University of Technology and Education)
Moon, Il-Young (Department of Computer Science and Engineering, Korea University of Technology and Education)
Abstract
Among reinforcement learning techniques, Q-Learning learns an optimal policy by learning a Q function that predicts the expected future reward of performing an action in a given state. Q-Learning is widely used as a basic reinforcement learning algorithm. In this paper, we study how effectively efficient paths can be selected and learned when policies and rewards are designed on the basis of Q-Learning. The existing algorithm and punishment compensation policy were compared with the proposed punishment reinforcement policy by applying the same number of learning iterations to each in the 8x8 grid environment of the Frozen Lake game. The comparison indicates that the Q-Learning punishment reinforcement policy proposed in this paper can significantly increase learning speed compared with the conventional algorithm.
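As an illustration of the kind of setup described above, the following is a minimal sketch of tabular Q-Learning on a small Frozen Lake style grid, with a configurable reward for falling into a hole. It is not the authors' implementation: the 4x4 map, the deterministic (non-slippery) transitions, the hyperparameters, and the names `GRID`, `step`, `train`, `greedy_path`, and `hole_reward` are all assumptions made for this example, and the paper's actual reward values are not given in the abstract. Setting `hole_reward` to 0.0 mimics a reward scheme where only the goal pays out, while a negative value stands in for a punishment on failure.

```python
import random

# Hypothetical 4x4 map in the style of Frozen Lake (the paper uses an 8x8 grid):
# S = start, F = frozen (safe), H = hole, G = goal.
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = len(GRID)
# Actions as (row delta, col delta): 0 = left, 1 = down, 2 = right, 3 = up.
ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def step(state, action, hole_reward):
    """Deterministic transition; returns (next_state, reward, done)."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    row = min(max(row + d_row, 0), N - 1)  # moving off the edge keeps the agent in place
    col = min(max(col + d_col, 0), N - 1)
    next_state = row * N + col
    cell = GRID[row][col]
    if cell == "G":
        return next_state, 1.0, True
    if cell == "H":
        return next_state, hole_reward, True  # assumed punishment value
    return next_state, 0.0, False


def train(hole_reward, episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-Learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                # Break ties randomly so an all-zero table still explores.
                action = rng.choice([a for a, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, action, hole_reward)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_path(q):
    """Follow the greedy policy from the start state; returns visited states."""
    state, path = 0, [0]
    for _ in range(N * N):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, _, done = step(state, action, 0.0)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    for hole_reward in (0.0, -1.0):
        print(hole_reward, greedy_path(train(hole_reward)))
```

Running the script prints the greedy path learned under each reward setting. Comparing how many episodes each setting needs before the greedy path reaches the goal is one simple way to probe the learning-speed question the abstract raises, though this toy grid says nothing about the paper's reported results.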
Keywords
OpenAI Gym; Path Selection; Q-Learning; Reinforcement Learning; Reward Policy