[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.12673/jant.2022.26.2.72

Q-Learning Policy and Reward Design for Efficient Path Selection

Yong, Sung-Jung (Department of Computer Science and Engineering, Korea University of Technology and Education)
Park, Hyo-Gyeong (Department of Computer Science and Engineering, Korea University of Technology and Education)
You, Yeon-Hwi (Department of Computer Science and Engineering, Korea University of Technology and Education)
Moon, Il-Young (Department of Computer Science and Engineering, Korea University of Technology and Education)

Publication Information

Journal of Advanced Navigation Technology, v.26, no.2, 2022, pp. 72-77

Abstract

Among reinforcement learning techniques, Q-Learning learns an optimal policy by learning a Q function that predicts the expected future reward of taking an action in a given state. Q-Learning is widely used as a basic reinforcement learning algorithm. In this paper, we study how effectively efficient paths can be selected and learned when policies and rewards are designed on top of Q-Learning. We compare the results of the existing algorithm, a punishment compensation policy, and the proposed punishment reinforcement policy by applying the same number of learning iterations to each in the 8x8 grid environment of the Frozen Lake game. The comparison indicates that the Q-Learning punishment reinforcement policy proposed in this paper can significantly increase learning speed compared with the conventional algorithm.
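To make the idea of reward design concrete, the following is a minimal sketch of tabular Q-Learning on a Frozen Lake style grid. It is not the authors' implementation: the abstract does not state the reward values or the exact form of the proposed policy, so the `hole_reward` parameter, the hyperparameters, the deterministic (non-slippery) movement, and the smaller 4x4 map used here for brevity are all assumptions made for illustration.

```python
import random

# Assumed 4x4 layout in the Frozen Lake style (the paper uses an 8x8 grid).
# S = start, F = frozen (safe), H = hole, G = goal.
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = 4
# Actions as (d_row, d_col): left, down, right, up.
ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def step(state, action, hole_reward):
    """Deterministic move; returns (next_state, reward, done)."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    row = min(max(row + d_row, 0), N - 1)  # stay inside the grid
    col = min(max(col + d_col, 0), N - 1)
    next_state = row * N + col
    cell = GRID[row][col]
    if cell == "G":
        return next_state, 1.0, True
    if cell == "H":
        # Reward design knob (assumption): 0.0 ignores holes, < 0 punishes them.
        return next_state, hole_reward, True
    return next_state, 0.0, False


def train(hole_reward, episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-Learning; returns (q_table, goal_count)."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    goals = 0
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                ties = [a for a, v in enumerate(q[state]) if v == best]
                action = rng.choice(ties)  # break ties at random
            next_state, reward, done = step(state, action, hole_reward)
            # Q-Learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                goals += reward > 0
                break
    return q, goals


def greedy_path(q, max_steps=50):
    """Follow the greedy policy from the start state and return visited states."""
    state, path = 0, [0]
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, _reward, done = step(state, action, 0.0)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    for hole_reward in (0.0, -1.0):
        q, goals = train(hole_reward)
        print(hole_reward, goals, greedy_path(q))
```

Running the script trains once with no penalty for holes and once with a penalty, then prints how many episodes reached the goal and the greedy path each table produces. How the two settings compare on this toy grid says nothing about the 8x8 results reported in the paper; the sketch only shows where a reward design choice enters the update.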

Keywords

OpenAI Gym; Path Selection; Q-Learning; Reinforcement Learning; Reward Policy

