http://dx.doi.org/10.7472/jksii.2020.21.5.1

Random Balance between Monte Carlo and Temporal Difference in off-policy Reinforcement Learning for Less Sample-Complexity  

Kim, Chayoung (Division of General Studies, Kyonggi University)
Park, Seohee (KT)
Lee, Woosik (SSiS)
Publication Information
Journal of Internet Computing and Services / v.21, no.5, 2020, pp. 1-7
Abstract
Deep neural networks (DNNs), used as approximation functions in reinforcement learning (RL), can in theory yield realistic results. In empirical benchmark studies, temporal-difference (TD) learning generally outperforms Monte Carlo (MC) learning. However, some previous works show that MC is better than TD when rewards are very sparse or delayed. Another recent study shows that when the agent's observation of the environment is only partial on complex control tasks, MC prediction is superior to TD-based methods. Most of these environments can be regarded as 5-step or 20-step Q-learning, where the experiment proceeds without long roll-outs so as to alleviate performance degradation. In other words, for networks with noisy rewards, and regardless of how the roll-outs are controlled, MC learning, which is robust to noisy rewards, performs better than or nearly identically to TD. These studies break with the conventional view that TD is better than MC, and their results suggest that a combination of MC and TD is better than either one alone. Therefore, in this study, building on the results of those previous works, we exploit a random balance between TD and MC targets in off-policy RL, without the complicated reward-based formulas those studies use. Comparing a DQN that uses the random MC/TD mixture against the well-known DQN that uses only TD-based learning, we demonstrate through experiments in OpenAI Gym that even a well-performing TD learner benefits from the mixture of TD and MC.
Keywords
Deep Q-Network; Temporal Difference; Monte Carlo; Reinforcement Learning; Variation and Bias Balance;
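The random balance described in the abstract can be sketched as a per-transition choice of learning target for a DQN: with some probability use the full discounted Monte Carlo return of the finished episode, otherwise use the usual one-step TD target. This is a minimal illustration under assumptions, not the paper's exact method; the mixing probability `p_mc`, the function names, and the episode representation are all hypothetical.

```python
import random

GAMMA = 0.99  # assumed discount factor

def monte_carlo_returns(rewards):
    """Discounted return G_t for every step of one finished episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + GAMMA * g
        returns.append(g)
    return list(reversed(returns))

def mixed_targets(rewards, next_q_max, dones, p_mc=0.5, rng=random):
    """Per-transition training target: with probability p_mc use the
    MC return G_t, otherwise the one-step TD target
    r_t + gamma * max_a Q(s_{t+1}, a) (zero bootstrap at terminal)."""
    mc = monte_carlo_returns(rewards)
    targets = []
    for t, r in enumerate(rewards):
        td = r if dones[t] else r + GAMMA * next_q_max[t]
        targets.append(mc[t] if rng.random() < p_mc else td)
    return targets
```

Setting `p_mc=0.0` recovers standard TD-based DQN targets, `p_mc=1.0` recovers pure MC targets, and intermediate values trade the low variance of TD bootstrapping against the low bias of MC returns, which is the variance/bias balance named in the keywords.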
Citations & Related Records
Times Cited By KSCI: 4
1 D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. V. D. Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, Vol. 529, No. 7587, pp. 484-489, 2016. https://doi.org/10.1038/nature16961   DOI
2 R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction, Volume 1. MIT Press, Cambridge, 1998. https://doi.org/10.1016/S1364-6613(99)01331-5
3 V. Mnih, et al. "Playing Atari with deep reinforcement learning." NIPS Deep Learning Workshop 2013. http://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
4 A. Amiranashvili, A. Dosovitskiy, V. Koltun and T. Brox, TD or not TD: Analyzing the role of temporal differencing in deep reinforcement learning, ICLR 2018. http://arxiv.org/abs/1806.01175
5 S. Gu, T. Lillicrap, Z. Ghahramani, R. E. Turner, S. Levine, Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, ICLR 2017. http://arxiv.org/abs/1611.02247
6 T. Lillicrap, J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, Continuous control with deep reinforcement learning, ICLR 2016. https://arxiv.org/abs/1509.02971
7 V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, ICML 2010. https://www.cs.toronto.edu/~hinton/absps/reluICML.pdf
8 OpenAI Gym: https://gym.openai.com
9 Cart-Pole-V0: https://github.com/openai/gym/wiki/Cart-Pole-v0
10 Cart-Pole-DQN: https://github.com/rlcode/reinforcement-learning-kr/blob/master/2-cartpole/1-dqn/cartpole_dqn.py, 8 Jul. 2017.
11 Tensorflow: https://github.com/tensorflow/tensorflow, 31 Oct. 2019.
12 Keras: https://keras.io/api/, Oct. 2019.
13 G. Sun, G. O. Boateng, H. Huang and W. Jiang, "A Reinforcement Learning Framework for Autonomous Cell Activation and Customized Energy-Efficient Resource Allocation in C-RANs," KSII Transactions on Internet and Information Systems, vol. 13, no. 8, pp. 3821-3841, 2019. https://doi.org/10.3837/tiis.2019.08.001   DOI
14 R. Mu and X. Zeng, "A Review of Deep Learning Research," KSII Transactions on Internet and Information Systems, vol. 13, no. 4, pp. 1738-1764, 2019. https://doi.org/10.3837/tiis.2019.04.001   DOI