http://dx.doi.org/10.7471/ikeee.2021.25.4.644

Improved Deep Q-Network Algorithm Using Self-Imitation Learning  

Sunwoo, Yung-Min (Dept. of Smart Robot Convergence and Application Engineering, Pukyong National University)
Lee, Won-Chang (Dept. of Electronic Engineering, Pukyong National University)
Publication Information
Journal of IKEEE / v.25, no.4, 2021, pp. 644-649
Abstract
Self-Imitation Learning is a simple off-policy actor-critic algorithm that helps an agent find an optimal policy by learning to reproduce its own past good experiences. When Self-Imitation Learning is combined with reinforcement learning algorithms that have an actor-critic architecture, it improves performance in various game environments. However, its application has been limited to algorithms with an actor-critic architecture. In this paper, we propose a method of applying Self-Imitation Learning to Deep Q-Network, a value-based deep reinforcement learning algorithm, and train the resulting agent in various game environments. By comparing the training results of the proposed algorithm with those of an ordinary Deep Q-Network, we show that Self-Imitation Learning can be applied to Deep Q-Network to improve its performance.
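The abstract does not spell out how Self-Imitation Learning is attached to Deep Q-Network, so the following is only a rough illustration under a stated assumption, not the authors' method. It assumes one plausible adaptation: the SIL value loss of Oh et al., 0.5 * max(R - V(s), 0)^2, is applied with Q(s, a) in place of V(s). Under that loss, Q-values are pulled up toward an episode's Monte Carlo return R only when the episode did better than the current estimate. To stay self-contained, the sketch uses a tabular Q-table instead of a neural network and replays the whole self-imitation buffer instead of sampling it by priority. The class `TabularSILQ` and its parameters (`alpha`, `gamma`, `sil_weight`) are hypothetical names.

```python
from collections import defaultdict


def discounted_returns(rewards, gamma=0.99):
    """Return R_t = sum_{k>=t} gamma^(k-t) * r_k for every step t of one finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def sil_loss(q_sa, ret):
    """Clipped self-imitation loss 0.5 * max(R - Q(s, a), 0)^2 (assumption: Q stands in for SIL's V)."""
    gap = max(ret - q_sa, 0.0)
    return 0.5 * gap * gap


class TabularSILQ:
    """Hypothetical tabular Q-learner with an extra self-imitation update (not the paper's code)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, sil_weight=1.0):
        # Unseen states start with all-zero action values.
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.alpha = alpha            # step size shared by both updates
        self.gamma = gamma            # discount factor
        self.sil_weight = sil_weight  # scales the self-imitation step
        self.sil_buffer = []          # (state, action, Monte Carlo return) tuples

    def td_update(self, s, a, r, s_next, done):
        """Ordinary one-step Q-learning update toward r + gamma * max_a' Q(s', a')."""
        target = r if done else r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

    def store_episode(self, states, actions, rewards):
        """Attach Monte Carlo returns to a finished episode and keep it for self-imitation."""
        for s, a, ret in zip(states, actions, discounted_returns(rewards, self.gamma)):
            self.sil_buffer.append((s, a, ret))

    def sil_update(self):
        """One gradient-descent step on sil_loss for every stored sample.

        The gradient of sil_loss w.r.t. Q(s, a) is -max(R - Q(s, a), 0), so Q(s, a)
        only moves up, and only where the stored return beat the current estimate.
        """
        for s, a, ret in self.sil_buffer:
            gap = max(ret - self.q[s][a], 0.0)
            self.q[s][a] += self.alpha * self.sil_weight * gap


if __name__ == "__main__":
    agent = TabularSILQ(n_actions=2, alpha=0.5, gamma=0.9)
    agent.store_episode(states=["s0", "s1"], actions=[1, 0], rewards=[0.0, 1.0])
    agent.sil_update()
    print(agent.q["s0"], agent.q["s1"])  # [0.0, 0.45] [0.5, 0.0]
```

With `alpha=0.5` and `gamma=0.9`, the episode's returns are 0.9 and 1.0. One `sil_update` therefore raises Q("s0", 1) to 0.45 and Q("s1", 0) to 0.5, while a value already above its stored return is left unchanged. In a Deep Q-Network, the same clipped loss would presumably be minimized on the network's outputs alongside the usual TD loss. The abstract does not state how the paper weights or schedules the two terms.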
Keywords
Self-Imitation Learning; Actor-Critic Algorithm; Optimal Policy; Reinforcement Learning; Deep Q-Network