[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7471/ikeee.2021.25.4.644

Improved Deep Q-Network Algorithm Using Self-Imitation Learning

Sunwoo, Yung-Min (Dept. of Smart Robot Convergence and Application Engineering, Pukyong National University)
Lee, Won-Chang (Dept. of Electronic Engineering, Pukyong National University)

Publication Information

Journal of IKEEE / v.25, no.4, 2021, pp. 644-649

Abstract

Self-Imitation Learning is a simple off-policy actor-critic algorithm that helps an agent find an optimal policy by exploiting its own past good experiences. When combined with reinforcement learning algorithms that have an actor-critic architecture, it improves performance in various game environments. However, its application has been limited to such actor-critic algorithms. In this paper, we propose a method for applying Self-Imitation Learning to Deep Q-Network, a value-based deep reinforcement learning algorithm, and train it in various game environments. By comparing the training results of the proposed algorithm with those of ordinary Deep Q-Network, we show that Self-Imitation Learning can improve the performance of Deep Q-Network.

Keywords

Self-Imitation Learning; Actor-Critic Algorithm; Optimal Policy; Reinforcement Learning; Deep Q-Network

