Improved Deep Q-Network Algorithm Using Self-Imitation Learning


  • Sunwoo, Yung-Min (Dept. of Smart Robot Convergence and Application Engineering, Pukyong National University) ;
  • Lee, Won-Chang (Dept. of Electronic Engineering, Pukyong National University)
  • Received : 2021.12.01
  • Accepted : 2021.12.13
  • Published : 2021.12.31

Abstract

Self-Imitation Learning (SIL) is a simple off-policy actor-critic algorithm that helps an agent find an optimal policy by exploiting past good experiences. When combined with reinforcement learning algorithms that have an actor-critic architecture, it improves performance in various game environments. However, its applications have so far been limited to algorithms with an actor-critic architecture. In this paper, we propose a method for applying Self-Imitation Learning to Deep Q-Network (DQN), a value-based deep reinforcement learning algorithm, and train the combined algorithm in various game environments. By comparing the training results of the proposed algorithm with those of the ordinary DQN, we show that Self-Imitation Learning can be applied to DQN and improves its performance.
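The core idea can be sketched for a simple tabular Q-learning agent (a hedged illustration of the SIL-style lower-bound update, not the authors' implementation): after an episode finishes, compute the observed discounted return R for each visited state-action pair and push Q(s, a) toward R only when R exceeds the current estimate, i.e. minimize max(R − Q(s, a), 0). The names `sil_update`, `gamma`, and `lr` are illustrative choices, not from the paper.

```python
from collections import defaultdict

def sil_update(Q, episode, gamma=0.99, lr=0.1):
    """Self-imitation-style update for a tabular Q function.

    Q       : dict mapping (state, action) -> value estimate
    episode : list of (state, action, reward) tuples in time order

    Pushes Q(s, a) toward the observed return R only when the
    "advantage" R - Q(s, a) is positive, so the agent imitates
    only better-than-expected past outcomes.
    """
    R = 0.0
    for state, action, reward in reversed(episode):
        R = reward + gamma * R              # discounted return from this step on
        advantage = R - Q[(state, action)]
        if advantage > 0:                   # imitate only good experiences
            Q[(state, action)] += lr * advantage
    return Q

Q = defaultdict(float)
# A rewarding episode raises the value estimates of the visited pairs.
good_episode = [("s0", "a0", 0.0), ("s1", "a1", 1.0)]
sil_update(Q, good_episode)

# A poor episode leaves the estimates untouched (negative advantage is clipped).
Q_bad = defaultdict(float)
sil_update(Q_bad, [("s0", "a0", -1.0)])
```

In the deep version described in the paper, the same clipped-advantage idea would be realized as an auxiliary loss on the Q-network, trained on transitions replayed from a buffer of past episodes.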


References

  1. Sutton, Richard S., and Andrew G. Barto. "Reinforcement learning: An introduction," MIT press, 2018.
  2. Kuleshov, Volodymyr, and Doina Precup. "Algorithms for multi-armed bandit problems," arXiv preprint arXiv:1402.6028, 2014.
  3. Oh, Junhyuk, et al. "Self-imitation learning," International Conference on Machine Learning. PMLR, 2018.
  4. Mnih, V., Kavukcuoglu, K., Silver, D., et al. "Human-level control through deep reinforcement learning," Nature, Vol.518, pp.529-533, 2015. https://doi.org/10.1038/nature14236
  5. OpenAI Gym documentation, https://gym.openai.com/docs/
  6. Watkins, Christopher JCH, and Peter Dayan. "Q-learning." Machine learning, Vol.8, No.3-4, pp.279-292, 1992. https://doi.org/10.1007/BF00992698
  7. Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep reinforcement learning with double q-learning." Proceedings of the AAAI conference on artificial intelligence. Vol.30. No.1. 2016.
  8. Wang, Ziyu, et al. "Dueling network architectures for deep reinforcement learning." International conference on machine learning. PMLR, 2016.
  9. Schaul, Tom, et al. "Prioritized experience replay," arXiv preprint arXiv:1511.05952, 2015.
  10. Fortunato, Meire, et al. "Noisy networks for exploration." arXiv preprint arXiv:1706.10295, 2017.
  11. Andrychowicz, Marcin, et al. "Hindsight experience replay." arXiv preprint arXiv:1707.01495, 2017.
  12. Mnih, Volodymyr, et al. "Asynchronous methods for deep reinforcement learning." International conference on machine learning. PMLR, 2016.
  13. Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347, 2017.