Improved Deep Q-Network Algorithm Using Self-Imitation Learning

Sunwoo, Yung-Min; Lee, Won-Chang

doi:10.7471/ikeee.2021.25.4.644

Journal of IKEEE (전기전자학회논문지)

Volume 25, Issue 4, Pages 644-649, 2021

pISSN 1226-7244, eISSN 2288-243X

Institute of Korean Electrical and Electronics Engineers (한국전기전자학회)



Sunwoo, Yung-Min (Dept. of Smart Robot Convergence and Application Engineering, Pukyong National University) ;
Lee, Won-Chang (Dept. of Electronic Engineering, Pukyong National University)

선우영민 ;
이원창

Received : 2021.12.01
Accepted : 2021.12.13
Published : 2021.12.31

https://doi.org/10.7471/ikeee.2021.25.4.644


Abstract

Self-Imitation Learning is a simple off-policy actor-critic algorithm that helps an agent find an optimal policy by exploiting its own past good experiences. Combined with reinforcement learning algorithms that have an actor-critic architecture, it has shown considerable performance improvements in various game environments. However, its application has so far been limited to reinforcement learning algorithms with an actor-critic architecture. In this paper, we propose a method for applying Self-Imitation Learning to Deep Q-Network (DQN), a value-based deep reinforcement learning algorithm, and train the resulting algorithm in various game environments. By comparing its training results with those of ordinary DQN, we show that Self-Imitation Learning can be applied to DQN and can improve its performance.
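The abstract does not state the loss used when Self-Imitation Learning is combined with DQN, so the following is only a minimal sketch of one plausible formulation, not the authors' method. It assumes the clipped-advantage value loss from the original Self-Imitation Learning work, applied to Q-values: a stored transition contributes only when its observed discounted return R exceeds the current estimate Q(s, a). The function names `discounted_returns` and `sil_q_loss` and the plain-list batch format are hypothetical and chosen for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return R_t = r_t + gamma * R_{t+1} for one finished episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def sil_q_loss(q_values, returns):
    """Mean of 0.5 * max(R - Q(s, a), 0)^2 over a batch (assumed loss form).

    q_values[i] is the current estimate Q(s_i, a_i) for a stored transition,
    and returns[i] is the discounted return actually observed from it.
    Transitions whose return does not exceed the estimate are clipped to zero,
    so only "past good experiences" pull the Q-values upward.
    """
    assert len(q_values) == len(returns)
    total = 0.0
    for q, ret in zip(q_values, returns):
        advantage = max(ret - q, 0.0)
        total += 0.5 * advantage ** 2
    return total / len(q_values)
```

In a full agent, a term like this would typically be added to the ordinary one-step temporal-difference loss of DQN and minimized with respect to the network parameters; the weighting between the two terms is a design choice that the abstract does not specify.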
