1 |
Kuleshov, Volodymyr, and Doina Precup. "Algorithms for multi-armed bandit problems," arXiv preprint arXiv:1402.6028, 2014.
|
2 |
Oh, Junhyuk, et al. "Self-imitation learning," International Conference on Machine Learning. PMLR, 2018.
|
3 |
Mnih, V., Kavukcuoglu, K., Silver, D. et al. "Human-level control through deep reinforcement learning," Nature, Vol.518, No.7540, pp.529-533, 2015.
|
4 |
Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep reinforcement learning with double Q-learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol.30. No.1. 2016.
|
5 |
Wang, Ziyu, et al. "Dueling network architectures for deep reinforcement learning." International conference on machine learning. PMLR, 2016.
|
6 |
Sutton, Richard S., and Andrew G. Barto. "Reinforcement learning: An introduction," MIT Press, 2018.
|
7 |
OpenAI Gym documentation, https://gym.openai.com/docs/
|
8 |
Andrychowicz, Marcin, et al. "Hindsight experience replay." arXiv preprint arXiv:1707.01495, 2017.
|
9 |
Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347, 2017.
|
10 |
Watkins, Christopher JCH, and Peter Dayan. "Q-learning." Machine learning, Vol.8, No.3-4, pp.279-292, 1992.
|
11 |
Mnih, Volodymyr, et al. "Asynchronous methods for deep reinforcement learning." International conference on machine learning. PMLR, 2016.
|
12 |
Fortunato, Meire, et al. "Noisy networks for exploration." arXiv preprint arXiv:1706.10295, 2017.
|
13 |
Schaul, Tom, et al. "Prioritized experience replay," arXiv preprint arXiv:1511.05952, 2015.
|