Acknowledgement
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2018R1A6A1A03025526 and No. NRF-2020R1I1A3065610).
References
- R. S. Sutton, "Learning to Predict by the Methods of Temporal Differences," Machine Learning, Vol.3, No.1, pp.9-44, 1988. https://doi.org/10.1007/BF00115009
- H. van Seijen and R. Sutton, "True Online TD(λ)," In Proceedings of the 31st International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol.32, pp.692-700, PMLR, Beijing, China, 2014.
- K. De Asis, J. Hernandez-Garcia, G. Holland, et al., "Multi-Step Reinforcement Learning: A Unifying Algorithm," In Association for the Advancement of Artificial Intelligence, 2018.
- S. L. Chen, H. Z. Wu, X. L. Han, and L. Xiao, "Multi-Step Truncated Q Learning Algorithm," In 2005 International Conference on Machine Learning and Cybernetics, Vol.1, pp.194-198, 2005.
- K. De Asis and R. Sutton, "Per-decision Multi-step Temporal Difference Learning with Control Variates," arXiv:1807.01830, 2018.
- A. R. Mahmood, H. Yu, and R. Sutton, "Multi-step Off-policy Learning without Importance Sampling Ratios," arXiv:1702.03006, 2017.
- L. Yang, M. Shi, Q. Zheng, W. Meng, and G. Pan, "A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning," In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pp.2984-2990, 2018.
- C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, Vol.8, No.3, pp.279-292, 1992. https://doi.org/10.1007/BF00992698
- V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with Deep Reinforcement Learning," arXiv:1312.5602, NIPS Deep Learning Workshop, 2013.
- V. Mnih, et al., "Human-level Control through Deep Reinforcement Learning," Nature, Vol.518, pp.529-533, 2015. https://doi.org/10.1038/nature14236
- S. Thrun and A. Schwartz, "Issues in Using Function Approximation for Reinforcement Learning," In Proceedings of the Fourth Connectionist Models Summer School, Erlbaum, 1993.
- H. van Hasselt, "Double Q-learning," In Advances in Neural Information Processing Systems, Vol.23, pp.2613-2621, 2010.
- H. van Hasselt, A. Guez, and D. Silver, "Deep Reinforcement Learning with Double Q-learning," In Association for the Advancement of Artificial Intelligence, 2016.
- R. S. Sutton and A. G. Barto, "Reinforcement Learning: An Introduction," The MIT Press, Second Edn., 2018.
- J. Peng and R. J. Williams, "Incremental Multi-Step Q-Learning," Machine Learning, Vol.22, No.1, pp.283-290, 1996. https://doi.org/10.1007/BF00114731
- M. Hessel, et al., "Rainbow: Combining Improvements in Deep Reinforcement Learning," In Association for the Advancement of Artificial Intelligence, pp.3215-3222, AAAI Press, 2018.
- C. J. C. H. Watkins, "Learning from delayed rewards," (Doctoral dissertation, Cambridge University).
- Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, and N. de Freitas, "Dueling Network Architectures for Deep Reinforcement Learning," In Proceedings of The 33rd International Conference on Machine Learning, PMLR, Vol.48, pp.1995-2003, 2016.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized Experience Replay," arXiv:1511.05952, 2015. Published at ICLR 2016.
- S. J. Bradtke and M. O. Duff, "Reinforcement learning methods for continuous-time Markov decision problems," In Proceedings of the 7th International Conference on Neural Information Processing Systems, MIT Press, Cambridge, MA, USA, pp.393-400, 1994.
- J. Hernandez-Garcia and R. Sutton, "Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target," arXiv:1901.07510, 2019 (NIPS Deep Learning Workshop 2018).
- M. J. Kearns and S. P. Singh, "Bias-variance error bounds for temporal difference updates," In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, pp.142-147. San Francisco, CA, USA, 2000.
- D. Horgan, J. Quan, D. Budden, G. Barth-Maron, M. Hessel, H. van Hasselt, and D. Silver, "Distributed Prioritized Experience Replay," arXiv:1803.00933, 2018. Published at ICLR 2018.
- Q. Lan, Y. Pan, A. Fyshe, and M. White, "Maxmin Q-learning: Controlling the Estimation Bias of Q-learning," arXiv:2002.06487, 2020. Published at ICLR 2020.
- OpenAI. OpenAI Gym Docs [Internet], https://gym.openai.com/docs/. Accessed: 2020-11-20.