Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. 2020R1G1A1102683). This research was also supported by the Samsung Research Funding & Incubation Center of Samsung Electronics (No. SRFC-TC1603-52).