http://dx.doi.org/10.3745/KTSDE.2022.11.8.331

Evaluating SR-Based Reinforcement Learning Algorithm Under the Highly Uncertain Decision Task  

Kim, So Hyeon (Department of Intelligent Information Engineering, Sangmyung University)
Lee, Jee Hang (Department of Human Intelligence Information Engineering, Sangmyung University)
Publication Information
KIPS Transactions on Software and Data Engineering, Vol.11, No.8, 2022, pp.331-338
Abstract
Successor representation (SR) is a model of human reinforcement learning (RL) that mimics the mechanism by which hippocampal cells construct cognitive maps. SR uses these learned predictive maps to respond adaptively to frequent changes in reward. In this paper, we evaluated the performance of SR in a context where changes in the latent variables of the environment trigger changes in the reward structure. As a benchmark, we adopted SR-Dyna, an integration of SR into the goal-driven Dyna RL algorithm, on the 2-stage Markov Decision Task (MDT), in which the latent variables (state transition uncertainty and goal condition) can be manipulated intentionally. To investigate the characteristics of SR precisely, we conducted experiments while controlling each latent variable that affects changes in the reward structure. The evaluation results showed that SR-Dyna could learn to respond to reward changes associated with changes in the latent variables, but could not learn rapidly in such situations. This highlights the need for more robust RL models that can rapidly learn to respond to frequent changes in environments where latent variables and the reward structure change at the same time.
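The SR idea summarized above can be illustrated with a minimal tabular sketch. The fragment below is illustrative only: the state-space size, learning rates, and function names are assumptions, not the authors' SR-Dyna implementation. It shows the standard SR temporal-difference update of the successor matrix M and the value factorization V = M·w, which is what lets an SR agent adapt to reward changes by relearning only the reward weights w.

```python
import numpy as np

# Minimal tabular SR sketch (assumed setup, not the paper's code).
n_states = 5            # e.g. states of a small 2-stage decision task
alpha, gamma = 0.1, 0.95

M = np.eye(n_states)    # successor matrix M[s, s']: expected discounted future occupancy
w = np.zeros(n_states)  # per-state reward estimates

def sr_td_update(s, s_next, reward):
    """One transition: TD update of the SR row for s, then of the reward weights."""
    onehot = np.eye(n_states)[s]
    # TD target for the SR: immediate occupancy of s plus discounted SR of the next state.
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    # Reward weights are learned separately, so value estimates can track reward changes quickly.
    w[s_next] += alpha * (reward - w[s_next])

def state_values():
    """State values factorize as V = M @ w."""
    return M @ w
```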
Keywords
SR-Based Reinforcement Learning Algorithm; 2-Stage Markov Decision Task; State Transition Probability; Reward Function
Citations & Related Records
연도 인용수 순위
  • Reference
1 J. P. O'Doherty, S. W. Lee, and D. McNamee, "The structure of reinforcement-learning mechanisms in the human brain," Current Opinion in Behavioral Sciences, Vol.1, pp.94-100, 2014.
2 W. Dabney, G. Ostrovski, D. Silver, and R.Munos, "Implicit quantile networks for distributional reinforcement learning," In: International Conference on Machine Learning, PMLR, pp.1096-1105, 2018.
3 D. Hassabis, D. Kumaran, C. Summerfield, and M.Botvinick, "Neuroscience-inspired artificial intelligence," Neuron, Vol.95, No.2, pp.245-258, 2017.   DOI
4 K. L. Stachenfeld, M. M. Botvinick, and S. J. Gershman, "The hippocampus as a predictive map," Nature Neuroscience, Vol.20, No.11, pp.1643-1653, 2017.   DOI
5 R. S. Sutton and A. G. Barto, "Reinforcement learning: An introduction," MIT press, 2018.
6 D. Silver, et al., "Mastering the game of go without human knowledge," Nature, Vol.550, No.7676, pp.354-359, 2017.   DOI
7 J. Schrittwieser, et al., "Mastering atari, go, chess and shogi by planning with a learned model," Nature, Vol.588, No.7839, pp.604-609, 2020.   DOI
8 S. W. Lee, S. Shimojo, and J. P. O'Doherty, "Neural computations underlying arbitration between model-based and model-free learning," Neuron, Vol.81, No.3, pp.687-699, 2014.   DOI
9 R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, Vol.3, No.1, pp.9-44, 1988.   DOI
10 E. M. Russek, I. Momennejad, M. M. Botvinick, S. J. Gershman, and N. D. Daw, "Predictive representations can link model-based reinforcement learning to model-free mechanisms," PLoS Computational Biology, Vol.13, No.9, pp.e1005768, 2017.
11 R. S. Sutton, "Dyna, an integrated architecture for learning, planning, and reacting," ACM Sigart Bulletin, Vol.2, No.4, pp.160-163, 1991.   DOI
12 J. X. Wang, et al., "Learning to reinforcement learn," arXiv preprint arXiv:1611.05763, 2016.
13 S.-H. Kim, and J. H. Lee, "Evaluating a successor representation-based reinforcement learning algorithm in the 2-stage Markov decision task," In: Proceedings of the Korea Information Processing Society Conference, Korea Information Processing Society, pp.910-913, 2021.
14 D. Silver, et al., "Mastering the game of go with deep neural networks and tree search," Nature, Vol.529, No.7587, pp.484-489, 2016.   DOI
15 J. H. Lee, B. Seymour, J. Z. Leibo, S. J. Lee, and S. W. Lee, "Toward high-performance, memory-efficient, and fast reinforcement learning-Lessons from decision neuro-science," Science Robotics, Vol.4, No.26, pp.eaav2975, 2019.
16 J. X. Wang, et al., "Prefrontal cortex as a meta-reinforcement learning system," Nature Neuroscience, Vol.21, No.6, pp.860-868, 2018.   DOI
17 S. J. Gershman, "The successor representation: Its computational logic and neural substrates," Journal of Neuro-scence, Vol.38, No.33, pp.7193-7200, 2018.   DOI
18 G. Farquhar, et al., "Self-Consistent Models and Values," Advances in Neural Information Processing Systems, Vol.34, pp.1111-1125, 2021
19 I. Momennejad, E. M. Russek, J. H. Cheong, M. M. Botvinick, N. D. Daw, and S. J. Gershman, "The successor representation in human reinforcement learning," Nature Human Behaviour, Vol.1, No.9, pp.680-692, 2017.   DOI
20 E. C. Tolman, "Cognitive maps in rats and men," Psychological Review, Vol.55, No.4, pp.189, 1948.