References
- R. S. Sutton and A. G. Barto, Reinforcement Learning. MIT press, 1998.
- V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, Feb. 2015. https://doi.org/10.1038/nature14236
- D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-489, Jan. 2016. https://doi.org/10.1038/nature16961
- D. Silver et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, pp. 354-359, Oct. 2017. https://doi.org/10.1038/nature24270
- J. P. O'Doherty, S. W. Lee, and D. McNamee, "The structure of reinforcement-learning mechanisms in the human brain," Curr. Opin. Behav. Sci., vol. 1, pp. 94-100, Oct. 2014.
- D. P. Bertsekas, Dynamic programming and optimal control. Athena Scientific, 2005.
- M. L. Puterman, Markov decision processes : discrete stochastic dynamic programming. Wiley-Interscience, 2005.
- R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy Gradient Methods for Reinforcement Learning with Function Approximation." pp. 1057-1063, 2000.
- D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, "Deterministic policy gradient algorithms," Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32. JMLR.org, p. I-387, 2014.
- W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science (80-. )., vol. 275, pp. 1593-1599, 1997. https://doi.org/10.1126/science.275.5306.1593
- C. D. Fiorillo, P. N. Tobler, and W. Schultz, "Discrete coding of reward probability and uncertainty by dopamine neurons.," Science, vol. 299, no. 5614, pp. 1898-902, Mar. 2003. https://doi.org/10.1126/science.1077349
- B. W. Balleine and J. P. O'Doherty, "Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action.," Neuropsychopharmacology, vol. 35, no. 1, pp. 48-69, Jan. 2010. https://doi.org/10.1038/npp.2009.131
- N. D. Daw, Y. Niv, and P. Dayan, "Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control," Nat. Neurosci., vol. 8, pp. 1704-1711, 2005. https://doi.org/10.1038/nn1560
- P. D. Mate Lengyel, "Hippocampal Contributions to Control: The Third Way," in Advances in Neural Information Processing Systems (NIPS), 2008, pp. 889-896.
- S. a Sheth et al., "Human dorsal anterior cingulate cortex neurons mediate ongoing behavioural adaptation.," Nature, pp. 3-7, Jun. 2012.
- J. Glascher, N. Daw, P. Dayan, and J. P. O'Doherty, "States versus Rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning," Neuron, vol. 66, no. 4, pp. 585-95, May 2010. https://doi.org/10.1016/j.neuron.2010.04.016
- N. D. Daw, S. J. Gershman, B. Seymour, P. Dayan, and R. J. Dolan, "Model-based influences on humans' choices and striatal prediction errors.," Neuron, vol. 69, no. 6, pp. 1204-15, Mar. 2011. https://doi.org/10.1016/j.neuron.2011.02.027
- S. W. Lee, S. Shimojo, and J. P. O'Doherty, "Neural Computations Underlying Arbitration between Model-Based and Model-free Learning," Neuron, vol. 81, no. 3, pp. 687-699, Feb. 2014. https://doi.org/10.1016/j.neuron.2013.11.028
- E. Tricomi, B. W. Balleine, and J. P. O'Doherty, "A specific role for posterior dorsolateral striatum in human habit learning," Eur. J. Neurosci., vol. 29, pp. 2225-2232, 2009.
- K. Wunderlich, P. Dayan, and R. J. Dolan, "Mapping value based planning and extensively trained choices in the human brain," Nat. Neurosci., vol. 15, pp. 786-791, 2012. https://doi.org/10.1038/nn.3068
- E. D. Boorman, T. E. Behrens, M. W. Woolrich, and M. F. S. Rushworth, "How Green Is the Grass on the Other Side? Frontopolar Cortex and the Evidence in Favor of Alternative Courses of Action," Neuron, vol. 62, pp. 733-743, 2009. https://doi.org/10.1016/j.neuron.2009.05.014
- T. a Hare, C. F. Camerer, and A. Rangel, "Self-control in decision-making involves modulation of the vmPFC valuation system," Science (80-. )., vol. 324, pp. 646-648, 2009. https://doi.org/10.1126/science.1168450
- M. F. S. Rushworth, M. P. Noonan, E. D. Boorman, M. E. Walton, and T. E. Behrens, "Frontal Cortex and Reward-Guided Learning and Decision-Making," Neuron, vol. 70, pp. 1054-1069, 2011. https://doi.org/10.1016/j.neuron.2011.05.014