A Neuroscientific Examination of Reinforcement Learning Theory

  • Published: 2018.01.25

Abstract

Keywords

References

  1. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
  2. V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, Feb. 2015. https://doi.org/10.1038/nature14236
  3. D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-489, Jan. 2016. https://doi.org/10.1038/nature16961
  4. D. Silver et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, pp. 354-359, Oct. 2017. https://doi.org/10.1038/nature24270
  5. J. P. O'Doherty, S. W. Lee, and D. McNamee, "The structure of reinforcement-learning mechanisms in the human brain," Curr. Opin. Behav. Sci., vol. 1, pp. 94-100, Oct. 2014.
  6. D. P. Bertsekas, Dynamic programming and optimal control. Athena Scientific, 2005.
  7. M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming. Wiley-Interscience, 2005.
  8. R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy Gradient Methods for Reinforcement Learning with Function Approximation," in Advances in Neural Information Processing Systems (NIPS), 2000, pp. 1057-1063.
  9. D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, "Deterministic policy gradient algorithms," in Proceedings of the 31st International Conference on Machine Learning, vol. 32. JMLR.org, p. I-387, 2014.
  10. W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, pp. 1593-1599, 1997. https://doi.org/10.1126/science.275.5306.1593
  11. C. D. Fiorillo, P. N. Tobler, and W. Schultz, "Discrete coding of reward probability and uncertainty by dopamine neurons," Science, vol. 299, no. 5614, pp. 1898-1902, Mar. 2003. https://doi.org/10.1126/science.1077349
  12. B. W. Balleine and J. P. O'Doherty, "Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action," Neuropsychopharmacology, vol. 35, no. 1, pp. 48-69, Jan. 2010. https://doi.org/10.1038/npp.2009.131
  13. N. D. Daw, Y. Niv, and P. Dayan, "Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control," Nat. Neurosci., vol. 8, pp. 1704-1711, 2005. https://doi.org/10.1038/nn1560
  14. M. Lengyel and P. Dayan, "Hippocampal Contributions to Control: The Third Way," in Advances in Neural Information Processing Systems (NIPS), 2008, pp. 889-896.
  15. S. A. Sheth et al., "Human dorsal anterior cingulate cortex neurons mediate ongoing behavioural adaptation," Nature, pp. 3-7, Jun. 2012.
  16. J. Gläscher, N. Daw, P. Dayan, and J. P. O'Doherty, "States versus Rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning," Neuron, vol. 66, no. 4, pp. 585-595, May 2010. https://doi.org/10.1016/j.neuron.2010.04.016
  17. N. D. Daw, S. J. Gershman, B. Seymour, P. Dayan, and R. J. Dolan, "Model-based influences on humans' choices and striatal prediction errors," Neuron, vol. 69, no. 6, pp. 1204-1215, Mar. 2011. https://doi.org/10.1016/j.neuron.2011.02.027
  18. S. W. Lee, S. Shimojo, and J. P. O'Doherty, "Neural Computations Underlying Arbitration between Model-Based and Model-Free Learning," Neuron, vol. 81, no. 3, pp. 687-699, Feb. 2014. https://doi.org/10.1016/j.neuron.2013.11.028
  19. E. Tricomi, B. W. Balleine, and J. P. O'Doherty, "A specific role for posterior dorsolateral striatum in human habit learning," Eur. J. Neurosci., vol. 29, pp. 2225-2232, 2009.
  20. K. Wunderlich, P. Dayan, and R. J. Dolan, "Mapping value based planning and extensively trained choices in the human brain," Nat. Neurosci., vol. 15, pp. 786-791, 2012. https://doi.org/10.1038/nn.3068
  21. E. D. Boorman, T. E. Behrens, M. W. Woolrich, and M. F. S. Rushworth, "How Green Is the Grass on the Other Side? Frontopolar Cortex and the Evidence in Favor of Alternative Courses of Action," Neuron, vol. 62, pp. 733-743, 2009. https://doi.org/10.1016/j.neuron.2009.05.014
  22. T. A. Hare, C. F. Camerer, and A. Rangel, "Self-control in decision-making involves modulation of the vmPFC valuation system," Science, vol. 324, pp. 646-648, 2009. https://doi.org/10.1126/science.1168450
  23. M. F. S. Rushworth, M. P. Noonan, E. D. Boorman, M. E. Walton, and T. E. Behrens, "Frontal Cortex and Reward-Guided Learning and Decision-Making," Neuron, vol. 70, pp. 1054-1069, 2011. https://doi.org/10.1016/j.neuron.2011.05.014