A Neuroscientific Review of Reinforcement Learning Theory (강화학습 이론의 신경과학적 고찰)

Lee, Sang-Wan

Communications of the Korean Institute of Information Scientists and Engineers (정보과학회지)

Volume 36, Issue 1, Pages 8-16, 2018, pISSN 1229-6821

Korean Institute of Information Scientists and Engineers (한국정보과학회)

Lee, Sang-Wan (KAIST)

이상완

Published: 2018.01.25

References

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, Feb. 2015. https://doi.org/10.1038/nature14236
D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-489, Jan. 2016. https://doi.org/10.1038/nature16961
D. Silver et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, pp. 354-359, Oct. 2017. https://doi.org/10.1038/nature24270
J. P. O'Doherty, S. W. Lee, and D. McNamee, "The structure of reinforcement-learning mechanisms in the human brain," Curr. Opin. Behav. Sci., vol. 1, pp. 94-100, Oct. 2014.
D. P. Bertsekas, Dynamic programming and optimal control. Athena Scientific, 2005.
M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming. Wiley-Interscience, 2005.
R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy Gradient Methods for Reinforcement Learning with Function Approximation," in Advances in Neural Information Processing Systems (NIPS), 2000, pp. 1057-1063.
D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, "Deterministic policy gradient algorithms," Proceedings of the 31st International Conference on Machine Learning - Volume 32. JMLR.org, p. I-387, 2014.
W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, pp. 1593-1599, 1997. https://doi.org/10.1126/science.275.5306.1593
C. D. Fiorillo, P. N. Tobler, and W. Schultz, "Discrete coding of reward probability and uncertainty by dopamine neurons," Science, vol. 299, no. 5614, pp. 1898-1902, Mar. 2003. https://doi.org/10.1126/science.1077349
B. W. Balleine and J. P. O'Doherty, "Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action," Neuropsychopharmacology, vol. 35, no. 1, pp. 48-69, Jan. 2010. https://doi.org/10.1038/npp.2009.131
N. D. Daw, Y. Niv, and P. Dayan, "Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control," Nat. Neurosci., vol. 8, pp. 1704-1711, 2005. https://doi.org/10.1038/nn1560
M. Lengyel and P. Dayan, "Hippocampal Contributions to Control: The Third Way," in Advances in Neural Information Processing Systems (NIPS), 2008, pp. 889-896.
S. A. Sheth et al., "Human dorsal anterior cingulate cortex neurons mediate ongoing behavioural adaptation," Nature, pp. 3-7, Jun. 2012.
J. Gläscher, N. Daw, P. Dayan, and J. P. O'Doherty, "States versus Rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning," Neuron, vol. 66, no. 4, pp. 585-595, May 2010. https://doi.org/10.1016/j.neuron.2010.04.016
N. D. Daw, S. J. Gershman, B. Seymour, P. Dayan, and R. J. Dolan, "Model-based influences on humans' choices and striatal prediction errors," Neuron, vol. 69, no. 6, pp. 1204-1215, Mar. 2011. https://doi.org/10.1016/j.neuron.2011.02.027
S. W. Lee, S. Shimojo, and J. P. O'Doherty, "Neural Computations Underlying Arbitration between Model-Based and Model-Free Learning," Neuron, vol. 81, no. 3, pp. 687-699, Feb. 2014. https://doi.org/10.1016/j.neuron.2013.11.028
E. Tricomi, B. W. Balleine, and J. P. O'Doherty, "A specific role for posterior dorsolateral striatum in human habit learning," Eur. J. Neurosci., vol. 29, pp. 2225-2232, 2009.
K. Wunderlich, P. Dayan, and R. J. Dolan, "Mapping value based planning and extensively trained choices in the human brain," Nat. Neurosci., vol. 15, pp. 786-791, 2012. https://doi.org/10.1038/nn.3068
E. D. Boorman, T. E. Behrens, M. W. Woolrich, and M. F. S. Rushworth, "How Green Is the Grass on the Other Side? Frontopolar Cortex and the Evidence in Favor of Alternative Courses of Action," Neuron, vol. 62, pp. 733-743, 2009. https://doi.org/10.1016/j.neuron.2009.05.014
T. A. Hare, C. F. Camerer, and A. Rangel, "Self-control in decision-making involves modulation of the vmPFC valuation system," Science, vol. 324, pp. 646-648, 2009. https://doi.org/10.1126/science.1168450
M. F. S. Rushworth, M. P. Noonan, E. D. Boorman, M. E. Walton, and T. E. Behrens, "Frontal Cortex and Reward-Guided Learning and Decision-Making," Neuron, vol. 70, pp. 1054-1069, 2011. https://doi.org/10.1016/j.neuron.2011.05.014
