DOI: 10.1038/s41586-019-1924-6
A distributional code for value in dopamine-based reinforcement learning
Dabney, Will; Kurth-Nelson, Zeb; Uchida, Naoshige; Starkweather, Clara Kim; Hassabis, Demis; Munos, Rémi; Botvinick, Matthew
2020-03-01
Journal: NATURE
ISSN: 0028-0836
EISSN: 1476-4687
Publication Year: 2020
Volume: 577  Issue: 7792  Pages: 671-+
Article Type: Article
Language: English
Countries: England; USA
English Keywords:

Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain(1-3). According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning(4-6). We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from the mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.


Analyses of single-cell recordings from mouse ventral tegmental area are consistent with a model of reinforcement learning in which the brain represents possible future rewards not as a single mean of stochastic outcomes, as in the canonical model, but instead as a probability distribution.
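The contrast between the two hypotheses can be made concrete as two update rules. In the canonical model, a single value estimate is nudged by the reward prediction error and converges to the mean of the reward distribution; in the distributional (expectile-code) account, a population of value predictors scales positive and negative prediction errors asymmetrically, so that each unit converges to a different expectile of the distribution. The following sketch is only an illustration of that contrast, not the paper's code: the reward distribution, learning rate, number of units and asymmetries are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative stochastic reward: 70% chance of a small reward, 30% of a
# large one. Its mean is 0.7 * 1.0 + 0.3 * 10.0 = 3.7.
def sample_reward():
    return rng.choice([1.0, 10.0], p=[0.7, 0.3])

alpha = 0.02  # assumed learning rate

# Canonical model: a single scalar value driven by the reward prediction
# error converges to the mean of the reward distribution.
v = 0.0
for _ in range(20_000):
    delta = sample_reward() - v          # reward prediction error
    v += alpha * delta

# Distributional variant: each unit weights positive prediction errors by
# tau and negative ones by (1 - tau), i.e. an asymmetry
# tau = alpha_plus / (alpha_plus + alpha_minus). A unit with asymmetry tau
# converges to the tau-th expectile of the reward distribution, so the
# population jointly encodes the distribution's shape, not just its mean.
taus = np.linspace(0.05, 0.95, 9)        # assumed asymmetries across units
values = np.zeros_like(taus)
for _ in range(20_000):
    delta = sample_reward() - values
    values += alpha * np.where(delta > 0, taus, 1.0 - taus) * delta

print(f"canonical TD value (mean): {v:.2f}")              # approx. 3.7
print("distributional population:", np.round(values, 2))  # spreads toward 1 and 10
```

Note that a unit with tau = 0.5 weights both error signs equally and recovers the canonical mean estimate; the systematic spread of converged values across units with different asymmetries is the kind of population signature the single-unit recordings were tested against.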


Domain: Earth Science ; Climate Change ; Resources & Environment
Indexed By: SCI-E ; SSCI
WOS ID: WOS:000508287700004
WOS Keywords: REWARD ; GRADIENTS ; CIRCUITRY ; RESPONSES ; NEURONS ; SITES ; D-1
WOS Category: Multidisciplinary Sciences
WOS Research Area: Science & Technology - Other Topics
Document Type: Journal Article
Identifier: http://119.78.100.173/C666/handle/2XK7JSWQ/281060
Collections: Earth Science ; Resources & Environment Science ; Climate Change
Author Affiliations: 1.DeepMind, London, England;
2.Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, England;
3.Center for Brain Science, Harvard University, Cambridge, MA, USA
Recommended Citation:
GB/T 7714
Dabney, Will, Kurth-Nelson, Zeb, Uchida, Naoshige, et al. A distributional code for value in dopamine-based reinforcement learning[J]. NATURE, 2020, 577(7792): 671-+.
APA Dabney, W., Kurth-Nelson, Z., Uchida, N., Starkweather, C. K., Hassabis, D., Munos, R., & Botvinick, M. (2020). A distributional code for value in dopamine-based reinforcement learning. NATURE, 577(7792), 671-+.
MLA Dabney, Will, et al. "A distributional code for value in dopamine-based reinforcement learning". NATURE 577.7792 (2020): 671-+.
Files in This Item:
There are no files associated with this item.