DOI: 10.1038/s41586-019-1924-6
A distributional code for value in dopamine-based reinforcement learning
Dabney, Will; Kurth-Nelson, Zeb; Uchida, Naoshige; Starkweather, Clara Kim; Hassabis, Demis; Munos, Rémi; Botvinick, Matthew
2020-03-01
Journal: NATURE
ISSN: 0028-0836
EISSN: 1476-4687
Publication Year: 2020
Volume: 577  Issue: 7792  Pages: 671-+
Article Type: Article
Language: English
Countries: England; USA
English Keywords:

Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain(1-3). According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning(4-6). We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from the mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.


Analyses of single-cell recordings from mouse ventral tegmental area are consistent with a model of reinforcement learning in which the brain represents possible future rewards not as a single mean of stochastic outcomes, as in the canonical model, but instead as a probability distribution.
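The contrast between the two hypotheses can be made concrete as two update rules. In the canonical model, a single value estimate is nudged by the reward prediction error and converges to the mean of the reward distribution; in the distributional (expectile-code) account, a population of value predictors scales positive and negative prediction errors asymmetrically, so that each unit converges to a different expectile of the distribution. The following sketch is only an illustration of that contrast, not the paper's code: the reward distribution, learning rate, number of units and asymmetries are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative stochastic reward: 70% chance of a small reward, 30% of a
# large one. Its mean is 0.7 * 1.0 + 0.3 * 10.0 = 3.7.
def sample_reward():
    return rng.choice([1.0, 10.0], p=[0.7, 0.3])

alpha = 0.02  # assumed learning rate

# Canonical model: a single scalar value driven by the reward prediction
# error converges to the mean of the reward distribution.
v = 0.0
for _ in range(20_000):
    delta = sample_reward() - v          # reward prediction error
    v += alpha * delta

# Distributional variant: each unit weights positive prediction errors by
# tau and negative ones by (1 - tau), i.e. an asymmetry
# tau = alpha_plus / (alpha_plus + alpha_minus). A unit with asymmetry tau
# converges to the tau-th expectile of the reward distribution, so the
# population jointly encodes the distribution's shape, not just its mean.
taus = np.linspace(0.05, 0.95, 9)        # assumed asymmetries across units
values = np.zeros_like(taus)
for _ in range(20_000):
    delta = sample_reward() - values
    values += alpha * np.where(delta > 0, taus, 1.0 - taus) * delta

print(f"canonical TD value (mean): {v:.2f}")              # approx. 3.7
print("distributional population:", np.round(values, 2))  # spreads toward 1 and 10
```

Note that a unit with tau = 0.5 weights both error signs equally and recovers the canonical mean estimate; the systematic spread of converged values across units with different asymmetries is the kind of population signature the single-unit recordings were tested against.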


Domain: Earth Science ; Climate Change ; Resources & Environment
Indexed By: SCI-E ; SSCI
WOS ID: WOS:000508287700004
WOS Keywords: REWARD ; GRADIENTS ; CIRCUITRY ; RESPONSES ; NEURONS ; SITES ; D-1
WOS Category: Multidisciplinary Sciences
WOS Research Area: Science & Technology - Other Topics
Document Type: Journal Article
Identifier: http://119.78.100.173/C666/handle/2XK7JSWQ/281060
Collections: Earth Science ; Resources & Environment Science ; Climate Change
Author Affiliations: 1.DeepMind, London, England;
2.Max Planck UCL Centre for Computational Psychiatry and Ageing Research, University College London, London, England;
3.Center for Brain Science, Harvard University, Cambridge, MA, USA
Recommended Citation:
GB/T 7714
Dabney, Will, Kurth-Nelson, Zeb, Uchida, Naoshige, et al. A distributional code for value in dopamine-based reinforcement learning[J]. NATURE, 2020, 577(7792): 671-+.
APA Dabney, W., Kurth-Nelson, Z., Uchida, N., Starkweather, C. K., Hassabis, D., Munos, R., & Botvinick, M. (2020). A distributional code for value in dopamine-based reinforcement learning. NATURE, 577(7792), 671-+.
MLA Dabney, Will, et al. "A distributional code for value in dopamine-based reinforcement learning". NATURE 577.7792 (2020): 671-+.
Files in This Item:
There are no files associated with this item.