Neurons, behavior, data analysis and theory: latest articles

Representation learning with reward prediction errors

Neurons, behavior, data analysis and theory

Pub Date : 2019-01-23 DOI: 10.51628/001c.37270

S. Gershman

The Reward Prediction Error hypothesis proposes that phasic activity in the midbrain dopaminergic system reflects prediction errors needed for learning in reinforcement learning. Besides the well-documented association between dopamine and reward processing, dopamine is implicated in a variety of functions without a clear relationship to reward prediction error. Fluctuations in dopamine levels influence the subjective perception of time, dopamine bursts precede the generation of motor responses, and the dopaminergic system innervates regions of the brain, including hippocampus and areas in prefrontal cortex, whose function is not uniquely tied to reward. In this manuscript, we propose that a common theme linking these functions is representation, and that prediction errors signaled by the dopamine system, in addition to driving associative learning, can also support the acquisition of adaptive state representations. In a series of simulations, we show how this extension can account for the role of dopamine in temporal and spatial representation, motor response, and abstract categorization tasks. By extending the role of dopamine signals to learning state representations, we resolve a critical challenge to the Reward Prediction Error hypothesis of dopamine function.
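As a rough illustration of the prediction-error signal this abstract refers to, the following is a minimal sketch of a textbook tabular TD(0) update in Python. It is not the model simulated in the paper: the five-state chain, the parameters `alpha` and `gamma`, and the function name `td_chain` are all assumptions chosen for illustration.

```python
def td_chain(n_states=5, episodes=200, alpha=0.1, gamma=0.9):
    """Tabular TD(0) on a hypothetical deterministic chain.

    The agent walks from state 0 to state n_states - 1 and receives
    reward 1 when leaving the last state, 0 everywhere else.
    Returns the learned values and the prediction errors from the
    final episode.
    """
    values = [0.0] * n_states
    last_errors = []
    for _ in range(episodes):
        last_errors = []
        for s in range(n_states):
            terminal = s == n_states - 1
            reward = 1.0 if terminal else 0.0
            next_value = 0.0 if terminal else values[s + 1]
            # Reward prediction error: (reward + discounted next
            # prediction) minus the current prediction.
            delta = reward + gamma * next_value - values[s]
            values[s] += alpha * delta
            last_errors.append(delta)
    return values, last_errors


if __name__ == "__main__":
    values, errors = td_chain()
    print([round(v, 3) for v in values])
    print([round(d, 4) for d in errors])
```

Once the values are learned, the prediction errors shrink toward zero, which is the sense in which the error signal drives associative learning. The paper's proposal that the same signal also shapes state representations is not captured by this toy example.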

Citations: 36

Performance of normative and approximate evidence accumulation on the dynamic clicks task.

Neurons, behavior, data analysis and theory

Pub Date : 2019-01-01 Epub Date: 2019-10-09

Adrian E Radillo, Alan Veliz-Cuba, Krešimir Josić, Zachary P Kilpatrick

The aim of a number of psychophysics tasks is to uncover how mammals make decisions in a world that is in flux. Here we examine the characteristics of ideal and near-ideal observers in a task of this type. We ask when and how performance depends on task parameters and design, and, in turn, what observer performance tells us about their decision-making process. In the dynamic clicks task subjects hear two streams (left and right) of Poisson clicks with different rates. Subjects are rewarded when they correctly identify the side with the higher rate, as this side switches unpredictably. We show that a reduced set of task parameters defines regions in parameter space in which optimal, but not near-optimal observers, maintain constant response accuracy. We also show that for a range of task parameters an approximate normative model must be finely tuned to reach near-optimal performance, illustrating a potential way to distinguish between normative models and their approximations. In addition, we show that using the negative log-likelihood and the 0/1-loss functions to fit these types of models is not equivalent: the 0/1-loss leads to a bias in parameter recovery that increases with sensory noise. These findings suggest ways to tease apart models that are hard to distinguish when tuned exactly, and point to general pitfalls in experimental design, model fitting, and interpretation of the resulting data.
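The task described in this abstract can be sketched as a small simulation. The Python below is a minimal sketch under stated assumptions, not the authors' code: it discretizes time into bins of width `dt`, lets the high-rate side switch with a hazard rate `hazard`, and updates a log-likelihood-ratio variable `y` with jumps of size `log(rate_high / rate_low)` at each click plus a nonlinear leak `-2 * hazard * sinh(y)`. That leak form is the one commonly given for the continuous-time normative observer in this literature; the abstract itself does not state it, so treat it, together with all parameter values and function names, as assumptions.

```python
import math
import random


def simulate_trial(duration=2.0, dt=0.001, rate_high=30.0, rate_low=10.0,
                   hazard=1.0, seed=0):
    """Simulate one dynamic-clicks trial with a leaky evidence accumulator.

    Returns (choice, final_state, y), where state +1 means the right
    stream currently has the higher click rate.
    """
    rng = random.Random(seed)
    kappa = math.log(rate_high / rate_low)  # evidence carried by one click
    state = rng.choice([-1, 1])
    y = 0.0
    for _ in range(round(duration / dt)):
        # The high-rate side switches with probability hazard * dt per bin.
        if rng.random() < hazard * dt:
            state = -state
        rate_right = rate_high if state == 1 else rate_low
        rate_left = rate_low if state == 1 else rate_high
        # Nonlinear leak (Euler step) discounts old evidence.
        y -= 2.0 * hazard * math.sinh(y) * dt
        # Bernoulli approximation to Poisson clicks in a small bin.
        if rng.random() < rate_right * dt:
            y += kappa
        if rng.random() < rate_left * dt:
            y -= kappa
    choice = 1 if y >= 0 else -1
    return choice, state, y


def accuracy(n_trials=200, **kwargs):
    """Fraction of trials in which the choice matches the final state."""
    correct = 0
    for seed in range(n_trials):
        choice, state, _ = simulate_trial(seed=seed, **kwargs)
        correct += choice == state
    return correct / n_trials


if __name__ == "__main__":
    print(accuracy())
```

Varying `hazard` in the observer separately from the environment would be one way to explore the tuning sensitivity the abstract mentions, although this sketch uses a single shared value.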

Citations: 0

Combining Imagination and Heuristics to Learn Strategies that Generalize

Neurons, behavior, data analysis and theory

Pub Date : 2018-09-10 DOI: 10.51628/001c.13477

Erik J Peterson, Necati Alp Müyesser, T. Verstynen, Kyle Dunovan

Deep reinforcement learning can match or exceed human performance in stable contexts, but after minor changes to the environment, artificial networks, unlike humans, often cannot adapt. Humans rely on a combination of heuristics, to simplify computational load, and imagination, to extend experiential learning to new and more challenging environments. Motivated by theories of the hierarchical organization of the human prefrontal networks, we have developed a model of hierarchical reinforcement learning that combines both heuristics and imagination into a “stumbler-strategist” network. We test the performance of this network using Wythoff’s game, a gridworld environment with a known optimal strategy. We show that a heuristic labeling of each position as hot or cold, combined with imagined play, both accelerates learning and promotes transfer to novel games, while also improving model interpretability.
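The hot/cold labeling mentioned in this abstract can be made concrete with a short sketch. In Wythoff's game a move removes any positive number of objects from one pile, or the same number from both piles, and a position is cold when the player to move loses under optimal play. The Python below computes cold positions by exhaustive backward induction on a small board; it illustrates the labeling only, not the paper's network, and the function name `cold_positions` is hypothetical.

```python
def cold_positions(size):
    """Return the set of cold positions (a, b) with 0 <= a, b < size.

    A position is cold if no legal move reaches another cold position;
    (0, 0) has no moves at all, so it is cold by definition.
    """
    cold = set()
    for a in range(size):
        for b in range(size):
            moves = (
                [(x, b) for x in range(a)]          # shrink first pile
                + [(a, y) for y in range(b)]        # shrink second pile
                + [(a - k, b - k)                   # shrink both equally
                   for k in range(1, min(a, b) + 1)]
            )
            # Every move lands on a position visited earlier in the
            # loops, so its label is already known.
            if not any(m in cold for m in moves):
                cold.add((a, b))
    return cold


if __name__ == "__main__":
    cold = cold_positions(8)
    print(sorted(p for p in cold if p[0] <= p[1]))
```

Every position not in the returned set is hot, meaning the player to move can reach a cold position in one move.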

Citations: 1

On the Subspace Invariance of Population Responses.

Neurons, behavior, data analysis and theory

Pub Date : 2018-01-01

Elaine Tring, Dario L Ringach

In cat visual cortex, the response of a neural population to the linear combination of two sinusoidal gratings (a plaid) can be well approximated by a weighted sum of the population responses to the individual gratings - a property we refer to as subspace invariance. We tested subspace invariance in mouse primary visual cortex by measuring the angle between the population response to a plaid and the plane spanned by the population responses to its individual components. We found robust violations of subspace invariance arising from a strong, negative correlation between the responses of neurons to individual gratings and their responses to the plaid. Contrast invariance, a special case of subspace invariance, also failed. The responses of some neurons decreased with increasing contrast, while others increased. Altogether the data show that subspace and contrast invariance do not hold in mouse primary visual cortex. These findings rule out some models of population coding, including vector averaging, some versions of normalization and temporal multiplexing.
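The geometric test described in this abstract, the angle between a population response and the plane spanned by two component responses, can be sketched in a few lines of Python. This is a generic linear-algebra illustration rather than the authors' analysis pipeline; the function name `angle_to_plane` and the toy three-neuron vectors are assumptions, and the two spanning vectors are assumed to be linearly independent.

```python
import math


def angle_to_plane(r, u, v):
    """Angle in degrees between vector r and the plane spanned by u and v.

    Assumes r is non-zero and u, v are linearly independent.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Gram-Schmidt: build an orthonormal basis (e1, e2) of span{u, v}.
    norm_u = math.sqrt(dot(u, u))
    e1 = [x / norm_u for x in u]
    v_dot_e1 = dot(v, e1)
    w = [y - v_dot_e1 * x for x, y in zip(e1, v)]
    norm_w = math.sqrt(dot(w, w))
    e2 = [x / norm_w for x in w]

    # Length of the projection of r onto the plane, relative to |r|.
    proj_len = math.hypot(dot(r, e1), dot(r, e2))
    cos_angle = min(1.0, proj_len / math.sqrt(dot(r, r)))
    return math.degrees(math.acos(cos_angle))


if __name__ == "__main__":
    grating_a = [1.0, 0.0, 0.0]   # toy population responses, 3 "neurons"
    grating_b = [0.0, 1.0, 0.0]
    plaid = [1.0, 0.0, 1.0]
    print(angle_to_plane(plaid, grating_a, grating_b))
```

If subspace invariance held exactly, the plaid response would be a weighted sum of the two grating responses and the angle would be zero. Larger angles indicate a response component outside the plane.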

Citations: 0
