A bio-inspired reinforcement learning model that accounts for fast adaptation after punishment

IF 2.2 | CAS Zone 4 (Psychology) | JCR Q3 (Behavioral Sciences) | Neurobiology of Learning and Memory | Pub Date: 2024-08-28 | DOI: 10.1016/j.nlm.2024.107974
Eric Chalmers, Artur Luczak
{"title":"一种生物启发强化学习模型,能说明惩罚后的快速适应性","authors":"Eric Chalmers ,&nbsp;Artur Luczak","doi":"10.1016/j.nlm.2024.107974","DOIUrl":null,"url":null,"abstract":"<div><p>Humans and animals can quickly learn a new strategy when a previously-rewarding strategy is punished. It is difficult to model this with reinforcement learning methods, because they tend to perseverate on previously-learned strategies − a hallmark of <em>impaired</em> response to punishment. Past work has addressed this by augmenting conventional reinforcement learning equations with ad hoc parameters or parallel learning systems. This produces reinforcement learning models that account for reversal learning, but are more abstract, complex, and somewhat detached from neural substrates. Here we use a different approach: we generalize a recently-discovered neuron-level learning rule, on the assumption that it captures a basic principle of learning that may occur at the whole-brain-level. Surprisingly, this gives a new reinforcement learning rule that accounts for adaptation and lose-shift behavior, and uses only the same parameters as conventional reinforcement learning equations. In the new rule, the normal reward prediction errors that drive reinforcement learning are scaled by the likelihood the agent assigns to the action that triggered a reward or punishment. The new rule demonstrates quick adaptation in card sorting and variable Iowa gambling tasks, and also exhibits a human-like paradox-of-choice effect. It will be useful for experimental researchers modeling learning and behavior.</p></div>","PeriodicalId":19102,"journal":{"name":"Neurobiology of Learning and Memory","volume":"215 ","pages":"Article 107974"},"PeriodicalIF":2.2000,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1074742724000856/pdfft?md5=c9ca4f1643f792be3695d63fd4923555&pid=1-s2.0-S1074742724000856-main.pdf","citationCount":"0","resultStr":"{\"title\":\"A bio-inspired reinforcement learning model that accounts for fast adaptation after punishment\",\"authors\":\"Eric Chalmers ,&nbsp;Artur Luczak\",\"doi\":\"10.1016/j.nlm.2024.107974\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Humans and animals can quickly learn a new strategy when a previously-rewarding strategy is punished. It is difficult to model this with reinforcement learning methods, because they tend to perseverate on previously-learned strategies − a hallmark of <em>impaired</em> response to punishment. Past work has addressed this by augmenting conventional reinforcement learning equations with ad hoc parameters or parallel learning systems. This produces reinforcement learning models that account for reversal learning, but are more abstract, complex, and somewhat detached from neural substrates. Here we use a different approach: we generalize a recently-discovered neuron-level learning rule, on the assumption that it captures a basic principle of learning that may occur at the whole-brain-level. Surprisingly, this gives a new reinforcement learning rule that accounts for adaptation and lose-shift behavior, and uses only the same parameters as conventional reinforcement learning equations. In the new rule, the normal reward prediction errors that drive reinforcement learning are scaled by the likelihood the agent assigns to the action that triggered a reward or punishment. 
The new rule demonstrates quick adaptation in card sorting and variable Iowa gambling tasks, and also exhibits a human-like paradox-of-choice effect. It will be useful for experimental researchers modeling learning and behavior.</p></div>\",\"PeriodicalId\":19102,\"journal\":{\"name\":\"Neurobiology of Learning and Memory\",\"volume\":\"215 \",\"pages\":\"Article 107974\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2024-08-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S1074742724000856/pdfft?md5=c9ca4f1643f792be3695d63fd4923555&pid=1-s2.0-S1074742724000856-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurobiology of Learning and Memory\",\"FirstCategoryId\":\"102\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1074742724000856\",\"RegionNum\":4,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"BEHAVIORAL SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurobiology of Learning and Memory","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1074742724000856","RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BEHAVIORAL SCIENCES","Score":null,"Total":0}
Citations: 0

Abstract

Humans and animals can quickly learn a new strategy when a previously-rewarding strategy is punished. It is difficult to model this with reinforcement learning methods, because they tend to perseverate on previously-learned strategies, a hallmark of impaired response to punishment. Past work has addressed this by augmenting conventional reinforcement learning equations with ad hoc parameters or parallel learning systems. This produces reinforcement learning models that account for reversal learning, but are more abstract, complex, and somewhat detached from neural substrates. Here we use a different approach: we generalize a recently-discovered neuron-level learning rule, on the assumption that it captures a basic principle of learning that may occur at the whole-brain level. Surprisingly, this gives a new reinforcement learning rule that accounts for adaptation and lose-shift behavior, and uses only the same parameters as conventional reinforcement learning equations. In the new rule, the normal reward prediction errors that drive reinforcement learning are scaled by the likelihood the agent assigns to the action that triggered a reward or punishment. The new rule demonstrates quick adaptation in card sorting and variable Iowa gambling tasks, and also exhibits a human-like paradox-of-choice effect. It will be useful for experimental researchers modeling learning and behavior.
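The mechanism described in the abstract can be sketched in a few lines. The following is a minimal illustrative sketch, not the authors' published implementation: it assumes a tabular Q-learning agent with a softmax policy, and all names and parameter values (alpha, beta, n_actions, the reward schedule) are assumptions chosen for illustration. The only element taken from the abstract is the core idea: the reward prediction error is scaled by the likelihood the agent assigned to the action it just took.

```python
import numpy as np

# Sketch of a likelihood-scaled Q-learning update, following the idea in the
# abstract. Not the authors' code; parameters and names are illustrative.

n_actions = 4
alpha = 0.1   # learning rate, as in conventional Q-learning
beta = 3.0    # softmax inverse temperature (assumed policy form)
Q = np.zeros(n_actions)

def action_probabilities(q, beta):
    """Softmax policy: the likelihood the agent assigns to each action."""
    z = beta * (q - q.max())      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def update(q, action, reward):
    """Scale the usual prediction error by the chosen action's likelihood.

    A confidently chosen (high-likelihood) action that is punished receives
    a large correction, while feedback on rarely chosen actions barely moves
    their values - consistent with lose-shift behavior after punishment.
    """
    p = action_probabilities(q, beta)
    delta = reward - q[action]                # conventional prediction error
    q[action] += alpha * p[action] * delta    # likelihood-scaled update
    return q

# Tiny demonstration: reward action 0 repeatedly, then punish it.
for _ in range(50):
    Q = update(Q, 0, reward=1.0)
print("after reward phase:", Q[0])
for _ in range(5):
    Q = update(Q, 0, reward=-1.0)
print("after punishment:", Q[0])   # drops sharply: its likelihood was high
```

Note the asymmetry this scaling creates: once the agent strongly favors an action, its softmax probability is near 1, so a punishment of that action produces nearly the full prediction-error update, whereas values of seldom-chosen actions change slowly. How this relates quantitatively to the neuron-level rule the paper generalizes is described in the full text.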

Source journal: Neurobiology of Learning and Memory
CiteScore: 5.10
Self-citation rate: 7.40%
Articles per year: 77
Average review time: 12.6 weeks
Journal description: Neurobiology of Learning and Memory publishes articles examining the neurobiological mechanisms underlying learning and memory at all levels of analysis, ranging from molecular biology to synaptic and neural plasticity and behavior. We are especially interested in manuscripts that examine the neural circuits and molecular mechanisms underlying learning, memory and plasticity in both experimental animals and human subjects.
Latest articles in this journal:
How predictability and individual alpha frequency shape memory: Insights from an event-related potential investigation.
The retrosplenial cortical role in delayed spatial alternation.
Attentional processing in the rat dorsal posterior parietal cortex
Motor-related oscillations reveal the involvement of sensorimotor processes during recognition memory
Editorial Board