A bio-inspired reinforcement learning model that accounts for fast adaptation after punishment

IF 2.2 | CAS Zone 4 (Psychology) | JCR Q3 (Behavioral Sciences) | Neurobiology of Learning and Memory | Pub Date: 2024-08-28 | DOI: 10.1016/j.nlm.2024.107974
Eric Chalmers, Artur Luczak
{"title":"一种生物启发强化学习模型,能说明惩罚后的快速适应性","authors":"Eric Chalmers ,&nbsp;Artur Luczak","doi":"10.1016/j.nlm.2024.107974","DOIUrl":null,"url":null,"abstract":"<div><p>Humans and animals can quickly learn a new strategy when a previously-rewarding strategy is punished. It is difficult to model this with reinforcement learning methods, because they tend to perseverate on previously-learned strategies − a hallmark of <em>impaired</em> response to punishment. Past work has addressed this by augmenting conventional reinforcement learning equations with ad hoc parameters or parallel learning systems. This produces reinforcement learning models that account for reversal learning, but are more abstract, complex, and somewhat detached from neural substrates. Here we use a different approach: we generalize a recently-discovered neuron-level learning rule, on the assumption that it captures a basic principle of learning that may occur at the whole-brain-level. Surprisingly, this gives a new reinforcement learning rule that accounts for adaptation and lose-shift behavior, and uses only the same parameters as conventional reinforcement learning equations. In the new rule, the normal reward prediction errors that drive reinforcement learning are scaled by the likelihood the agent assigns to the action that triggered a reward or punishment. The new rule demonstrates quick adaptation in card sorting and variable Iowa gambling tasks, and also exhibits a human-like paradox-of-choice effect. It will be useful for experimental researchers modeling learning and behavior.</p></div>","PeriodicalId":19102,"journal":{"name":"Neurobiology of Learning and Memory","volume":"215 ","pages":"Article 107974"},"PeriodicalIF":2.2000,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1074742724000856/pdfft?md5=c9ca4f1643f792be3695d63fd4923555&pid=1-s2.0-S1074742724000856-main.pdf","citationCount":"0","resultStr":"{\"title\":\"A bio-inspired reinforcement learning model that accounts for fast adaptation after punishment\",\"authors\":\"Eric Chalmers ,&nbsp;Artur Luczak\",\"doi\":\"10.1016/j.nlm.2024.107974\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Humans and animals can quickly learn a new strategy when a previously-rewarding strategy is punished. It is difficult to model this with reinforcement learning methods, because they tend to perseverate on previously-learned strategies − a hallmark of <em>impaired</em> response to punishment. Past work has addressed this by augmenting conventional reinforcement learning equations with ad hoc parameters or parallel learning systems. This produces reinforcement learning models that account for reversal learning, but are more abstract, complex, and somewhat detached from neural substrates. Here we use a different approach: we generalize a recently-discovered neuron-level learning rule, on the assumption that it captures a basic principle of learning that may occur at the whole-brain-level. Surprisingly, this gives a new reinforcement learning rule that accounts for adaptation and lose-shift behavior, and uses only the same parameters as conventional reinforcement learning equations. In the new rule, the normal reward prediction errors that drive reinforcement learning are scaled by the likelihood the agent assigns to the action that triggered a reward or punishment. 
The new rule demonstrates quick adaptation in card sorting and variable Iowa gambling tasks, and also exhibits a human-like paradox-of-choice effect. It will be useful for experimental researchers modeling learning and behavior.</p></div>\",\"PeriodicalId\":19102,\"journal\":{\"name\":\"Neurobiology of Learning and Memory\",\"volume\":\"215 \",\"pages\":\"Article 107974\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2024-08-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S1074742724000856/pdfft?md5=c9ca4f1643f792be3695d63fd4923555&pid=1-s2.0-S1074742724000856-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurobiology of Learning and Memory\",\"FirstCategoryId\":\"102\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1074742724000856\",\"RegionNum\":4,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"BEHAVIORAL SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurobiology of Learning and Memory","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1074742724000856","RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BEHAVIORAL SCIENCES","Score":null,"Total":0}
Citations: 0

Abstract

Humans and animals can quickly learn a new strategy when a previously-rewarding strategy is punished. It is difficult to model this with reinforcement learning methods, because they tend to perseverate on previously-learned strategies, a hallmark of impaired response to punishment. Past work has addressed this by augmenting conventional reinforcement learning equations with ad hoc parameters or parallel learning systems. This produces reinforcement learning models that account for reversal learning, but are more abstract, complex, and somewhat detached from neural substrates. Here we use a different approach: we generalize a recently-discovered neuron-level learning rule, on the assumption that it captures a basic principle of learning that may occur at the whole-brain level. Surprisingly, this gives a new reinforcement learning rule that accounts for adaptation and lose-shift behavior, and uses only the same parameters as conventional reinforcement learning equations. In the new rule, the normal reward prediction errors that drive reinforcement learning are scaled by the likelihood the agent assigns to the action that triggered a reward or punishment. The new rule demonstrates quick adaptation in card sorting and variable Iowa gambling tasks, and also exhibits a human-like paradox-of-choice effect. It will be useful for experimental researchers modeling learning and behavior.
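The mechanism described in the abstract can be sketched in a few lines. The following is a minimal illustrative sketch, not the authors' published implementation: it assumes a tabular Q-learning agent with a softmax policy, and all names and parameter values (alpha, beta, n_actions, the reward schedule) are assumptions chosen for illustration. The only element taken from the abstract is the core idea: the reward prediction error is scaled by the likelihood the agent assigned to the action it just took.

```python
import numpy as np

# Sketch of a likelihood-scaled Q-learning update, following the idea in the
# abstract. Not the authors' code; parameters and names are illustrative.

n_actions = 4
alpha = 0.1   # learning rate, as in conventional Q-learning
beta = 3.0    # softmax inverse temperature (assumed policy form)
Q = np.zeros(n_actions)

def action_probabilities(q, beta):
    """Softmax policy: the likelihood the agent assigns to each action."""
    z = beta * (q - q.max())      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def update(q, action, reward):
    """Scale the usual prediction error by the chosen action's likelihood.

    A confidently chosen (high-likelihood) action that is punished receives
    a large correction, while feedback on rarely chosen actions barely moves
    their values - consistent with lose-shift behavior after punishment.
    """
    p = action_probabilities(q, beta)
    delta = reward - q[action]                # conventional prediction error
    q[action] += alpha * p[action] * delta    # likelihood-scaled update
    return q

# Tiny demonstration: reward action 0 repeatedly, then punish it.
for _ in range(50):
    Q = update(Q, 0, reward=1.0)
print("after reward phase:", Q[0])
for _ in range(5):
    Q = update(Q, 0, reward=-1.0)
print("after punishment:", Q[0])   # drops sharply: its likelihood was high
```

Note the asymmetry this scaling creates: once the agent strongly favors an action, its softmax probability is near 1, so a punishment of that action produces nearly the full prediction-error update, whereas values of seldom-chosen actions change slowly. How this relates quantitatively to the neuron-level rule the paper generalizes is described in the full text.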

Source journal: Neurobiology of Learning and Memory
CiteScore: 5.10
Self-citation rate: 7.40%
Articles per year: 77
Average review time: 12.6 weeks
Journal description: Neurobiology of Learning and Memory publishes articles examining the neurobiological mechanisms underlying learning and memory at all levels of analysis, ranging from molecular biology to synaptic and neural plasticity and behavior. We are especially interested in manuscripts that examine the neural circuits and molecular mechanisms underlying learning, memory and plasticity in both experimental animals and human subjects.
Latest articles in this journal:
How predictability and individual alpha frequency shape memory: Insights from an event-related potential investigation.
The retrosplenial cortical role in delayed spatial alternation.
Attentional processing in the rat dorsal posterior parietal cortex
Motor-related oscillations reveal the involvement of sensorimotor processes during recognition memory
Editorial Board