Extinction burst could be explained by curiosity-driven reinforcement learning

Kota Yamada, Hiroshi Matsui, Koji Toda
bioRxiv - Animal Behavior and Cognition. Published 2024-08-29. DOI: 10.1101/2024.08.28.610088 (https://doi.org/10.1101/2024.08.28.610088)

Abstract

Curiosity encourages agents to explore their environment, which creates opportunities for learning. Although psychology and neurobiology have examined how external rewards control behavior, how intrinsic factors control behavior remains unclear. An extinction burst is a behavioral phenomenon in which the frequency of a behavior increases suddenly, immediately after a reward is omitted. Although the extinction burst is textbook knowledge in psychology, there is little empirical evidence of it in experimental settings. In this study, we show that the extinction burst can be explained by curiosity, combining computational modeling of behavior with empirical demonstrations in mice. First, we built a reinforcement learning model incorporating curiosity, defined as expected reward prediction errors, in which curiosity controlled the agent's behavior additively with the primary reward. Simulations revealed that the curiosity-driven reinforcement learning model produced an extinction burst and that burst intensity depended on the reward probability. Second, we established a behavioral procedure that captured extinction bursts in an experimental setup using mice. We conducted an operant conditioning task with head-fixed mice, in which a lever press was followed by a reward with a given probability. After the training sessions, we occasionally withheld reward delivery while the mice performed the task. We found that phasic bursts of responses occurred immediately after reward omission when responses had been rewarded with a high probability, suggesting that the magnitude of reward prediction errors controlled the burst. These results provide theoretical and experimental evidence that intrinsic factors control behavior in adapting to an ever-changing environment.
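
The abstract does not give the model's equations, so the following is only a minimal sketch of the idea it describes, not the authors' implementation. It assumes a Rescorla-Wagner value update, a curiosity term that tracks the expected absolute reward prediction error, and a response drive equal to their weighted sum. The function names, the parameters `alpha`, `beta`, and `curiosity_weight`, and the use of expected (mean-field) updates rather than sampled rewards are all hypothetical choices made for illustration.

```python
def simulate_drive(reward_prob, n_training=200, n_extinction=30,
                   alpha=0.1, beta=0.3, curiosity_weight=1.0):
    """Return the per-trial response drive across training, then extinction.

    Assumed (not from the paper): drive = value + curiosity_weight * curiosity,
    with both terms updated by their expected values on every trial.
    """
    value = 0.0      # learned value of the primary reward
    curiosity = 0.0  # running estimate of the expected |reward prediction error|
    drive = []
    for trial in range(n_training + n_extinction):
        # During extinction the reward is always withheld.
        p = reward_prob if trial < n_training else 0.0
        # Expected |RPE|: reward (prob p) gives 1 - value, omission gives value.
        expected_abs_rpe = p * (1.0 - value) + (1.0 - p) * value
        curiosity += beta * (expected_abs_rpe - curiosity)
        # Expected Rescorla-Wagner update toward the mean reward p.
        value += alpha * (p - value)
        drive.append(value + curiosity_weight * curiosity)
    return drive


def burst_ratio(reward_prob, n_training=200, **kwargs):
    """Peak drive during extinction divided by drive on the last training trial."""
    drive = simulate_drive(reward_prob, n_training=n_training, **kwargs)
    baseline = drive[n_training - 1]
    return max(drive[n_training:]) / baseline


if __name__ == "__main__":
    for p in (0.5, 0.8, 1.0):
        print(f"reward probability {p}: burst ratio {burst_ratio(p):.3f}")
```

Under these assumptions, a ratio above 1 means the drive rises above its trained level after reward omission. With the default parameters the ratio exceeds 1 only for the higher reward probabilities and grows with them, which mirrors the qualitative pattern the abstract reports; the specific numbers carry no empirical meaning.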