An inductive bias for slowly changing features in human reinforcement learning.

IF 3.8 | Zone 2 (Biology) | JCR Q1, Biochemical Research Methods | PLoS Computational Biology | Pub Date: 2024-11-25 | DOI: 10.1371/journal.pcbi.1012568
Noa L Hedrich, Eric Schulz, Sam Hall-McMaster, Nicolas W Schuck
PLoS Computational Biology 20(11): e1012568. https://doi.org/10.1371/journal.pcbi.1012568

Abstract

Identifying goal-relevant features in novel environments is a central challenge for efficient behaviour. We asked whether humans address this challenge by relying on prior knowledge about common properties of reward-predicting features. One such property is the rate of change of features, given that behaviourally relevant processes tend to change on a slower timescale than noise. Hence, we asked whether humans are biased to learn more when task-relevant features are slow rather than fast. To test this idea, 295 human participants were asked to learn the rewards of two-dimensional bandits when either a slowly or quickly changing feature of the bandit predicted reward. Across two experiments and one preregistered replication, participants accrued more reward when a bandit's relevant feature changed slowly, and its irrelevant feature quickly, as compared to the opposite. We did not find a difference in the ability to generalise to unseen feature values between conditions. Testing how feature speed could affect learning with a set of four function approximation Kalman filter models revealed that participants had a higher learning rate for the slow feature, and adjusted their learning to both the relevance and the speed of feature changes. The larger the improvement in participants' performance for slow compared to fast bandits, the more strongly they adjusted their learning rates. These results provide evidence that human reinforcement learning favours slower features, suggesting a bias in how humans approach reward learning.
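
To make the modelling idea concrete, the following is a minimal sketch of a Kalman filter that learns a linear reward function over two features. It is not a reconstruction of the four models compared in the paper. The linear reward model, the names `kalman_step` and `simulate`, the feature trajectories, and all parameter values are assumptions chosen for illustration. In a Kalman filter the gain acts as a per-feature learning rate, and a larger assumed process noise on a weight yields a larger gain for that feature. Here the slow feature's weight is given the larger process noise purely to mimic a higher learning rate for the slow feature.

```python
import math
import random


def kalman_step(w, P, x, reward, obs_noise, process_noise):
    """One Kalman-filter update of a linear reward model, reward ~ w . x.

    Returns the new weights, the new covariance and the Kalman gain, which
    plays the role of a per-feature learning rate.
    """
    n = len(w)
    # Assumed drift of each weight: add per-feature process noise to the diagonal.
    P = [[P[i][j] + (process_noise[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Predictive variance of the reward and the resulting gain.
    S = sum(x[i] * Px[i] for i in range(n)) + obs_noise
    gain = [Px[i] / S for i in range(n)]
    error = reward - sum(w[i] * x[i] for i in range(n))
    w_new = [w[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so x^T P equals Px and the update is P - gain * Px^T.
    P_new = [[P[i][j] - gain[i] * Px[j] for j in range(n)] for i in range(n)]
    return w_new, P_new, gain


def simulate(n_trials=300, process_noise=(1e-3, 1e-5), obs_sd=0.1, seed=0):
    """Toy task: only the slow feature predicts reward (true weights 1 and 0)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    P = [[1.0, 0.0], [0.0, 1.0]]
    mean_abs_gain = [0.0, 0.0]
    for t in range(n_trials):
        slow = math.sin(t / 30.0)   # drifts gradually from trial to trial
        fast = math.sin(2.1 * t)    # jumps around from trial to trial
        x = [slow, fast]
        reward = 1.0 * slow + rng.gauss(0.0, obs_sd)
        w, P, gain = kalman_step(w, P, x, reward, obs_sd ** 2, process_noise)
        for i in range(2):
            mean_abs_gain[i] += abs(gain[i]) / n_trials
    return w, mean_abs_gain


weights, gains = simulate()
print("estimated weights:", weights)
print("mean |gain| (slow, fast):", gains)
```

Swapping the two entries of `process_noise` shifts the larger gain to the fast feature. That gives a simple way to see how an assumed volatility translates into a feature-specific learning rate in this toy setting.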

Source journal

PLoS Computational Biology (Biochemical Research Methods; Mathematical & Computational Biology)
CiteScore: 7.10
Self-citation rate: 4.70%
Articles published: 820
Review time: 2.5 months

About the journal: PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales, from molecules and cells to patient populations and ecosystems, through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery. Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines. Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights. Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology. Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise, to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.