Observational learning of exploration-exploitation strategies in bandit tasks

Cognition · IF 2.8 · Zone 1 (Psychology) · JCR Q1 (PSYCHOLOGY, EXPERIMENTAL) · Pub Date: 2025-03-20 · DOI: 10.1016/j.cognition.2025.106124
Ludwig Danwitz, Bettina von Helversen
Cognition, Volume 259, Article 106124
https://www.sciencedirect.com/science/article/pii/S0010027725000642
Citations: 0

Abstract

In decision-making scenarios, individuals often face the challenge of balancing between exploring new options and exploiting known ones, a dynamic known as the exploration-exploitation trade-off. In such situations, people frequently have the opportunity to observe others' actions. Yet little is known about when, how, and from whom individuals use observational learning in the exploration-exploitation dilemma. In two experiments, participants completed multiple nine-armed bandit tasks, either independently or while observing a fictitious agent using either an explorative or an equally successful exploitative strategy. To analyze participants' behavior, we used a reinforcement learning model (simplified Kalman Filter) to extract parameters for both copying and exploration at the individual level. Results showed that participants copied the observed agents' choices by adding a bonus to the individually estimated value of the observed action. While most participants appear to use an unconditional copying approach, a subset of participants adopted a copy-when-uncertain approach, that is, copying more when uncertain about the optimal action based on their individually acquired knowledge. Further, participants adjusted their exploration strategies in alignment with those observed. We discuss to what extent this can be understood as a form of emulation. Results on participants' preferences to copy from explorative versus exploitative agents are ambiguous. Contrary to expectations, similarity or dissimilarity between participants' and agents' exploration tendencies had no impact on observational learning. These results shed light on humans' processing of social and non-social information in exploration scenarios and on the conditions of observational learning.
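
The copying mechanism described in the abstract (a bonus added to the individually estimated value of the observed action) can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not give the equations, so the softmax choice rule, the `temperature` exploration parameter, the `noise_var` constant, and the way `uncertainty_weight` scales the bonus for the copy-when-uncertain variant are all assumptions made here for illustration, as are the function and parameter names.

```python
import math

N_ARMS = 9  # nine-armed bandit, as stated in the abstract


def kalman_update(mean, var, reward, noise_var=1.0):
    """One Kalman-filter update of a single arm's value estimate.

    noise_var is an assumed observation-noise variance.
    """
    gain = var / (var + noise_var)          # learning rate shrinks with var
    new_mean = mean + gain * (reward - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var


def choice_probs(means, variances, observed_arm=None,
                 copy_bonus=0.0, uncertainty_weight=0.0, temperature=1.0):
    """Softmax choice probabilities with a bonus on the observed arm.

    Assumed formulation: uncertainty_weight = 0 gives unconditional
    copying; a positive value makes the bonus grow with the learner's
    average posterior standard deviation (copy-when-uncertain).
    """
    values = list(means)
    if observed_arm is not None:
        mean_sd = sum(math.sqrt(v) for v in variances) / len(variances)
        values[observed_arm] += copy_bonus * (1.0 + uncertainty_weight * mean_sd)
    top = max(values)                       # subtract max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


# Usage: a learner with flat priors who saw the agent choose arm 3.
means = [0.0] * N_ARMS
variances = [1.0] * N_ARMS
probs = choice_probs(means, variances, observed_arm=3, copy_bonus=1.0)
```

In this sketch a larger `temperature` flattens the choice probabilities (more random exploration), while `copy_bonus` shifts probability toward the observed arm regardless of the learner's own estimates. Fitting such parameters per participant, as the abstract describes, would require the choice data and the authors' actual model specification.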
Source journal: Cognition (PSYCHOLOGY, EXPERIMENTAL)
CiteScore: 6.40
Self-citation rate: 5.90%
Articles published: 283
About the journal: Cognition is an international journal that publishes theoretical and experimental papers on the study of the mind. It covers a wide variety of subjects concerning all the different aspects of cognition, ranging from biological and experimental studies to formal analysis. Contributions from the fields of psychology, neuroscience, linguistics, computer science, mathematics, ethology and philosophy are welcome in this journal provided that they have some bearing on the functioning of the mind. In addition, the journal serves as a forum for discussion of social and political aspects of cognitive science.