The Role of Reinforcement Learning in the Emergence of Conventions: Simulation Experiments with the Repeated Volunteer's Dilemma

J. Artif. Soc. Soc. Simul. Pub Date: 2022-01-01. DOI: 10.18564/jasss.4771

H. Nunner, W. Przepiorka, Chris Janssen

Citations: 1

Abstract

We use reinforcement learning models to investigate the role of cognitive mechanisms in the emergence of conventions in the repeated volunteer’s dilemma (VOD). The VOD is a multi-person, binary choice collective goods game in which the contribution of only one individual is necessary and sufficient to produce a benefit for the entire group. Behavioral experiments show that in the symmetric VOD, where all group members have the same costs of volunteering, a turn-taking convention emerges, whereas in the asymmetric VOD, where one “strong” group member has lower costs of volunteering, a solitary-volunteering convention emerges with the strong member volunteering most of the time. We compare three different classes of reinforcement learning models in their ability to replicate these empirical findings. Our results confirm that reinforcement learning models can provide a parsimonious account of how humans tacitly agree on one course of action when encountering each other repeatedly in the same interaction situation. We find that considering contextual clues (i.e., reward structures) for strategy design (i.e., sequences of actions) and strategy selection (i.e., favoring equal distribution of costs) facilitates coordination when optima are less salient. Furthermore, our models produce better fits with the empirical data when agents act myopically (favoring current over expected future rewards) and the rewards for adhering to conventions are not delayed.
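As a rough illustration of the game described in the abstract, the sketch below computes per-round payoffs in a volunteer's dilemma and compares a turn-taking schedule with a solitary-volunteering schedule. It is not the authors' reinforcement learning model. The function names `vod_payoffs` and `cumulative_payoffs`, the benefit of 80, and the costs of 50 and 30 are hypothetical values chosen only for illustration, and the rule that every volunteer pays their own cost is assumed from the standard formulation of the game.

```python
from typing import List, Sequence


def vod_payoffs(actions: Sequence[bool], costs: Sequence[float], benefit: float) -> List[float]:
    """Payoffs for one round of a volunteer's dilemma (illustrative sketch).

    actions[i] is True when player i volunteers. The collective good is
    produced if at least one player volunteers, and each volunteer pays
    their own cost (assumed standard formulation).
    """
    if len(actions) != len(costs):
        raise ValueError("actions and costs must have the same length")
    gain = benefit if any(actions) else 0.0
    return [gain - (cost if volunteers else 0.0) for volunteers, cost in zip(actions, costs)]


def cumulative_payoffs(schedule: Sequence[int], costs: Sequence[float], benefit: float) -> List[float]:
    """Total payoff per player when schedule[t] is the sole volunteer in round t."""
    totals = [0.0] * len(costs)
    for volunteer in schedule:
        actions = [player == volunteer for player in range(len(costs))]
        for player, payoff in enumerate(vod_payoffs(actions, costs, benefit)):
            totals[player] += payoff
    return totals


# Hypothetical parameters, not taken from the paper.
BENEFIT = 80.0

# Symmetric game: three players with equal costs take turns volunteering.
turn_taking = cumulative_payoffs([0, 1, 2], [50.0, 50.0, 50.0], BENEFIT)

# Asymmetric game: the low-cost "strong" player 0 volunteers every round.
solitary = cumulative_payoffs([0, 0, 0], [30.0, 50.0, 50.0], BENEFIT)
```

Under these assumed numbers, turn-taking gives every player the same total over three rounds, while solitary volunteering leaves the strong member with a lower total than the other two even though the group never pays more than the lowest cost per round.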
