{"title":"强化学习在约定产生中的作用:重复志愿者困境的模拟实验","authors":"H. Nunner, W. Przepiorka, Chris Janssen","doi":"10.18564/jasss.4771","DOIUrl":null,"url":null,"abstract":"We use reinforcement learning models to investigate the role of cognitive mechanisms in the emergence of conventions in the repeated volunteer’s dilemma (VOD). The VOD is amulti-person, binary choice collective goods game in which the contribution of only one individual is necessary and su icient to produce a benefit for the entire group. Behavioral experiments show that in the symmetric VOD,where all groupmembers have the same costs of volunteering, a turn-taking convention emerges, whereas in the asymmetric VOD,where one “strong” group member has lower costs of volunteering, a solitary-volunteering convention emerges with the strong member volunteering most of the time. We compare three di erent classes of reinforcement learningmodels in their ability to replicate these empirical findings. Our results confirm that reinforcement learning models canprovide aparsimonious account of howhumans tacitly agreeonone course of actionwhenencountering each other repeatedly in the same interaction situation. We find that considering contextual clues (i.e., reward structures) for strategy design (i.e., sequences of actions) and strategy selection (i.e., favoring equal distribution of costs) facilitate coordinationwhenoptimaare less salient. Furthermore, ourmodels producebetter fits with the empirical datawhen agents actmyopically (favoring current over expected future rewards) and the rewards for adhering to conventions are not delayed.","PeriodicalId":14675,"journal":{"name":"J. Artif. Soc. Soc. Simul.","volume":"45 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"The Role of Reinforcement Learning in the Emergence of Conventions: Simulation Experiments with the Repeated Volunteer's Dilemma\",\"authors\":\"H. Nunner, W. Przepiorka, Chris Janssen\",\"doi\":\"10.18564/jasss.4771\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We use reinforcement learning models to investigate the role of cognitive mechanisms in the emergence of conventions in the repeated volunteer’s dilemma (VOD). The VOD is amulti-person, binary choice collective goods game in which the contribution of only one individual is necessary and su icient to produce a benefit for the entire group. Behavioral experiments show that in the symmetric VOD,where all groupmembers have the same costs of volunteering, a turn-taking convention emerges, whereas in the asymmetric VOD,where one “strong” group member has lower costs of volunteering, a solitary-volunteering convention emerges with the strong member volunteering most of the time. We compare three di erent classes of reinforcement learningmodels in their ability to replicate these empirical findings. Our results confirm that reinforcement learning models canprovide aparsimonious account of howhumans tacitly agreeonone course of actionwhenencountering each other repeatedly in the same interaction situation. We find that considering contextual clues (i.e., reward structures) for strategy design (i.e., sequences of actions) and strategy selection (i.e., favoring equal distribution of costs) facilitate coordinationwhenoptimaare less salient. 
Furthermore, ourmodels producebetter fits with the empirical datawhen agents actmyopically (favoring current over expected future rewards) and the rewards for adhering to conventions are not delayed.\",\"PeriodicalId\":14675,\"journal\":{\"name\":\"J. Artif. Soc. Soc. Simul.\",\"volume\":\"45 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"J. Artif. Soc. Soc. Simul.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18564/jasss.4771\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Artif. Soc. Soc. Simul.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18564/jasss.4771","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The Role of Reinforcement Learning in the Emergence of Conventions: Simulation Experiments with the Repeated Volunteer's Dilemma
We use reinforcement learning models to investigate the role of cognitive mechanisms in the emergence of conventions in the repeated volunteer's dilemma (VOD). The VOD is a multi-person, binary-choice collective goods game in which the contribution of only one individual is necessary and sufficient to produce a benefit for the entire group. Behavioral experiments show that in the symmetric VOD, where all group members have the same costs of volunteering, a turn-taking convention emerges, whereas in the asymmetric VOD, where one "strong" group member has lower costs of volunteering, a solitary-volunteering convention emerges, with the strong member volunteering most of the time. We compare three different classes of reinforcement learning models in their ability to replicate these empirical findings. Our results confirm that reinforcement learning models can provide a parsimonious account of how humans tacitly agree on one course of action when encountering each other repeatedly in the same interaction situation. We find that considering contextual clues (i.e., reward structures) in strategy design (i.e., sequences of actions) and strategy selection (i.e., favoring an equal distribution of costs) facilitates coordination when optima are less salient. Furthermore, our models produce better fits with the empirical data when agents act myopically (favoring current over expected future rewards) and the rewards for adhering to conventions are not delayed.
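To make the game's reward structure concrete, the sketch below simulates a repeated VOD with simple Roth-Erev reinforcement learners in Python. Everything in it is an illustrative assumption rather than the authors' implementation: the payoff values, the propensity-based choice rule, and the function name simulate_vod are ours, and this memoryless learner cannot represent the sequence-based strategies (such as turn-taking) that the paper's richer model classes compare.

    import random

    def simulate_vod(n_agents=4, benefit=1.0, costs=None, rounds=2000, seed=0):
        """Repeated volunteer's dilemma with Roth-Erev reinforcement learners.

        Each round, every agent volunteers or abstains with probability
        proportional to its accumulated action propensities; the chosen
        action's propensity is then reinforced by the realized payoff.
        Returns each agent's volunteering rate over all rounds.
        (Hypothetical sketch: payoffs and update rule are assumptions,
        not the models compared in the paper.)
        """
        rng = random.Random(seed)
        if costs is None:
            costs = [0.5] * n_agents          # symmetric VOD by default
        propensities = [[1.0, 1.0] for _ in range(n_agents)]  # [volunteer, abstain]
        volunteered = [0] * n_agents

        for _ in range(rounds):
            actions = []
            for q in propensities:
                p_volunteer = q[0] / (q[0] + q[1])
                actions.append(0 if rng.random() < p_volunteer else 1)
            produced = 0 in actions           # one volunteer is sufficient
            for i, a in enumerate(actions):
                if a == 0:                    # volunteering pays the cost ...
                    payoff = benefit - costs[i]
                    volunteered[i] += 1
                else:                         # ... abstaining free-rides or fails
                    payoff = benefit if produced else 0.0
                propensities[i][a] += payoff  # Roth-Erev update: reinforce chosen action
        return [v / rounds for v in volunteered]

    print("symmetric :", simulate_vod())                            # equal costs
    print("asymmetric:", simulate_vod(costs=[0.2, 0.5, 0.5, 0.5]))  # agent 0 is "strong"

In the asymmetric run one would expect the low-cost agent's volunteering rate to drift upward relative to the others, echoing the solitary-volunteering convention; reproducing turn-taking would require strategies defined over sequences of actions, which is precisely the contextual-clue extension the abstract describes.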