Learning Nudges for Conditional Cooperation: A Multi-Agent Reinforcement Learning Model

arXiv - CS - Multiagent Systems | Pub Date: 2024-09-14 | DOI: arxiv-2409.09509

Shatayu Kulkarni, Sabine Brunswicker

Citations: 0

Abstract

The public goods game describes a social dilemma in which a large proportion of agents act as conditional cooperators (CCs): they cooperate only if they see others cooperating, because they satisfice by conforming to the social norm of what others are doing instead of optimizing cooperation. CCs follow aspiration-based reinforcement learning, shaped by past experiences of interactions with others and by satisficing aspirations. In many real-world settings, reinforcing social norms do not emerge. In this paper, we propose that an optimizing reinforcement learning agent can facilitate cooperation through nudges, i.e., indirect mechanisms that make cooperation happen. The agent's goal is to motivate CCs to cooperate through its own actions, which create social norms signaling that others are cooperating. We introduce a multi-agent reinforcement learning model for public goods games, with 3 CC learning agents using aspiration-based reinforcement learning and 1 nudging agent using deep reinforcement learning (DRL) to learn nudges that optimize cooperation. For our nudging agent, we model two distinct reward functions, one maximizing the total game return (sum DRL) and one maximizing the number of cooperative contributions higher than a proportional threshold (prop DRL). Our results show that our aspiration-based RL model for CC agents is consistent with empirically observed CC behavior. Games combining 3 CC RL agents and 1 nudging RL agent outperform the baseline consisting of 4 CC RL agents only. The sum DRL nudging agent increases the total sum of contributions by 8.22% and the total proportion of cooperative contributions by 12.42%, while the prop DRL nudging agent increases the total sum of contributions by 8.85% and the total proportion of cooperative contributions by 14.87%. Our findings advance the literature on public goods games and reinforcement learning.
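To make the two reward signals concrete, the following is a minimal sketch of a four-player linear public goods round with a "sum"-style and a "prop"-style reward. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the payoff formula, the endowment, the multiplier, or the threshold value, so `ENDOWMENT`, `MULTIPLIER`, `THRESHOLD`, and all function names here are hypothetical, and reading "total game return" as the sum of all players' payoffs is likewise an assumption.

```python
from typing import List, Sequence

# All three constants are illustrative assumptions; none is stated in the abstract.
ENDOWMENT = 20.0   # tokens each player can contribute per round
MULTIPLIER = 1.6   # factor applied to the common pot before it is shared
THRESHOLD = 0.5    # fraction of the endowment above which a contribution counts as cooperative


def payoffs(contributions: Sequence[float]) -> List[float]:
    """Textbook linear public goods payoff: each player keeps what they did
    not contribute and receives an equal share of the multiplied pot."""
    share = MULTIPLIER * sum(contributions) / len(contributions)
    return [ENDOWMENT - c + share for c in contributions]


def sum_reward(contributions: Sequence[float]) -> float:
    """'sum'-style reward (assumed form): total payoff across all players."""
    return sum(payoffs(contributions))


def prop_reward(contributions: Sequence[float]) -> int:
    """'prop'-style reward (assumed form): number of contributions that
    exceed THRESHOLD * ENDOWMENT."""
    cutoff = THRESHOLD * ENDOWMENT
    return sum(1 for c in contributions if c > cutoff)


if __name__ == "__main__":
    round_contributions = [5.0, 10.0, 15.0, 20.0]
    print(payoffs(round_contributions))
    print(sum_reward(round_contributions))
    print(prop_reward(round_contributions))
```

Under these assumed parameters, the total return is larger when everyone contributes the full endowment than when nobody contributes, while each individual still gains by withholding. That tension is the social dilemma the abstract describes. The two rewards differ in what they credit: the sum-style reward grows with the size of the pot, whereas the prop-style reward only counts how many players cross the threshold.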
