Learning Fair Cooperation in Mixed-Motive Games with Indirect Reciprocity

Martin Smit, Fernando P. Santos
arXiv:2408.04549 · arXiv - CS - Multiagent Systems · Published 2024-08-08
Citations: 0

Abstract

Altruistic cooperation is costly yet socially desirable. As a result, agents struggle to learn cooperative policies through independent reinforcement learning (RL). Indirect reciprocity, where agents consider their interaction partner's reputation, has been shown to stabilise cooperation in homogeneous, idealised populations. However, more realistic settings are comprised of heterogeneous agents with different characteristics and group-based social identities. We study cooperation when agents are stratified into two such groups, and allow reputation updates and actions to depend on group information. We consider two modelling approaches: evolutionary game theory, where we comprehensively search for social norms (i.e., rules to assign reputations) leading to cooperation and fairness; and RL, where we consider how the stochastic dynamics of policy learning affects the analytically identified equilibria. We observe that a defecting majority leads the minority group to defect, but not the inverse. Moreover, changing the norms that judge in and out-group interactions can steer a system towards either fair or unfair cooperation. This is made clearer when moving beyond equilibrium analysis to independent RL agents, where convergence to fair cooperation occurs with a narrower set of norms. Our results highlight that, in heterogeneous populations with reputations, carefully defining interaction norms is fundamental to tackle both dilemmas of cooperation and of fairness.
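The reputation mechanism the abstract describes can be illustrated with a minimal toy simulation. Everything below — binary reputations, the "stern judging" norm table, the discriminator strategy, the fifty-fifty group split, and all parameter values — is an illustrative assumption for this sketch, not the paper's exact model:

```python
import random

# Toy sketch of indirect reciprocity in a one-shot donation game.
# Reputations are binary (1 = good, 0 = bad). A social norm maps
# (donor_action, recipient_reputation) -> donor's new reputation.
# "Stern judging": helping the good or refusing the bad earns a good
# reputation; anything else earns a bad one.
STERN_JUDGING = {(1, 1): 1, (1, 0): 0, (0, 1): 0, (0, 0): 1}

def simulate(in_norm, out_norm, n_agents=50, rounds=20000, seed=0):
    """Run random donor-recipient pairings. Agents are split into two
    groups; each interaction is judged by the in-group or out-group
    norm depending on whether donor and recipient share a group."""
    rng = random.Random(seed)
    group = [0 if i < n_agents // 2 else 1 for i in range(n_agents)]
    rep = [1] * n_agents  # everyone starts with a good reputation
    cooperations = 0
    for _ in range(rounds):
        donor, recipient = rng.sample(range(n_agents), 2)
        # Discriminator strategy: cooperate iff the recipient looks good.
        action = 1 if rep[recipient] == 1 else 0
        cooperations += action
        norm = in_norm if group[donor] == group[recipient] else out_norm
        rep[donor] = norm[(action, rep[recipient])]
    return cooperations / rounds

# With stern judging applied to both in- and out-group interactions and
# no assessment noise, discriminators sustain full cooperation.
rate = simulate(STERN_JUDGING, STERN_JUDGING)  # rate == 1.0
```

Swapping `out_norm` for a harsher rule (e.g., one that never assigns a good reputation for out-group interactions) lets one probe how group-dependent norms can push the system towards the unfair outcomes the abstract discusses.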