
arXiv - CS - Multiagent Systems: Latest Publications

Finite-Time Analysis of Asynchronous Multi-Agent TD Learning
Pub Date : 2024-07-29 DOI: arxiv-2407.20441
Nicolò Dal Fabbro, Arman Adibi, Aritra Mitra, George J. Pappas
Recent research endeavours have theoretically shown the beneficial effect of cooperation in multi-agent reinforcement learning (MARL). In a setting involving $N$ agents, this beneficial effect usually comes in the form of an $N$-fold linear convergence speedup, i.e., a reduction - proportional to $N$ - in the number of iterations required to reach a certain convergence precision. In this paper, we show for the first time that this speedup property also holds for a MARL framework subject to asynchronous delays in the local agents' updates. In particular, we consider a policy evaluation problem in which multiple agents cooperate to evaluate a common policy by communicating with a central aggregator. In this setting, we study the finite-time convergence of AsyncMATD, an asynchronous multi-agent temporal difference (TD) learning algorithm in which agents' local TD update directions are subject to asynchronous bounded delays. Our main contribution is providing a finite-time analysis of AsyncMATD, for which we establish a linear convergence speedup while highlighting the effect of time-varying asynchronous delays on the resulting convergence rate.
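The aggregation step described here is simple enough to sketch: a central server averages TD(0) update directions arriving from N agents, each possibly computed from a stale copy of the shared parameters. Below is a minimal illustrative sketch; the chain MDP, feature map, and delay model are invented for illustration and are not the paper's setup.

import numpy as np

# Illustrative sketch of asynchronous multi-agent TD(0) with bounded delays.
# Each of the N agents computes a TD update direction from its own sample,
# possibly using a parameter copy that is up to tau_max steps stale; the
# aggregator averages the N directions and takes a step. Not the paper's code.

rng = np.random.default_rng(0)
N, d, tau_max, alpha, gamma = 8, 5, 3, 0.05, 0.9
theta = np.zeros(d)                  # shared value-function weights
history = [theta.copy()]             # past iterates, used to model staleness

def feature(s):                      # toy one-hot feature map for a 5-state chain
    phi = np.zeros(d); phi[s] = 1.0
    return phi

def sample_transition(s):            # toy MDP: random walk, reward at the last state
    s_next = (s + rng.choice([-1, 1])) % d
    return s_next, 1.0 if s_next == d - 1 else 0.0

states = rng.integers(0, d, size=N)
for t in range(2000):
    directions = []
    for i in range(N):
        delay = rng.integers(0, min(tau_max, len(history)))   # bounded staleness
        theta_stale = history[-1 - delay]
        s = states[i]
        s_next, r = sample_transition(s)
        td_error = r + gamma * theta_stale @ feature(s_next) - theta_stale @ feature(s)
        directions.append(td_error * feature(s))
        states[i] = s_next
    theta = theta + alpha * np.mean(directions, axis=0)        # aggregate and step
    history.append(theta.copy())

print("learned values:", np.round(theta, 3))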
Citations: 0
Quantum Computing and Neuromorphic Computing for Safe, Reliable, and Explainable Multi-Agent Reinforcement Learning: Optimal Control in Autonomous Robotics
Pub Date : 2024-07-29 DOI: arxiv-2408.03884
Mazyar Taghavi
This paper investigates the utilization of Quantum Computing and Neuromorphic Computing for Safe, Reliable, and Explainable Multi-Agent Reinforcement Learning (MARL) in the context of optimal control in autonomous robotics. The objective was to address the challenges of optimizing the behavior of autonomous agents while ensuring safety, reliability, and explainability. Quantum Computing techniques, including the Quantum Approximate Optimization Algorithm (QAOA), were employed to efficiently explore large solution spaces and find approximate solutions to complex MARL problems. Neuromorphic Computing, inspired by the architecture of the human brain, provided parallel and distributed processing capabilities, which were leveraged to develop intelligent and adaptive systems. The combination of these technologies held the potential to enhance the safety, reliability, and explainability of MARL in autonomous robotics. This research contributed to the advancement of autonomous robotics by exploring cutting-edge technologies and their applications in multi-agent systems. Codes and data are available.
Citations: 0
Eliminating Majority Illusion is Easy
Pub Date : 2024-07-29 DOI: arxiv-2407.20187
Jack Dippel, Max Dupré la Tour, April Niu, Sanjukta Roy, Adrian Vetta
Majority Illusion is a phenomenon in social networks wherein the decision by the majority of the network is not the same as one's personal social circle's majority, leading to an incorrect perception of the majority in a large network. In this paper, we present polynomial-time algorithms which can eliminate majority illusion in a network by altering as few connections as possible. Additionally, we prove that the more general problem of ensuring all neighbourhoods in the network are at least a $p$-fraction of the majority is NP-hard for most values of $p$.
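The illusion itself is easy to state in code: a node suffers it when the majority opinion among its neighbours differs from the global majority. A minimal sketch on an invented toy graph (the paper's edge-rewiring algorithms are not reproduced here):

# Illustrative check for majority illusion on a small undirected graph.
# Opinions are binary; a node is "under the illusion" if the majority of its
# neighbours holds the opinion that is globally in the minority.

graph = {                     # adjacency lists of a toy network
    0: [1, 2, 3], 1: [0, 4], 2: [0, 4], 3: [0, 4], 4: [1, 2, 3],
}
opinion = {0: 1, 1: 0, 2: 0, 3: 0, 4: 1}   # globally, opinion 0 is the majority

global_majority = int(sum(opinion.values()) * 2 > len(opinion))

def local_majority(v):
    votes = [opinion[u] for u in graph[v]]
    return int(sum(votes) * 2 > len(votes))

deluded = [v for v in graph if local_majority(v) != global_majority]
print("global majority:", global_majority)
print("nodes under majority illusion:", deluded)   # here 3 of the 5 nodes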
Citations: 0
Navigation services amplify concentration of traffic and emissions in our cities
Pub Date : 2024-07-29 DOI: arxiv-2407.20004
Giuliano Cornacchia, Mirco Nanni, Dino Pedreschi, Luca Pappalardo
The proliferation of human-AI ecosystems involving human interaction with algorithms, such as assistants and recommenders, raises concerns about large-scale social behaviour. Despite evidence of such phenomena across several contexts, the collective impact of GPS navigation services remains unclear: while beneficial to the user, they can also cause chaos if too many vehicles are driven through the same few roads. Our study employs a simulation framework to assess navigation services' influence on road network usage and CO2 emissions. The results demonstrate a universal pattern of amplified conformity: increasing adoption rates of navigation services cause a reduction of route diversity of mobile travellers and increased concentration of traffic and emissions on fewer roads, thus exacerbating an unequal distribution of negative externalities on selected neighbourhoods. Although navigation services' recommendations can help reduce CO2 emissions when their adoption rate is low, these benefits diminish or even disappear when the adoption rate is high and exceeds a certain city- and service-dependent threshold. We summarize these discoveries in a non-linear function that connects the marginal increase of conformity with the marginal reduction in CO2 emissions. Our simulation approach addresses the challenges posed by the complexity of transportation systems and the lack of data and algorithmic transparency.
Citations: 0
Mechanism Design for Locating Facilities with Capacities with Insufficient Resources
Pub Date : 2024-07-26 DOI: arxiv-2407.18547
Gennaro Auricchio, Harry J. Clough, Jie Zhang
This paper explores the Mechanism Design aspects of the $m$-Capacitated Facility Location Problem where the total facility capacity is less than the number of agents. Following the framework outlined by Aziz et al., the Social Welfare of the facility location is determined through a First-Come-First-Served (FCFS) game, in which agents compete once the facility positions are established. When the number of facilities is $m > 1$, the Nash Equilibrium (NE) of the FCFS game is not unique, making the utility of the agents and the concept of truthfulness unclear. To tackle these issues, we consider absolutely truthful mechanisms, i.e. mechanisms that prevent agents from misreporting regardless of the strategies used during the FCFS game. We combine this stricter truthfulness requirement with the notion of Equilibrium Stable (ES) mechanisms, which are mechanisms whose Social Welfare does not depend on the NE of the FCFS game. We demonstrate that the class of percentile mechanisms is absolutely truthful and identify the conditions under which they are ES. We also show that the approximation ratio of each ES percentile mechanism is bounded and determine its value. Notably, when all the facilities have the same capacity and the number of agents is sufficiently large, it is possible to achieve an approximation ratio smaller than $1+\frac{1}{2m-1}$. Finally, we extend our study to encompass higher-dimensional problems. Within this framework, we demonstrate that the class of ES percentile mechanisms is even more restricted and characterize the mechanisms that are both ES and absolutely truthful. We further support our findings by empirically evaluating the performance of the mechanisms when the agents are the samples of a distribution.
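For intuition, a one-dimensional percentile mechanism sorts the reported locations and places each facility at a prescribed percentile of the reports. The sketch below pairs such a placement with a simplified first-come-first-served assignment; the percentile vector, capacities, and arrival order are illustrative assumptions, not the paper's construction.

# Illustrative sketch: percentile mechanism for 1-D capacitated facility
# location, followed by a simplified first-come-first-served assignment of
# agents to the nearest facility with spare capacity. Not the paper's code.

def percentile_mechanism(reports, percentiles):
    xs = sorted(reports)
    n = len(xs)
    # facility j goes to the report closest to the v_j-th percentile rank
    return [xs[min(n - 1, int(v * (n - 1) + 0.5))] for v in percentiles]

def fcfs_assignment(reports, facilities, capacity):
    remaining = [capacity] * len(facilities)
    cost, unserved = 0.0, 0
    for x in reports:                              # agents arrive in report order
        for j in sorted(range(len(facilities)), key=lambda k: abs(x - facilities[k])):
            if remaining[j] > 0:
                remaining[j] -= 1
                cost += abs(x - facilities[j])
                break
        else:
            unserved += 1                          # every facility is already full
    return cost, unserved

reports = [0.05, 0.1, 0.2, 0.45, 0.5, 0.8, 0.9, 0.95]
facilities = percentile_mechanism(reports, percentiles=[0.25, 0.75])
print("facility positions:", facilities)
print("social cost and unserved agents:", fcfs_assignment(reports, facilities, capacity=3))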
Citations: 0
Socially efficient mechanism on the minimum budget
Pub Date : 2024-07-26 DOI: arxiv-2407.18515
Hirota Kinoshita, Takayuki Osogami, Kohei Miyaguchi
In social decision-making among strategic agents, a universal focus lies on the balance between social and individual interests. Socially efficient mechanisms are thus desirably designed to not only maximize the social welfare but also incentivize the agents for their own profit. Under a generalized model that includes applications such as double auctions and trading networks, this study establishes a socially efficient (SE), dominant-strategy incentive compatible (DSIC), and individually rational (IR) mechanism with the minimum total budget expensed to the agents. The present method exploits discrete and known type domains to reduce a set of constraints into the shortest path problem in a weighted graph. In addition to theoretical derivation, we substantiate the optimality of the proposed mechanism through numerical experiments, where it certifies strictly lower budget than Vickrey-Clarke-Groves (VCG) mechanisms for a wide class of instances.
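The reduction mentioned here is reminiscent of the textbook correspondence between difference constraints and shortest paths: a constraint p_j - p_i <= w becomes an edge from i to j of weight w, and Bellman-Ford either returns a feasible assignment or reports a negative cycle. A generic sketch of that correspondence only, not the paper's specific constraint system:

# Generic sketch: solving difference constraints p[j] - p[i] <= w via shortest
# paths. Each constraint is an edge (i, j, w); initializing every distance to 0
# plays the role of a virtual source. Bellman-Ford detects infeasibility
# (a negative cycle) and otherwise returns a feasible assignment.

def solve_difference_constraints(n, constraints):
    dist = [0.0] * n
    for _ in range(n):
        updated = False
        for i, j, w in constraints:
            if dist[i] + w < dist[j]:
                dist[j] = dist[i] + w
                updated = True
        if not updated:
            return dist
    return None                       # negative cycle: the constraints are infeasible

# Example: p1 - p0 <= 4, p2 - p1 <= -2, p2 - p0 <= 1
print(solve_difference_constraints(3, [(0, 1, 4), (1, 2, -2), (0, 2, 1)]))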
Citations: 0
Principal-Agent Reinforcement Learning
Pub Date : 2024-07-25 DOI: arxiv-2407.18074
Dima Ivanov, Paul Dütting, Inbal Talgam-Cohen, Tonghan Wang, David C. Parkes
Contracts are the economic framework which allows a principal to delegate a task to an agent -- despite misaligned interests, and even without directly observing the agent's actions. In many modern reinforcement learning settings, self-interested agents learn to perform a multi-stage task delegated to them by a principal. We explore the significant potential of utilizing contracts to incentivize the agents. We model the delegated task as an MDP, and study a stochastic game between the principal and agent where the principal learns what contracts to use, and the agent learns an MDP policy in response. We present a learning-based algorithm for optimizing the principal's contracts, which provably converges to the subgame-perfect equilibrium of the principal-agent game. A deep RL implementation allows us to apply our method to very large MDPs with unknown transition dynamics. We extend our approach to multiple agents, and demonstrate its relevance to resolving a canonical sequential social dilemma with minimal intervention to agent rewards.
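The two-level structure can be illustrated with a toy loop: for a candidate contract (a state-contingent bonus added to the agent's reward), the agent best-responds by solving its MDP, and the principal searches over contracts to maximize its return net of payments. The sketch below uses a two-state MDP and a grid search in place of the paper's learning algorithm; the dynamics, costs, and contract grid are all made-up assumptions.

import itertools
import numpy as np

# Toy principal-agent loop: the principal offers a per-state bonus for the
# "work" action, the agent best-responds by value iteration on its own reward
# (bonus minus effort cost), and the principal evaluates its discounted payoff
# (task reward minus payments) under the induced policy. A grid search stands
# in for the paper's learning-based contract optimization.

gamma = 0.9
S = 2                                              # states {0, 1}; actions {0: shirk, 1: work}
P = np.array([[[0.9, 0.1], [0.4, 0.6]],            # P[s, a] = distribution over next states
              [[0.5, 0.5], [0.1, 0.9]]])
effort_cost = 0.6                                  # agent's cost of working
principal_reward = np.array([0.0, 1.0])            # principal is paid while in state 1

def agent_policy(bonus):
    r = np.stack([np.zeros(S), bonus - effort_cost], axis=1)   # agent's reward r[s, a]
    V = np.zeros(S)
    for _ in range(200):                           # value iteration for the best response
        Q = r + gamma * P @ V
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def principal_value(bonus, policy):
    P_pi = np.array([P[s, policy[s]] for s in range(S)])
    pay = np.where(policy == 1, bonus, 0.0)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, principal_reward - pay)
    return V[0]                                    # discounted payoff from state 0

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
best = max(itertools.product(grid, repeat=S),
           key=lambda b: principal_value(np.array(b), agent_policy(np.array(b))))
print("best contract (bonus per state):", best)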
Citations: 0
Very Large-Scale Multi-Agent Simulation in AgentScope
Pub Date : 2024-07-25 DOI: arxiv-2407.17789
Xuchen Pan, Dawei Gao, Yuexiang Xie, Zhewei Wei, Yaliang Li, Bolin Ding, Ji-Rong Wen, Jingren Zhou
Recent advances in large language models (LLMs) have opened new avenues for applying multi-agent systems in very large-scale simulations. However, there remain several challenges when conducting multi-agent simulations with existing platforms, such as limited scalability and low efficiency, unsatisfied agent diversity, and effort-intensive management processes. To address these challenges, we develop several new features and components for AgentScope, a user-friendly multi-agent platform, enhancing its convenience and flexibility for supporting very large-scale multi-agent simulations. Specifically, we propose an actor-based distributed mechanism as the underlying technological infrastructure towards great scalability and high efficiency, and provide flexible environment support for simulating various real-world scenarios, which enables parallel execution of multiple agents, centralized workflow orchestration, and both inter-agent and agent-environment interactions among agents. Moreover, we integrate an easy-to-use configurable tool and an automatic background generation pipeline in AgentScope, simplifying the process of creating agents with diverse yet detailed background settings. Last but not least, we provide a web-based interface for conveniently monitoring and managing a large number of agents that might deploy across multiple devices. We conduct a comprehensive simulation to demonstrate the effectiveness of the proposed enhancements in AgentScope, and provide detailed observations and discussions to highlight the great potential of applying multi-agent systems in large-scale simulations. The source code is released on GitHub at https://github.com/modelscope/agentscope to inspire further research and development in large-scale multi-agent simulations.
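The actor-based execution model can be conveyed with a generic sketch: each agent owns a mailbox and processes messages independently while an orchestrator dispatches work. This illustrates the actor idea only and is not AgentScope's actual API; see the repository above for the real interfaces.

import asyncio

# Generic actor-style sketch of parallel agent execution with a central
# orchestrator: each agent owns a mailbox (queue) and handles messages
# independently. The "handled" string stands in for an LLM call.

class Agent:
    def __init__(self, name):
        self.name = name
        self.inbox = asyncio.Queue()

    async def run(self, outbox):
        while True:
            msg = await self.inbox.get()
            if msg is None:                          # shutdown signal
                break
            await outbox.put(f"{self.name} handled '{msg}'")

async def main():
    results = asyncio.Queue()
    agents = [Agent(f"agent-{i}") for i in range(4)]
    tasks = [asyncio.create_task(a.run(results)) for a in agents]
    for i, a in enumerate(agents):                   # orchestrator dispatches work
        await a.inbox.put(f"task-{i}")
    for a in agents:
        await a.inbox.put(None)
    await asyncio.gather(*tasks)
    while not results.empty():
        print(results.get_nowait())

asyncio.run(main())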
Citations: 0
Strategic Cost Selection in Participatory Budgeting
Pub Date : 2024-07-25 DOI: arxiv-2407.18092
Piotr Faliszewski, Łukasz Janeczko, Andrzej Kaczmarczyk, Grzegorz Lisowski, Piotr Skowron, Stanisław Szufa
We study strategic behavior of project proposers in the context of approval-based participatory budgeting (PB). In our model we assume that the votes are fixed and known and the proposers want to set as high project prices as possible, provided that their projects get selected and the prices are not below the minimum costs of their delivery. We study the existence of pure Nash equilibria (NE) in such games, focusing on the AV/Cost, Phragmén, and Method of Equal Shares rules. Furthermore, we report an experimental study of strategic cost selection on real-life PB election data.
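Of the rules named above, a greedy reading of AV/Cost is the easiest to sketch: rank projects by approvals per unit cost and fund them while the budget lasts, which is exactly where a proposer's incentive to adjust prices enters. A toy sketch; the instance, prices, and tie handling are illustrative assumptions, not the paper's election data.

# Toy sketch of a greedy AV/Cost rule for approval-based participatory
# budgeting: rank projects by approvals-per-unit-cost and fund them while the
# budget allows.

budget = 100
approvals = {"park": 60, "library": 45, "bike_lane": 30}     # approval counts
cost = {"park": 70, "library": 40, "bike_lane": 25}          # prices set by proposers

def av_over_cost(approvals, cost, budget):
    ranked = sorted(approvals, key=lambda p: approvals[p] / cost[p], reverse=True)
    selected, spent = [], 0
    for p in ranked:
        if spent + cost[p] <= budget:
            selected.append(p)
            spent += cost[p]
    return selected, spent

print(av_over_cost(approvals, cost, budget))
# A proposer raising a price may remain selected while extracting more budget:
# here "bike_lane" priced at 30 instead of 25 would still be funded.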
Citations: 0
Stochastic Games with Minimally Bounded Action Costs
Pub Date : 2024-07-25 DOI: arxiv-2407.18010
David Mguni
In many multi-player interactions, players incur strictly positive costs each time they execute actions, e.g. 'menu costs' or transaction costs in financial systems. Since acting at each available opportunity would accumulate prohibitively large costs, the resulting decision problem is one in which players must make strategic decisions about when to execute actions in addition to their choice of action. This paper analyses a discrete-time stochastic game (SG) in which players face minimally bounded positive costs for each action and influence the system using impulse controls. We prove SGs of two-sided impulse control have a unique value and characterise the saddle point equilibrium in which the players execute actions at strategically chosen times in accordance with Markovian strategies. We prove the game respects a dynamic programming principle and that the Markov perfect equilibrium can be computed as a limit point of a sequence of Bellman operations. We then introduce a new Q-learning variant which we show converges almost surely to the value of the game, enabling solutions to be extracted in unknown settings. Lastly, we extend our results to settings with budgetary constraints.
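The timing decision can be made concrete with a single-agent toy: at each step the controller either waits for free or intervenes at a strictly positive cost, and tabular Q-learning over those two options learns when paying the cost is worthwhile. This is plain Q-learning with a fixed action cost, a simplified stand-in for the paper's two-player game and its new Q-learning variant; all dynamics and constants are invented.

import numpy as np

# Toy Q-learning with a strictly positive action cost: the state drifts upward
# on its own and incurs a growing holding cost; an "intervene" action resets it
# to 0 at fixed cost c. The learner must discover at which states paying c
# beats waiting.

rng = np.random.default_rng(1)
n_states, gamma, alpha, eps, c = 10, 0.95, 0.1, 0.1, 1.5
Q = np.zeros((n_states, 2))                        # actions: 0 = wait, 1 = intervene

def step(s, a):
    if a == 1:
        return 0, -c                               # impulse: pay the fixed cost, reset
    s_next = min(n_states - 1, s + rng.integers(0, 2))
    return s_next, -0.1 * s_next                   # waiting: holding cost grows with the state

s = 0
for _ in range(50000):
    a = int(rng.integers(0, 2)) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

threshold = next((k for k in range(n_states) if Q[k, 1] > Q[k, 0]), None)
print("intervene once the state reaches:", threshold)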
Citations: 0