
Latest publications in Adaptive Agents and Multi-Agent Systems

Discovering Consistent Subelections
Pub Date : 2024-07-26 DOI: 10.5555/3635637.3662948
Lukasz Janeczko, Jérôme Lang, Grzegorz Lisowski, Stanislaw Szufa
We show how hidden interesting subelections can be discovered in ordinal elections. An interesting subelection consists of a reasonably large set of voters and a reasonably large set of candidates such that the former have a consistent opinion about the latter. Consistency may take various forms, but we focus on three: identity (all selected voters rank all selected candidates the same way), antagonism (half of the selected voters rank the candidates in some order and the other half in the reverse order), and clones (all selected voters rank all selected candidates contiguously in the original election). We first study the computation of such hidden subelections. Second, we analyze synthetic and real-life data, and find that identifying hidden consistent subelections allows us to uncover some relevant concepts.
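As a concrete (unofficial) illustration of the three consistency notions, the following Python sketch checks whether a given set of voters and candidates forms an identity, antagonism, or clones subelection; the election format and all function names are assumptions made here for illustration, not the authors' code.

```python
def restrict(ranking, cands):
    """Project a complete ranking (list of candidate ids) onto a candidate subset."""
    return [c for c in ranking if c in cands]

def is_identity(election, voters, cands):
    """All selected voters rank all selected candidates the same way."""
    rankings = [tuple(restrict(election[v], cands)) for v in voters]
    return len(set(rankings)) == 1

def is_antagonism(election, voters, cands):
    """Half of the selected voters use some order and the other half its reverse."""
    rankings = [tuple(restrict(election[v], cands)) for v in voters]
    distinct = set(rankings)
    if len(distinct) != 2:
        return False
    a, b = distinct
    return a == tuple(reversed(b)) and rankings.count(a) == rankings.count(b)

def is_clones(election, voters, cands):
    """Every selected voter ranks the selected candidates contiguously in the original election."""
    for v in voters:
        positions = sorted(election[v].index(c) for c in cands)
        if positions[-1] - positions[0] != len(positions) - 1:
            return False
    return True

# Toy election: four voters, candidates 0..3, each row is one voter's ranking.
election = [[0, 1, 2, 3], [0, 1, 3, 2], [3, 2, 1, 0], [2, 3, 0, 1]]
print(is_identity(election, voters=[0, 1], cands={0, 1}))           # True
print(is_antagonism(election, voters=[0, 2], cands={0, 1, 2, 3}))   # True
print(is_clones(election, voters=[0, 1], cands={2, 3}))             # True
```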
{"title":"Discovering Consistent Subelections","authors":"Lukasz Janeczko, Jérôme Lang, Grzegorz Lisowski, Stanislaw Szufa","doi":"10.5555/3635637.3662948","DOIUrl":"https://doi.org/10.5555/3635637.3662948","url":null,"abstract":"We show how hidden interesting subelections can be discovered in ordinal elections. An interesting subelection consists of a reasonably large set of voters and a reasonably large set of candidates such that the former have a consistent opinion about the latter. Consistency may take various forms but we focus on three: Identity (all selected voters rank all selected candidates the same way), antagonism (half of the selected voters rank candidates in some order and the other half in the reverse order), and clones (all selected voters rank all selected candidates contiguously in the original election). We first study the computation of such hidden subelections. Second, we analyze synthetic and real-life data, and find that identifying hidden consistent subelections allows us to uncover some relevant concepts.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141799344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Strategic Cost Selection in Participatory Budgeting
Pub Date : 2024-07-25 DOI: 10.5555/3635637.3663125
Piotr Faliszewski, Lukasz Janeczko, Andrzej Kaczmarczyk, Grzegorz Lisowski, P. Skowron, Stanislaw Szufa
We study the strategic behavior of project proposers in the context of approval-based participatory budgeting (PB). In our model we assume that the votes are fixed and known, and that the proposers want to set project prices as high as possible, provided that their projects get selected and the prices are not below the minimum costs of delivering them. We study the existence of pure Nash equilibria (NE) in such games, focusing on the AV/Cost, Phragmén, and Method of Equal Shares rules. Furthermore, we report an experimental study of strategic cost selection on real-life PB election data.
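The abstract does not spell out the rules, so as a rough, hypothetical sketch of the strategic setting, the following Python snippet implements a simplified greedy AV/Cost rule (projects ordered by approvals per unit cost) and searches a price grid for the highest price a proposer can declare while still getting funded; the names, numbers, and tie-breaking are illustrative assumptions, not the paper's model.

```python
def av_cost_outcome(projects, budget):
    """Simplified greedy AV/Cost rule: order projects by approvals per unit cost
    and fund them while the budget allows. `projects` maps name -> (approvals, cost)."""
    order = sorted(projects, key=lambda p: projects[p][0] / projects[p][1], reverse=True)
    funded, left = set(), budget
    for p in order:
        cost = projects[p][1]
        if cost <= left:
            funded.add(p)
            left -= cost
    return funded

def best_response_price(projects, budget, me, min_cost, step=1):
    """Highest price on a grid at which project `me` is still funded (and >= its true cost)."""
    best, price = None, min_cost
    while price <= budget:
        trial = dict(projects)
        trial[me] = (projects[me][0], price)
        if me in av_cost_outcome(trial, budget):
            best = price
        price += step
    return best

projects = {"a": (10, 30), "b": (6, 20), "c": (4, 40)}   # approvals, declared price
print(av_cost_outcome(projects, budget=60))              # {'a', 'b'}
print(best_response_price(projects, budget=60, me="b", min_cost=5))   # 30
```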
{"title":"Strategic Cost Selection in Participatory Budgeting","authors":"Piotr Faliszewski, Lukasz Janeczko, Andrzej Kaczmarczyk, Grzegorz Lisowski, P. Skowron, Stanislaw Szufa","doi":"10.5555/3635637.3663125","DOIUrl":"https://doi.org/10.5555/3635637.3663125","url":null,"abstract":"We study strategic behavior of project proposers in the context of approval-based participatory budgeting (PB). In our model we assume that the votes are fixed and known and the proposers want to set as high project prices as possible, provided that their projects get selected and the prices are not below the minimum costs of their delivery. We study the existence of pure Nash equilibria (NE) in such games, focusing on the AV/Cost, Phragm'en, and Method of Equal Shares rules. Furthermore, we report an experimental study of strategic cost selection on real-life PB election data.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141804386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Minimizing State Exploration While Searching Graphs with Unknown Obstacles
Pub Date : 2024-06-01 DOI: 10.5555/3635637.3662959
Daniel Koyfman, Shahaf S. Shperberg, Dor Atzmon, Ariel Felner
We address the challenge of finding a shortest path in a graph with unknown obstacles where the exploration cost to detect whether a state is free or blocked is very high (e.g., due to sensor activation for obstacle detection). The main objective is to solve the problem while minimizing the number of explorations. To achieve this, we propose MXA∗, a novel heuristic search algorithm based on A∗. The key innovation in MXA∗ lies in modifying the heuristic calculation to avoid obstacles that have already been revealed. Furthermore, this paper makes a noteworthy contribution by introducing the concept of a dynamic heuristic. In contrast to the conventional static heuristic, a dynamic heuristic leverages information that emerges during the search process and adapts its estimations accordingly. By employing a dynamic heuristic, we suggest enhancements to MXA∗ based on real-time information obtained from both the open and closed lists. We demonstrate empirically that MXA∗ finds the shortest path while significantly reducing the number of explored states compared to traditional A∗. The code is available at https://github.com/bernuly1/MXA-Star.
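The paper's MXA∗ is more refined, but the core idea of a dynamic heuristic that reacts to revealed obstacles can be sketched as follows: an A∗-style grid search that pays a sensing cost the first time a cell is examined and recomputes a BFS distance-to-goal heuristic on the revealed map whenever an obstacle is discovered. This is an illustrative approximation (it tolerates stale frontier priorities and does not reopen closed nodes), not the authors' implementation.

```python
import heapq
from collections import deque

def neighbors(cell, n):
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < n and 0 <= ny < n:
            yield (nx, ny)

def distance_map(goal, blocked, n):
    """BFS distances to the goal on the map as revealed so far (a dynamic heuristic)."""
    dist, queue = {goal: 0}, deque([goal])
    while queue:
        cur = queue.popleft()
        for nb in neighbors(cur, n):
            if nb not in blocked and nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    return dist

def a_star_with_sensing(n, hidden, start, goal):
    known = {start: False, goal: False}      # cell -> True if sensed as blocked
    sensed = 0
    h = distance_map(goal, set(), n)
    g, closed = {start: 0}, set()
    frontier = [(h.get(start, float("inf")), start)]
    while frontier:
        _, cur = heapq.heappop(frontier)
        if cur == goal:
            return g[cur], sensed
        if cur in closed:
            continue
        closed.add(cur)
        for nb in neighbors(cur, n):
            if nb not in known:              # pay the exploration cost once per cell
                sensed += 1
                known[nb] = nb in hidden
                if known[nb]:                # a new obstacle was revealed:
                    blocked = {c for c, b in known.items() if b}
                    h = distance_map(goal, blocked, n)   # refresh the dynamic heuristic
            if known[nb] or nb in closed:
                continue
            if g[cur] + 1 < g.get(nb, float("inf")):
                g[nb] = g[cur] + 1
                heapq.heappush(frontier, (g[nb] + h.get(nb, float("inf")), nb))
    return None, sensed

# 5x5 grid with a hidden vertical wall; the goal is reachable only by going around it.
hidden = {(2, 1), (2, 2), (2, 3)}
print(a_star_with_sensing(5, hidden, start=(0, 2), goal=(4, 2)))   # (path length, sensor queries)
```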
{"title":"Minimizing State Exploration While Searching Graphs with Unknown Obstacles","authors":"Daniel Koyfman, Shahaf S. Shperberg, Dor Atzmon, Ariel Felner","doi":"10.5555/3635637.3662959","DOIUrl":"https://doi.org/10.5555/3635637.3662959","url":null,"abstract":"We address the challenge of finding a shortest path in a graph with unknown obstacles where the exploration cost to detect whether a state is free or blocked is very high (e.g., due to sensor activation for obstacle detection). The main objective is to solve the problem while minimizing the number of explorations. To achieve this, we propose MXA∗, a novel heuristic search algorithm based on A∗. The key innovation in MXA∗ lies in modifying the heuristic calculation to avoid obstacles that have already been revealed. Furthermore, this paper makes a noteworthy contribution by introducing the concept of a dynamic heuristic. In contrast to the conventional static heuristic, a dynamic heuristic leverages information that emerges during the search process and adapts its estimations accordingly. By employing a dynamic heuristic, we suggest enhancements to MXA∗ based on real-time information obtained from both the open and closed lists. We demonstrate empirically that MXA∗ finds the shortest path while significantly reducing the number of explored states compared to traditional A∗. The code is available at https: //github.com/bernuly1/MXA-Star.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141232947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
vMFER: von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement of Actor-Critic Algorithms
Pub Date : 2024-05-14 DOI: 10.5555/3635637.3663247
Yiwen Zhu, Jinyi Liu, Wenya Wei, Qianyi Fu, Yujing Hu, Zhou Fang, Bo An, Jianye Hao, Tangjie Lv, Changjie Fan
Reinforcement Learning (RL) is a widely employed technique in decision-making problems, encompassing two fundamental operations -- policy evaluation and policy improvement. Enhancing learning efficiency remains a key challenge in RL, with many efforts focused on using ensemble critics to boost policy evaluation efficiency. However, when using multiple critics, the actor in the policy improvement process can obtain different gradients. Previous studies have combined these gradients without considering their disagreements. Therefore, optimizing the policy improvement process is crucial to enhance learning efficiency. This study focuses on investigating the impact of gradient disagreements caused by ensemble critics on policy improvement. We introduce the concept of uncertainty of gradient directions as a means to measure the disagreement among gradients utilized in the policy improvement process. Through measuring the disagreement among gradients, we find that transitions with lower uncertainty of gradient directions are more reliable in the policy improvement process. Building on this analysis, we propose a method called von Mises-Fisher Experience Resampling (vMFER), which optimizes the policy improvement process by resampling transitions and assigning higher confidence to transitions with lower uncertainty of gradient directions. Our experiments demonstrate that vMFER significantly outperforms the benchmark and is particularly well-suited for ensemble structures in RL.
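As a sketch of how gradient-direction uncertainty could translate into resampling probabilities, the snippet below normalizes per-critic gradients to unit vectors, measures their agreement by the mean resultant length, converts that to a standard von Mises-Fisher concentration estimate, and resamples transitions in proportion to this confidence; the exact weighting used in vMFER may differ, and all array shapes are illustrative.

```python
import numpy as np

def resampling_probs(grads, eps=1e-8):
    """grads: (batch, n_critics, dim) per-critic policy gradients for each transition.
    Transitions whose gradient directions agree across critics get higher probability."""
    unit = grads / (np.linalg.norm(grads, axis=-1, keepdims=True) + eps)
    resultant = np.linalg.norm(unit.mean(axis=1), axis=-1)   # mean resultant length in [0, 1]
    d = grads.shape[-1]
    kappa = resultant * (d - resultant ** 2) / (1.0 - resultant ** 2 + eps)  # vMF concentration estimate
    return kappa / kappa.sum()

rng = np.random.default_rng(0)
grads = rng.normal(size=(256, 5, 32))        # 256 transitions, 5 critics, 32-dim gradients
probs = resampling_probs(grads)
minibatch = rng.choice(256, size=64, replace=True, p=probs)   # confidence-weighted resampling
print(minibatch[:8])
```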
{"title":"vMFER: von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement of Actor-Critic Algorithms","authors":"Yiwen Zhu, Jinyi Liu, Wenya Wei, Qianyi Fu, Yujing Hu, Zhou Fang, Bo An, Jianye Hao, Tangjie Lv, Changjie Fan","doi":"10.5555/3635637.3663247","DOIUrl":"https://doi.org/10.5555/3635637.3663247","url":null,"abstract":"Reinforcement Learning (RL) is a widely employed technique in decision-making problems, encompassing two fundamental operations -- policy evaluation and policy improvement. Enhancing learning efficiency remains a key challenge in RL, with many efforts focused on using ensemble critics to boost policy evaluation efficiency. However, when using multiple critics, the actor in the policy improvement process can obtain different gradients. Previous studies have combined these gradients without considering their disagreements. Therefore, optimizing the policy improvement process is crucial to enhance learning efficiency. This study focuses on investigating the impact of gradient disagreements caused by ensemble critics on policy improvement. We introduce the concept of uncertainty of gradient directions as a means to measure the disagreement among gradients utilized in the policy improvement process. Through measuring the disagreement among gradients, we find that transitions with lower uncertainty of gradient directions are more reliable in the policy improvement process. Building on this analysis, we propose a method called von Mises-Fisher Experience Resampling (vMFER), which optimizes the policy improvement process by resampling transitions and assigning higher confidence to transitions with lower uncertainty of gradient directions. Our experiments demonstrate that vMFER significantly outperforms the benchmark and is particularly well-suited for ensemble structures in RL.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140978793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reinforcement Nash Equilibrium Solver
Pub Date : 2024-05-06 DOI: 10.5555/3635637.3663224
Xinrun Wang, Chang Yang, Shuxin Li, Pengdeng Li, Xiao Huang, Hau Chan, Bo An
Nash Equilibrium (NE) is the canonical solution concept of game theory and provides an elegant tool for understanding rationality. Though a mixed-strategy NE exists in any game with finitely many players and actions, computing an NE in two- or multi-player general-sum games is PPAD-complete. Various alternative solutions, e.g., Correlated Equilibrium (CE), and learning methods, e.g., fictitious play (FP), have been proposed to approximate NE. For convenience, we call these methods "inexact solvers", or "solvers" for short. However, the alternative solutions differ from NE, and the learning methods generally fail to converge to NE. Therefore, in this work we propose the REinforcement Nash Equilibrium Solver (RENES), which trains a single policy to modify games of different sizes and applies the solvers to the modified games, evaluating the obtained solutions on the original games. Specifically, our contributions are threefold. i) We represent the games as $\alpha$-rank response graphs and leverage a graph neural network (GNN) to handle games of different sizes as inputs; ii) we use tensor decomposition, e.g., canonical polyadic (CP) decomposition, to keep the dimension of the modifying actions fixed across games of different sizes; iii) we train the modifying strategy with the widely used proximal policy optimization (PPO) and apply the solvers to the modified games, evaluating the obtained solutions on the original games. Extensive experiments on large-scale normal-form games show that our method further improves the NE approximation of different solvers, i.e., $\alpha$-rank, CE, FP and PRD, and generalizes to unseen games.
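The heart of RENES, solving a modified game while scoring the result on the original game, can be illustrated without the GNN, CP decomposition, or PPO components. The hedged sketch below perturbs a random two-player game, runs fictitious play on the perturbed game, and reports NashConv on the original payoffs; the perturbation here is random rather than learned.

```python
import numpy as np

def fictitious_play(A, B, iters=2000):
    """Approximate equilibrium of a two-player general-sum game via fictitious play.
    A and B are the payoff matrices of the row and column players."""
    n, m = A.shape
    counts_r, counts_c = np.ones(n), np.ones(m)
    for _ in range(iters):
        x, y = counts_r / counts_r.sum(), counts_c / counts_c.sum()
        counts_r[np.argmax(A @ y)] += 1      # row player's best response to the empirical mix
        counts_c[np.argmax(x @ B)] += 1      # column player's best response
    return counts_r / counts_r.sum(), counts_c / counts_c.sum()

def nash_conv(A, B, x, y):
    """Sum of both players' best-response gains; 0 at an exact Nash equilibrium."""
    return (np.max(A @ y) - x @ A @ y) + (np.max(x @ B) - x @ B @ y)

rng = np.random.default_rng(0)
A, B = rng.uniform(size=(4, 4)), rng.uniform(size=(4, 4))

# RENES-style evaluation loop: solve a *modified* game, score the result on the *original* one.
delta_A, delta_B = 0.1 * rng.normal(size=(4, 4)), 0.1 * rng.normal(size=(4, 4))
x, y = fictitious_play(A + delta_A, B + delta_B)
print("NashConv on the original game:", nash_conv(A, B, x, y))
```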
{"title":"Reinforcement Nash Equilibrium Solver","authors":"Xinrun Wang, Chang Yang, Shuxin Li, Pengdeng Li, Xiao Huang, Hau Chan, Bo An","doi":"10.5555/3635637.3663224","DOIUrl":"https://doi.org/10.5555/3635637.3663224","url":null,"abstract":"Nash Equilibrium (NE) is the canonical solution concept of game theory, which provides an elegant tool to understand the rationalities. Though mixed strategy NE exists in any game with finite players and actions, computing NE in two- or multi-player general-sum games is PPAD-Complete. Various alternative solutions, e.g., Correlated Equilibrium (CE), and learning methods, e.g., fictitious play (FP), are proposed to approximate NE. For convenience, we call these methods as\"inexact solvers\", or\"solvers\"for short. However, the alternative solutions differ from NE and the learning methods generally fail to converge to NE. Therefore, in this work, we propose REinforcement Nash Equilibrium Solver (RENES), which trains a single policy to modify the games with different sizes and applies the solvers on the modified games where the obtained solution is evaluated on the original games. Specifically, our contributions are threefold. i) We represent the games as $alpha$-rank response graphs and leverage graph neural network (GNN) to handle the games with different sizes as inputs; ii) We use tensor decomposition, e.g., canonical polyadic (CP), to make the dimension of modifying actions fixed for games with different sizes; iii) We train the modifying strategy for games with the widely-used proximal policy optimization (PPO) and apply the solvers to solve the modified games, where the obtained solution is evaluated on original games. Extensive experiments on large-scale normal-form games show that our method can further improve the approximation of NE of different solvers, i.e., $alpha$-rank, CE, FP and PRD, and can be generalized to unseen games.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141009450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On the Utility of External Agent Intention Predictor for Human-AI Coordination
Pub Date : 2024-05-03 DOI: 10.5555/3635637.3663222
Chenxu Wang, Zilong Chen, Huaping Liu
Reaching a consensus on team plans is vital to human-AI coordination. Although previous studies provide approaches based on various forms of communication, coordination can still be hard when the AI has no explainable plan to communicate. To close this gap, we suggest incorporating external models to assist humans in understanding the intentions of AI agents. In this paper, we propose a two-stage paradigm that first trains a Theory of Mind (ToM) model from collected offline trajectories of the target agent, and then uses the model during human-AI collaboration by displaying the target agent's predicted future actions in real time. Such a paradigm treats the AI agent as a black box and can therefore be applied to any agent. To test the paradigm, we implement a transformer-based predictor as the ToM model and develop an extended online human-AI collaboration platform for experiments. Comprehensive experimental results verify that human-AI teams achieve better performance with the help of our model. A user assessment attached to the experiment further demonstrates that the paradigm significantly enhances the situational awareness of humans. Our study shows the potential of augmenting human ability via external assistance in human-AI collaboration, which may further inspire future research.
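A minimal sketch of such a transformer-based intention predictor is given below: it encodes a window of the target agent's observations and outputs logits over its next action, trained by supervised imitation on offline trajectories. The architecture, dimensions, and training step are placeholder assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class IntentionPredictor(nn.Module):
    """Predicts the target agent's next action from a window of its recent observations."""
    def __init__(self, obs_dim, n_actions, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, obs_seq):              # obs_seq: (batch, time, obs_dim)
        h = self.encoder(self.embed(obs_seq))
        return self.head(h[:, -1])           # logits for the next action

# Supervised imitation on offline trajectories of the target agent (placeholder data).
model = IntentionPredictor(obs_dim=16, n_actions=6)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs = torch.randn(32, 10, 16)
next_action = torch.randint(0, 6, (32,))
loss = nn.functional.cross_entropy(model(obs), next_action)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```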
{"title":"On the Utility of External Agent Intention Predictor for Human-AI Coordination","authors":"Chenxu Wang, Zilong Chen, Huaping Liu","doi":"10.5555/3635637.3663222","DOIUrl":"https://doi.org/10.5555/3635637.3663222","url":null,"abstract":"Reaching a consensus on the team plans is vital to human-AI coordination. Although previous studies provide approaches through communications in various ways, it could still be hard to coordinate when the AI has no explainable plan to communicate. To cover this gap, we suggest incorporating external models to assist humans in understanding the intentions of AI agents. In this paper, we propose a two-stage paradigm that first trains a Theory of Mind (ToM) model from collected offline trajectories of the target agent, and utilizes the model in the process of human-AI collaboration by real-timely displaying the future action predictions of the target agent. Such a paradigm leaves the AI agent as a black box and thus is available for improving any agents. To test our paradigm, we further implement a transformer-based predictor as the ToM model and develop an extended online human-AI collaboration platform for experiments. The comprehensive experimental results verify that human-AI teams can achieve better performance with the help of our model. A user assessment attached to the experiment further demonstrates that our paradigm can significantly enhance the situational awareness of humans. Our study presents the potential to augment the ability of humans via external assistance in human-AI collaboration, which may further inspire future research.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141016713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure
Pub Date : 2024-05-01 DOI: 10.5555/3635637.3663073
Zhicheng Zhang, Yancheng Liang, Yi Wu, Fei Fang
Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to a Pareto-optimal Nash equilibrium, owing largely to the lack of efficient exploration. The problem is exacerbated in sparse-reward settings because of the larger variance exhibited in policy learning. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It learns to explore by first identifying the agents' high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to "cover" the subspace. These trained exploration policies can be integrated with any off-policy MARL algorithm for test-time tasks. We first showcase MESA's advantage in a multi-step matrix game. Furthermore, experiments show that with learned exploration policies, MESA achieves significantly better performance on sparse-reward tasks in several multi-agent particle environments and multi-agent MuJoCo environments, and exhibits the ability to generalize to more challenging tasks at test time.
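A cartoon of the "identify, then cover" idea (with synthetic data, not the paper's procedure) could look as follows: keep the top-reward joint state-action pairs from training rollouts, partition them into regions, and dedicate one exploration policy per region; how each policy is actually trained to reach its region is left to the chosen off-policy MARL algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
states  = rng.uniform(size=(5000, 8))        # joint states collected from training tasks
actions = rng.uniform(size=(5000, 4))        # joint actions
rewards = rng.uniform(size=5000)             # returns associated with each transition

# Step 1: identify the high-rewarding joint state-action subspace.
keep = rewards > np.quantile(rewards, 0.9)
subspace = np.hstack([states[keep], actions[keep]])

# Step 2: partition it into as many regions as exploration policies, so the set "covers" it.
n_policies = 4
labels = KMeans(n_clusters=n_policies, n_init=10, random_state=0).fit_predict(subspace)

# In MESA, exploration policy k would then be trained to revisit the transitions labelled k;
# here we only report how large each region is.
print([int((labels == k).sum()) for k in range(n_policies)])
```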
{"title":"MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure","authors":"Zhicheng Zhang, Yancheng Liang, Yi Wu, Fei Fang","doi":"10.5555/3635637.3663073","DOIUrl":"https://doi.org/10.5555/3635637.3663073","url":null,"abstract":"Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to Pareto optimal Nash Equilibrium, owing largely to the lack of efficient exploration. The problem is exacerbated in sparse-reward settings, caused by the larger variance exhibited in policy learning. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It learns to explore by first identifying the agents' high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to\"cover\"the subspace. These trained exploration policies can be integrated with any off-policy MARL algorithm for test-time tasks. We first showcase MESA's advantage in a multi-step matrix game. Furthermore, experiments show that with learned exploration policies, MESA achieves significantly better performance in sparse-reward tasks in several multi-agent particle environments and multi-agent MuJoCo environments, and exhibits the ability to generalize to more challenging tasks at test time.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141044404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Dual Role AoI-based Incentive Mechanism for HD map Crowdsourcing
Pub Date : 2024-05-01 DOI: 10.5555/3635637.3663230
Wentao Ye, Bo Liu, Yuan Luo, Jianwei Huang
A fresh, high-quality high-definition (HD) map is vital for enhancing transportation efficiency and safety in autonomous driving. Vehicle-based crowdsourcing offers a promising approach for updating HD maps. However, recruiting crowdsourcing vehicles involves a challenging tradeoff between HD map freshness and recruitment costs. Existing studies on HD map crowdsourcing often (1) prioritize maximizing spatial coverage and (2) overlook the dual role of crowdsourcing vehicles in HD maps, as vehicles serve both as contributors to and customers of HD maps. This motivates us to propose the Dual-Role Age of Information (AoI) based Incentive Mechanism (DRAIM) to address these issues. Specifically, we propose the trajectory age of information, incorporating the expected AoI of the HD map and the trajectory, to quantify a vehicle's HD map usage utility, which is freshness- and trajectory-dependent. DRAIM aims to achieve the company's tradeoff between freshness and recruitment costs.
{"title":"Dual Role AoI-based Incentive Mechanism for HD map Crowdsourcing","authors":"Wentao Ye, Bo Liu, Yuan Luo, Jianwei Huang","doi":"10.5555/3635637.3663230","DOIUrl":"https://doi.org/10.5555/3635637.3663230","url":null,"abstract":"A high-quality fresh high-definition (HD) map is vital in enhancing transportation efficiency and safety in autonomous driving. Vehicle-based crowdsourcing offers a promising approach for updating HD maps. However, recruiting crowdsourcing vehicles involves making the challenging tradeoff between the HD map freshness and recruitment costs. Existing studies on HD map crowdsourcing often (1) prioritize maximizing spatial coverage and (2) overlook the dual role of crowdsourcing vehicles in HD maps, as vehicles serve both as contributors and customers of HD maps. This motivates us to propose the Dual-Role Age of Information (AoI) based Incentive Mechanism (DRAIM) to address these issues. % Specifically, we propose the trajectory age of information, incorporating the expected AoI of the HD map and the trajectory, to quantify a vehicle's HD map usage utility, which is freshness- and trajectory-dependent. DRAIM aims to achieve the company's tradeoff between freshness and recruitment costs.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141025969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
STV+Reductions: Towards Practical Verification of Strategic Ability Using Model Reductions
Pub Date : 2023-10-27 DOI: 10.5555/3463952.3464232
Damian Kurpiewski, Witold Pazderski, W. Jamroga, Yan Kim
We present a substantially expanded version of our tool STV for strategy synthesis and verification of strategic abilities. The new version adds user-definable models and support for model reduction through partial order reduction and checking for bisimulation.
{"title":"STV+Reductions: Towards Practical Verification of Strategic Ability Using Model Reductions","authors":"Damian Kurpiewski, Witold Pazderski, W. Jamroga, Yan Kim","doi":"10.5555/3463952.3464232","DOIUrl":"https://doi.org/10.5555/3463952.3464232","url":null,"abstract":"We present a substantially expanded version of our tool STV for strategy synthesis and verification of strategic abilities. The new version adds user-definable models and support for model reduction through partial order reduction and checking for bisimulation.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126828967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Formally-Sharp DAgger for MCTS: Lower-Latency Monte Carlo Tree Search using Data Aggregation with Formal Methods
Pub Date : 2023-08-15 DOI: 10.5555/3545946.3598783
D. Chakraborty, Damien Busatto-Gaston, Jean-François Raskin, G. Pérez
We study how to efficiently combine formal methods, Monte Carlo Tree Search (MCTS), and deep learning in order to produce high-quality receding horizon policies in large Markov Decision processes (MDPs). In particular, we use model-checking techniques to guide the MCTS algorithm in order to generate offline samples of high-quality decisions on a representative set of states of the MDP. Those samples can then be used to train a neural network that imitates the policy used to generate them. This neural network can either be used as a guide on a lower-latency MCTS online search, or alternatively be used as a full-fledged policy when minimal latency is required. We use statistical model checking to detect when additional samples are needed and to focus those additional samples on configurations where the learnt neural network policy differs from the (computationally-expensive) offline policy. We illustrate the use of our method on MDPs that model the Frozen Lake and Pac-Man environments -- two popular benchmarks to evaluate reinforcement-learning algorithms.
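The data-aggregation loop itself can be sketched independently of the formal-methods machinery. In the hedged example below, a cheap stand-in function plays the role of the model-checking-guided MCTS oracle, a small classifier imitates it, and in each round only the states where the imitation disagrees with the oracle are labelled and added back to the dataset; everything here (the toy states, the oracle, the classifier) is a placeholder, not the paper's pipeline.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def oracle(state):
    """Cheap stand-in for the expensive model-checking-guided MCTS expert."""
    return int(state.sum() > 0)

rng = np.random.default_rng(0)
states = rng.normal(size=(200, 4))                    # representative set of MDP states
data_X, data_y = list(states), [oracle(s) for s in states]

policy = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
for round_ in range(3):                               # DAgger-style aggregation rounds
    policy.fit(np.array(data_X), np.array(data_y))
    probe = rng.normal(size=(500, 4))                 # fresh probe states (the paper instead samples
                                                      # states reached when following the learnt policy)
    disagree = [s for s in probe if policy.predict(s[None])[0] != oracle(s)]
    data_X += disagree                                # query the oracle only where the imitation differs
    data_y += [oracle(s) for s in disagree]
    print(f"round {round_}: {len(disagree)} disagreements added")
```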
{"title":"Formally-Sharp DAgger for MCTS: Lower-Latency Monte Carlo Tree Search using Data Aggregation with Formal Methods","authors":"D. Chakraborty, Damien Busatto-Gaston, Jean-François Raskin, G. Pérez","doi":"10.5555/3545946.3598783","DOIUrl":"https://doi.org/10.5555/3545946.3598783","url":null,"abstract":"We study how to efficiently combine formal methods, Monte Carlo Tree Search (MCTS), and deep learning in order to produce high-quality receding horizon policies in large Markov Decision processes (MDPs). In particular, we use model-checking techniques to guide the MCTS algorithm in order to generate offline samples of high-quality decisions on a representative set of states of the MDP. Those samples can then be used to train a neural network that imitates the policy used to generate them. This neural network can either be used as a guide on a lower-latency MCTS online search, or alternatively be used as a full-fledged policy when minimal latency is required. We use statistical model checking to detect when additional samples are needed and to focus those additional samples on configurations where the learnt neural network policy differs from the (computationally-expensive) offline policy. We illustrate the use of our method on MDPs that model the Frozen Lake and Pac-Man environments -- two popular benchmarks to evaluate reinforcement-learning algorithms.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127035829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1