Adaptive Agents and Multi-Agent Systems: Latest Papers


The Swiss Gambit

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-21 DOI: 10.48550/arXiv.2302.10595

Ágnes Cseh, Pascal Führlich, Pascal Lenzner

In each round of a Swiss-system tournament, players of similar score are paired against each other. An intentional early loss therefore might lead to weaker opponents in later rounds and thus to a better final tournament result, a phenomenon known as the Swiss Gambit. To the best of our knowledge, it is an open question whether this strategy can actually work. This paper provides answers based on an empirical agent-based analysis for the most prominent application area of the Swiss-system format, namely chess tournaments. We simulate realistic tournaments by employing the official FIDE pairing system for computing the player pairings in each round. We show that even though gambits are widely possible in Swiss-system chess tournaments, profiting from them requires a high degree of predictability of match results. Moreover, even if a Swiss Gambit succeeds, the obtained improvement in the final ranking is limited. Our experiments prove that counting on a Swiss Gambit is indeed a lot more of a risky gambit than a reliable strategy to improve the final rank.

Citations: 2
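The pairing idea behind the Swiss Gambit abstract above can be illustrated with a deliberately simplified sketch. It is not the official FIDE pairing system the authors use: it only sorts players by score (then rating) and pairs neighbours, ignoring colours, rematches, and byes. The `Player` class, `pair_round` function, and the four-player field are hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    rating: int
    score: float = 0.0


def pair_round(players):
    """Pair players of similar score: sort by score, then rating, pair neighbours.

    Simplification (assumption): no colour, rematch, or bye handling, and an
    even number of players. The official FIDE system handles all of these.
    """
    ranked = sorted(players, key=lambda p: (-p.score, -p.rating))
    return [(ranked[i].name, ranked[i + 1].name) for i in range(0, len(ranked), 2)]


# Round 1 was A vs C and B vs D. Scenario 1: the favourites A and B both win.
normal = pair_round([
    Player("A", 2400, 1.0), Player("B", 2300, 1.0),
    Player("C", 2200, 0.0), Player("D", 2100, 0.0),
])

# Scenario 2: A deliberately loses to C and drops into the lower score group.
gambit = pair_round([
    Player("A", 2400, 0.0), Player("B", 2300, 1.0),
    Player("C", 2200, 1.0), Player("D", 2100, 0.0),
])
```

In this toy field, the early loss moves A from a pairing with B to a pairing with the lowest-rated player D, which is the mechanism the gambit tries to exploit. Whether that pays off in the final ranking is exactly what the paper questions.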

Multiagent Inverse Reinforcement Learning via Theory of Mind Reasoning

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-20 DOI: 10.5555/3545946.3598703

Haochen Wu, Pedro Sequeira, D. Pynadath

We approach the problem of understanding how people interact with each other in collaborative settings, especially when individuals know little about their teammates, via Multiagent Inverse Reinforcement Learning (MIRL), where the goal is to infer the reward functions guiding the behavior of each individual given trajectories of a team's behavior during some task. Unlike current MIRL approaches, we do not assume that team members know each other's goals a priori; rather, that they collaborate by adapting to the goals of others perceived by observing their behavior, all while jointly performing a task. To address this problem, we propose a novel approach to MIRL via Theory of Mind (MIRL-ToM). For each agent, we first use ToM reasoning to estimate a posterior distribution over baseline reward profiles given their demonstrated behavior. We then perform MIRL via decentralized equilibrium by employing single-agent Maximum Entropy IRL to infer a reward function for each agent, where we simulate the behavior of other teammates according to the time-varying distribution over profiles. We evaluate our approach in a simulated 2-player search-and-rescue operation where the goal of the agents, playing different roles, is to search for and evacuate victims in the environment. Our results show that the choice of baseline profiles is paramount to the recovery of the ground-truth rewards, and that MIRL-ToM is able to recover the rewards used by agents interacting both with known and unknown teammates.

Citations: 5
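The first step described in the MIRL-ToM abstract above, estimating a posterior over baseline reward profiles from demonstrated behavior, can be sketched as a plain Bayesian update. This is a minimal illustration under stated assumptions, not the authors' implementation: the profile names, the action likelihood table, and the function `update_posterior` are all hypothetical, and the likelihoods are given directly rather than derived from a planner.

```python
def update_posterior(prior, likelihoods, action):
    """One Bayes step: posterior(profile) is proportional to
    prior(profile) * P(action | profile)."""
    unnormalised = {p: prior[p] * likelihoods[p][action] for p in prior}
    total = sum(unnormalised.values())  # assumed non-zero for observed actions
    return {p: v / total for p, v in unnormalised.items()}


# Hypothetical baseline profiles and action likelihoods (assumed, not from the paper).
prior = {"rescuer": 0.5, "explorer": 0.5}
likelihoods = {
    "rescuer": {"search": 0.2, "evacuate": 0.8},
    "explorer": {"search": 0.7, "evacuate": 0.3},
}

# Observing a teammate evacuate twice shifts belief toward the "rescuer" profile.
posterior = prior
for observed_action in ["evacuate", "evacuate"]:
    posterior = update_posterior(posterior, likelihoods, observed_action)
```

Applying the update after every observation yields a belief that changes over time, which matches the abstract's mention of a time-varying distribution over profiles.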

Price of Anarchy in a Double-Sided Critical Distribution System

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-20 DOI: 10.48550/arXiv.2302.09959

David Sychrovsky, Jakub Černý, Sylvain Lichau, M. Loebl

Measures of allocation optimality differ significantly when distributing standard tradable goods in peaceful times and scarce resources in crises. While realistic markets offer asymptotic efficiency, they may not necessarily guarantee the fair allocation desirable when distributing critical resources. To achieve fairness, mechanisms often rely on a central authority, which may act inefficiently in times of need when swiftness and good organization are crucial. In this work, we study a hybrid trading system called Crisdis, introduced by Jedličková et al., which combines fair allocation of buying rights with a market, leveraging the best of both worlds. The frustration of a buyer in Crisdis is defined as the difference between the amount of goods they are entitled to according to the assigned buying rights and the amount of goods they are able to acquire by trading. We define a Price of Anarchy (PoA) in this system as a conceptual analogue of the original definition in the context of frustration. Our main contribution is a study of PoA in realistic complex double-sided market mechanisms for Crisdis. The performed empirical analysis suggests that, in contrast to a market free of governmental interventions, the PoA in our system decreases.

Citations: 0
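The frustration measure from the Crisdis abstract above can be written down directly as stated there: entitled amount minus acquired amount, per buyer. The sketch below follows only that sentence. The function name, the buyer names, and the quantities are hypothetical, and any normalisation or the exact Price of Anarchy formula used in the paper is not reproduced here.

```python
def frustrations(entitled, acquired):
    """Per-buyer frustration: goods entitled to under the assigned buying
    rights minus goods actually acquired by trading (as in the abstract)."""
    return {buyer: entitled[buyer] - acquired.get(buyer, 0.0) for buyer in entitled}


# Hypothetical entitlements and trading outcomes.
entitled = {"hospital_a": 10.0, "hospital_b": 6.0}
acquired = {"hospital_a": 7.0, "hospital_b": 6.0}

result = frustrations(entitled, acquired)
```

Here `hospital_a` ends up three units short of its entitlement, while `hospital_b` has zero frustration.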

Co-evolution of Social and Non-Social Guilt

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-20 DOI: 10.48550/arXiv.2302.09859

Theodor Cimpeanu, L. Pereira, H. Anh

Building ethical machines may involve bestowing upon them the emotional capacity to self-evaluate and repent on their actions. While reparative measures, such as apologies, are often considered as possible strategic interactions, the explicit evolution of the emotion of guilt as a behavioural phenotype is not yet well understood. Here, we study the co-evolution of social and non-social guilt of homogeneous or heterogeneous populations, including well-mixed, lattice and scale-free networks. Socially aware guilt comes at a cost, as it requires agents to make demanding efforts to observe and understand the internal state and behaviour of others, while non-social guilt only requires the awareness of the agents' own state and hence incurs no social cost. Those choosing to be non-social are however more sensitive to exploitation by other agents due to their social unawareness. Resorting to methods from evolutionary game theory, we study analytically, and through extensive numerical and agent-based simulations, whether and how such social and non-social guilt can evolve and deploy, depending on the underlying structure of the populations, or systems, of agents. The results show that, in both lattice and scale-free networks, emotional guilt prone strategies are dominant for a larger range of the guilt and social costs incurred, compared to the well-mixed population setting, leading therefore to significantly higher levels of cooperation for a wider range of the costs. In structured population settings, both social and non-social guilt can evolve and deploy through clustering with emotional prone strategies, allowing them to be protected from exploiters, especially in case of non-social (less costly) strategies. Overall, our findings provide important insights into the design and engineering of self-organised and distributed cooperative multi-agent systems.

Citations: 2

Matching Algorithms under Diversity-Based Reservations

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-19 DOI: 10.48550/arXiv.2302.09449

H. Aziz, S. Chu, Zhaohong Sun

Selection under category or diversity constraints is a ubiquitous and widely-applicable problem that is encountered in immigration, school choice, hiring, and healthcare rationing. These diversity constraints are typically represented by minimum and maximum quotas on various categories or types. We undertake a detailed comparative study of applicant selection algorithms with respect to the diversity goals.

Citations: 1
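The minimum and maximum quotas mentioned in the abstract above can be made concrete with a small feasibility check. This sketch only tests whether a given selection respects the quotas. It is not one of the selection algorithms the paper compares, and the function name, applicant names, and category labels are hypothetical.

```python
from collections import Counter


def satisfies_quotas(selected, types, min_quota, max_quota):
    """Return True if every category count lies within its [min, max] quota.

    `types` maps each applicant to the set of categories they belong to.
    Categories without a stated minimum default to 0, and categories without
    a stated maximum are treated as unbounded (both are assumptions).
    """
    counts = Counter(c for applicant in selected for c in types[applicant])
    categories = set(min_quota) | set(max_quota)
    return all(
        min_quota.get(c, 0) <= counts[c] <= max_quota.get(c, float("inf"))
        for c in categories
    )


types = {
    "ann": {"rural"},
    "bob": {"urban"},
    "cy": {"rural", "veteran"},
    "di": {"urban"},
}
min_quota = {"rural": 1, "veteran": 1}
max_quota = {"urban": 1}

ok = satisfies_quotas(["ann", "bob", "cy"], types, min_quota, max_quota)
too_urban = satisfies_quotas(["ann", "bob", "di"], types, min_quota, max_quota)
```

The second selection fails on two counts: it exceeds the urban maximum and misses the veteran minimum.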

Distributed Planning with Asynchronous Execution with Local Navigation for Multi-agent Pickup and Delivery Problem

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-18 DOI: 10.48550/arXiv.2302.09250

Yuki Miyashita, Tomoki Yamauchi, T. Sugawara

We propose a distributed planning method with asynchronous execution for multi-agent pickup and delivery (MAPD) problems in environments with occasional delays in agents' activities and flexible endpoints. MAPD is a crucial problem framework with many applications; however, most existing studies assume ideal agent behaviors and environments, such as a fixed speed of agents, synchronized movements, and a well-designed environment with many short detours for multiple agents to perform tasks easily. Such an environment is often infeasible; for example, the moving speed of agents may be affected by weather and floor conditions and is often prone to delays. The proposed method relaxes some of these infeasible conditions so that MAPD can be applied in more realistic environments, by allowing fluctuating speeds in agents' actions and flexible working locations (endpoints). Our experiments showed that our method enables agents to perform MAPD in such an environment efficiently, compared to the baseline methods. We also analyzed the behaviors of agents using our method and discussed its limitations.

Citations: 0

Learning Density-Based Correlated Equilibria for Markov Games

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-16 DOI: 10.48550/arXiv.2302.08001

Libo Zhang, Yang Chen, Toru Takisaka, B. Khoussainov, Michael Witbrock, Jiamou Liu

Correlated Equilibrium (CE) is a well-established solution concept that captures coordination among agents and enjoys good algorithmic properties. In real-world multi-agent systems, in addition to being in an equilibrium, agents' policies are often expected to meet requirements with respect to safety and fairness. Such additional requirements can often be expressed in terms of the state density, which measures the state-visitation frequencies during the course of a game. However, existing CE notions or CE-finding approaches cannot explicitly specify a CE with particular properties concerning state density; they do so implicitly by either modifying reward functions or using value functions as the selection criteria. The resulting CE may thus not fully fulfil the state-density requirements. In this paper, we propose Density-Based Correlated Equilibria (DBCE), a new notion of CE that explicitly takes state density as the selection criterion. Concretely, we instantiate DBCE by specifying different state-density requirements motivated by real-world applications. To compute DBCE, we put forward the Density Based Correlated Policy Iteration algorithm for the underlying control problem. We perform experiments on various games, and the results demonstrate the advantage of our CE-finding approach over existing methods in scenarios with state-density concerns.

Citations: 0
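The DBCE abstract above describes state density as a measure of state-visitation frequencies. As a rough illustration, the sketch below estimates an undiscounted empirical density from sampled trajectories and checks one density requirement. The paper's exact definition (for example, whether visits are discounted) is not given in the abstract, so the estimator, the function name, the state labels, and the 0.3 threshold are all assumptions.

```python
from collections import Counter


def state_density(trajectories):
    """Empirical, undiscounted state-visitation frequencies over all time steps."""
    counts = Counter(state for trajectory in trajectories for state in trajectory)
    total = sum(counts.values())
    return {state: count / total for state, count in counts.items()}


# Hypothetical rollouts of a joint policy; "s2" plays the role of a risky state.
trajectories = [
    ["s0", "s1", "s2"],
    ["s0", "s1", "s1"],
    ["s0", "s2"],
]
density = state_density(trajectories)

# Example safety-style requirement: the risky state is visited at most 30% of the time.
meets_requirement = density.get("s2", 0.0) <= 0.3
```

A density-based selection criterion would then prefer, among candidate equilibria, those whose density satisfies requirements of this kind.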

TiZero: Mastering Multi-Agent Football with Curriculum Learning and Self-Play

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-15 DOI: 10.48550/arXiv.2302.07515

Fanqing Lin, Shiyu Huang, Tim Pearce, Wenze Chen, Weijuan Tu

Multi-agent football poses an unsolved challenge in AI research. Existing work has focused on tackling simplified scenarios of the game, or else leveraging expert demonstrations. In this paper, we develop a multi-agent system to play the full 11 vs. 11 game mode, without demonstrations. This game mode contains aspects that present major challenges to modern reinforcement learning algorithms: multi-agent coordination, long-term planning, and non-transitivity. To address these challenges, we present TiZero, a self-evolving, multi-agent system that learns from scratch. TiZero introduces several innovations, including adaptive curriculum learning, a novel self-play strategy, and an objective that optimizes the policies of multiple agents jointly. Experimentally, it outperforms previous systems by a large margin on the Google Research Football environment, increasing win rates by over 30%. To demonstrate the generality of TiZero's innovations, they are assessed on several environments beyond football: Overcooked, Multi-agent Particle-Environment, Tic-Tac-Toe and Connect-Four.

Citations: 2

Differentially Private Diffusion Auction: The Single-unit Case

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-14 DOI: 10.48550/arXiv.2302.07072

Fengjuan Jia, Mengxiao Zhang, Jiamou Liu, B. Khoussainov

Diffusion auction refers to an emerging paradigm of online marketplace where an auctioneer utilises a social network to attract potential buyers. Diffusion auction poses significant privacy risks. From the auction outcome, it is possible to infer hidden, and potentially sensitive, preferences of buyers. To mitigate such risks, we initiate the study of differential privacy (DP) in diffusion auction mechanisms. DP is a well-established notion of privacy that protects a system against inference attacks. Achieving DP in diffusion auctions is non-trivial, as well-designed auction rules are required to incentivise the buyers to truthfully report their neighbourhood. We study the single-unit case and design two differentially private diffusion mechanisms (DPDMs): recursive DPDM and layered DPDM. We prove that these mechanisms guarantee differential privacy, incentive compatibility and individual rationality for both valuations and neighbourhood. We then empirically compare their performance on real and synthetic datasets.

Citations: 0
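To give a feel for how differential privacy can randomise an auction outcome, the sketch below computes selection probabilities under the standard exponential mechanism, a generic DP building block. It is not the recursive or layered DPDM from the paper above, whose details the abstract does not give. Using each buyer's bid as the utility score, the bid values, and the function name are assumptions made for illustration.

```python
import math


def exponential_mechanism_probs(utilities, epsilon, sensitivity=1.0):
    """Selection probabilities proportional to exp(epsilon * u / (2 * sensitivity)).

    Subtracting the maximum utility before exponentiating keeps the numbers
    stable and does not change the normalised probabilities.
    """
    scale = epsilon / (2.0 * sensitivity)
    max_u = max(utilities.values())
    weights = {k: math.exp(scale * (u - max_u)) for k, u in utilities.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical bids, assumed to be scaled so that one buyer changes a score by at most 1.
bids = {"buyer_1": 0.9, "buyer_2": 0.5, "buyer_3": 0.1}

private = exponential_mechanism_probs(bids, epsilon=1.0)
uniform = exponential_mechanism_probs(bids, epsilon=0.0)
```

A smaller `epsilon` flattens the distribution (at `epsilon=0` every buyer is equally likely to win), which protects bids at the cost of allocating less often to the highest bidder. A winner could then be sampled with `random.choices` using these probabilities as weights.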

Bringing Diversity to Autonomous Vehicles: An Interpretable Multi-vehicle Decision-making and Planning Framework

Adaptive Agents and Multi-Agent Systems

Pub Date : 2023-02-14 DOI: 10.48550/arXiv.2302.06803

Licheng Wen, Pinlong Cai, Daocheng Fu, Song Mao, Yikang Li

With the development of autonomous driving, it is becoming increasingly common for autonomous vehicles (AVs) and human-driven vehicles (HVs) to travel on the same roads. Existing single-vehicle planning algorithms on board struggle to handle sophisticated social interactions in the real world. Decisions made by these methods are difficult to understand for humans, raising the risk of crashes and making them unlikely to be applied in practice. Moreover, vehicle flows produced by open-source traffic simulators suffer from being overly conservative and lacking behavioral diversity. We propose a hierarchical multi-vehicle decision-making and planning framework with several advantages. The framework jointly makes decisions for all vehicles within the flow and reacts promptly to the dynamic environment through a high-frequency planning module. The decision module produces interpretable action sequences that can explicitly communicate self-intent to the surrounding HVs. We also present the cooperation factor and trajectory weight set, bringing diversity to autonomous vehicles in traffic at both the social and individual levels. The superiority of our proposed framework is validated through experiments with multiple scenarios, and the diverse behaviors in the generated vehicle trajectories are demonstrated through closed-loop simulations.

Citations: 2
