Claude Formanek, Louise Beyers, Callum Rhys Tilbury, Jonathan P. Shock, Arnu Pretorius
Offline multi-agent reinforcement learning (MARL) is an exciting direction of research that uses static datasets to find optimal control policies for multi-agent systems. Though the field is by definition data-driven, efforts have thus far neglected data in their drive to achieve state-of-the-art results. We first substantiate this claim by surveying the literature, showing how the majority of works generate their own datasets without consistent methodology and provide sparse information about the characteristics of these datasets. We then show why neglecting the nature of the data is problematic, through salient examples of how tightly algorithmic performance is coupled to the dataset used, necessitating a common foundation for experiments in the field. In response, we take a big step towards improving data usage and data awareness in offline MARL, with three key contributions: (1) a clear guideline for generating novel datasets; (2) a standardisation of over 80 existing datasets, hosted in a publicly available repository, using a consistent storage format and easy-to-use API; and (3) a suite of analysis tools that allow us to understand these datasets better, aiding further development.
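As a rough illustration of what a consistent storage format for offline MARL transitions could look like (the class and field names below are purely illustrative, not the paper's actual API or repository format):

```python
from dataclasses import dataclass, field
import numpy as np

# Hypothetical sketch of a standardised offline MARL dataset container;
# names are illustrative assumptions, not the paper's published format.
@dataclass
class Transition:
    observations: dict[str, np.ndarray]       # per-agent observations
    actions: dict[str, np.ndarray]            # per-agent actions
    rewards: dict[str, float]                 # per-agent rewards
    next_observations: dict[str, np.ndarray]
    done: bool

@dataclass
class OfflineDataset:
    env_name: str
    quality: str                              # e.g. "Good", "Medium", "Poor"
    transitions: list[Transition] = field(default_factory=list)

    def add(self, t: Transition) -> None:
        self.transitions.append(t)

    def sample(self, batch_size: int, rng: np.random.Generator) -> list[Transition]:
        # Uniform sampling with replacement, as offline RL training loops do.
        idx = rng.integers(0, len(self.transitions), size=batch_size)
        return [self.transitions[i] for i in idx]
```

A consistent container like this is what makes the paper's cross-dataset analysis tools possible: every dataset, however generated, exposes the same fields.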
"Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning". arXiv:2409.12001, published 2024-09-18.
Huawen Hu, Enze Shi, Chenxi Yue, Shuocun Yang, Zihao Wu, Yiwei Li, Tianyang Zhong, Tuo Zhang, Tianming Liu, Shu Zhang
Human-in-the-loop reinforcement learning integrates human expertise to accelerate agent learning and provide critical guidance and feedback in complex fields. However, many existing approaches focus on single-agent tasks and require continuous human involvement during the training process, significantly increasing the human workload and limiting scalability. In this paper, we propose HARP (Human-Assisted Regrouping with Permutation Invariant Critic), a multi-agent reinforcement learning framework designed for group-oriented tasks. HARP integrates automatic agent regrouping with strategic human assistance during deployment, enabling non-experts to offer effective guidance with minimal intervention. During training, agents dynamically adjust their groupings to optimize collaborative task completion. When deployed, they actively seek human assistance and utilize the Permutation Invariant Group Critic to evaluate and refine human-proposed groupings, allowing non-expert users to contribute valuable suggestions. Across multiple collaboration scenarios, our approach leverages limited guidance from non-experts to enhance performance. The project can be found at https://github.com/huawen-hu/HARP.
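The permutation-invariance property of such a critic can be illustrated with a minimal sketch: shared per-agent embeddings followed by symmetric pooling, so the value does not depend on agent ordering (random placeholder weights, not HARP's actual architecture):

```python
import numpy as np

# Minimal sketch of a permutation-invariant critic: each agent's features
# pass through the same embedding, then a symmetric (mean) pooling, so any
# reordering of agents yields the same value. Weights are placeholders.
def permutation_invariant_value(agent_feats: np.ndarray,
                                w_embed: np.ndarray,
                                w_value: np.ndarray) -> float:
    h = np.tanh(agent_feats @ w_embed)   # shared per-agent embedding
    pooled = h.mean(axis=0)              # symmetric pooling over agents
    return float(pooled @ w_value)       # scalar group value
```

This is why human-proposed regroupings can be scored regardless of how the agents are listed.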
"HARP: Human-Assisted Regrouping with Permutation Invariant Critic for Multi-Agent Reinforcement Learning". arXiv:2409.11741, published 2024-09-18.
Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub
Unmanned aerial vehicles (UAVs) have become increasingly popular in various fields, including precision agriculture, search and rescue, and remote sensing. However, exploring unknown environments remains a significant challenge. This study aims to address this challenge by utilizing on-policy Reinforcement Learning (RL) with Proximal Policy Optimization (PPO) to explore a two-dimensional area of interest with multiple UAVs. The UAVs avoid collisions with obstacles and with each other, and perform the exploration in a distributed manner. The proposed solution includes actor-critic networks using deep convolutional neural networks (CNN) and long short-term memory (LSTM) for identifying the UAVs and the areas that have already been covered. Compared to other RL techniques, such as policy gradient (PG) and asynchronous advantage actor-critic (A3C), the simulation results demonstrate the superiority of the proposed PPO approach. The results also show that combining LSTM with CNN in the critic can improve exploration. Since the exploration must work in unknown environments, we further show that the proposed setup completes coverage on new maps that differ from those used in training. Finally, we show how tuning hyperparameters affects the overall performance.
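For reference, the clipped surrogate objective that PPO optimizes can be sketched as follows (a minimal NumPy version; the paper's CNN+LSTM actor-critic networks are omitted):

```python
import numpy as np

# Sketch of PPO's clipped surrogate loss. ratio = pi_new(a|s) / pi_old(a|s);
# the clip keeps each on-policy update close to the data-collecting policy.
def ppo_clip_loss(ratio: np.ndarray, advantage: np.ndarray,
                  eps: float = 0.2) -> float:
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Minimize the negative of the pessimistic (min) surrogate.
    return float(-np.minimum(unclipped, clipped).mean())
```

With ratio 1 the loss is just the negated mean advantage; a ratio of 2 with positive advantage is clipped to 1 + eps, which is what prevents destructively large policy steps.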
"On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration". arXiv:2409.11058, published 2024-09-17.
AI agents have the potential to aid users on a variety of consequential tasks, including conducting scientific research. To spur the development of useful agents, we need benchmarks that are challenging, but more crucially, directly correspond to real-world tasks of interest. This paper introduces such a benchmark, designed to measure the accuracy of AI agents in tackling a crucial yet surprisingly challenging aspect of scientific research: computational reproducibility. This task, fundamental to the scientific process, involves reproducing the results of a study using the provided code and data. We introduce CORE-Bench (Computational Reproducibility Agent Benchmark), a benchmark consisting of 270 tasks based on 90 scientific papers across three disciplines (computer science, social science, and medicine). Tasks in CORE-Bench consist of three difficulty levels and include both language-only and vision-language tasks. We provide an evaluation system to measure the accuracy of agents in a fast and parallelizable way, saving days of evaluation time for each run compared to a sequential implementation. We evaluated two baseline agents: the general-purpose AutoGPT and a task-specific agent called CORE-Agent. We tested both variants using two underlying language models: GPT-4o and GPT-4o-mini. The best agent achieved an accuracy of 21% on the hardest task, showing the vast scope for improvement in automating routine scientific tasks. Having agents that can reproduce existing work is a necessary step towards building agents that can conduct novel research and could verify and improve the performance of other research agents. We hope that CORE-Bench can improve the state of reproducibility and spur the development of future research agents.
"CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark" by Zachary S. Siegel, Sayash Kapoor, Nitya Nagdir, Benedikt Stroebl, Arvind Narayanan. arXiv:2409.11363, published 2024-09-17.
This paper presents a novel zone-based flocking control approach suitable for dynamic multi-agent systems (MAS). Inspired by Reynolds' behavioral rules for boids, flocking behavioral rules with zones of repulsion, conflict, attraction, and surveillance are introduced. For each agent, using only bearing and distance measurements, behavioral deviation vectors quantify the deviations from local separation, local and global flock velocity alignment, local cohesion, obstacle avoidance and boundary conditions, and strategic separation for avoiding alien agents. The control strategy uses these local perception-based behavioral deviation vectors to guide each agent's motion. Additionally, the control strategy incorporates a directionally-aware obstacle avoidance mechanism that prioritizes obstacles in the agent's forward path. Simulation results validate the effectiveness of this approach in creating flexible, adaptable, and scalable flocking behavior.
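A minimal sketch of such zone-based behavioral rules (the two-zone simplification, zone radii, and gains below are illustrative assumptions, not the paper's actual formulation):

```python
import numpy as np

# Toy zone-based flocking rule for agent i: repel inside an inner zone,
# cohere and align inside an outer zone. Radii and gains are illustrative.
def flocking_velocity(pos: np.ndarray, vel: np.ndarray, i: int,
                      r_rep: float = 1.0, r_att: float = 5.0) -> np.ndarray:
    sep = np.zeros(2); coh = np.zeros(2); ali = np.zeros(2); n = 0
    for j in range(len(pos)):
        if j == i:
            continue
        offset = pos[j] - pos[i]
        d = np.linalg.norm(offset)       # distance measurement
        if d < r_rep:                    # zone of repulsion: separate
            sep -= offset / (d + 1e-9)
        elif d < r_att:                  # zone of attraction: cohere, align
            coh += offset
            ali += vel[j] - vel[i]
            n += 1
    if n:
        coh /= n; ali /= n
    return sep + 0.1 * coh + 0.1 * ali   # weighted behavioral deviation
```

Each term plays the role of one behavioral deviation vector: the commanded motion is their weighted sum, computed from purely local measurements.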
"Bearing-Distance Based Flocking with Zone-Based Interactions" by Hossein B. Jond. arXiv:2409.10047, published 2024-09-16.
In today's businesses, marketing is central to growth. The quality of marketing is as important as product quality and its relevant metrics, and it depends on targeting the right person. Technology adoption has been slow in many fields, yet it has reshaped aspects of human life; in marketing, for instance, recent developments have driven a significant shift toward data-driven approaches. In this paper, we present an advertisement model using behavioral and tracking analysis. We extract users' behavioral data while upholding privacy principles, and perform data manipulation and pattern mining for effective analysis. We present a model using the agent-based modeling (ABM) technique, with rapid transit system users as the target audience, to deliver advertisements to the right person. We also describe the model using the Overview, Design concepts, and Details (ODD) protocol for ABM.
"Context-aware Advertisement Modeling and Applications in Rapid Transit Systems" by Afzal Ahmed, Muhammad Raees. arXiv:2409.09956, published 2024-09-16.
We address a variant of multi-agent path finding in continuous environments (CE-MAPF), where agents move along sets of smooth curves. Collisions between agents are resolved via avoidance in the space domain. A new Continuous Environment Conflict-Based Search (CE-CBS) algorithm is proposed in this work. CE-CBS combines conflict-based search (CBS) for the high-level search framework with RRT* for low-level path planning. The CE-CBS algorithm is tested under various settings on diverse CE-MAPF instances. Experimental results show that CE-CBS is competitive with other algorithms that address continuous aspects of MAPF, such as MAPF with continuous time.
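As a simplified illustration of the conflict check at the heart of CBS-style high-level search (paths are discretized here for brevity, whereas CE-CBS operates on smooth curves and resolves conflicts in the space domain):

```python
import numpy as np

# Sketch of the pairwise conflict detection a CBS-style high-level search
# performs on candidate paths. A conflict here means two agents come within
# a safety radius at the same step; CBS would then branch on constraints.
def first_conflict(paths, radius: float = 0.5):
    """Return (agent_a, agent_b, t) for the earliest conflict, else None."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for a in range(len(paths)):
            for b in range(a + 1, len(paths)):
                pa = paths[a][min(t, len(paths[a]) - 1)]  # wait at goal
                pb = paths[b][min(t, len(paths[b]) - 1)]
                if np.linalg.norm(np.array(pa) - np.array(pb)) < radius:
                    return a, b, t
    return None
```

The high-level loop repeatedly calls a check like this, and on each conflict re-plans one agent (with RRT* in CE-CBS) under a new avoidance constraint.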
"Multi-agent Path Finding in Continuous Environment" by Kristýna Janovská, Pavel Surynek. arXiv:2409.10680, published 2024-09-16.
Eden Hartman, Yonatan Aumann, Avinatan Hassidim, Erel Segal-Halevi
Two prominent objectives in social choice are utilitarian - maximizing the sum of agents' utilities, and leximin - maximizing the smallest agent's utility, then the second-smallest, etc. Utilitarianism is typically computationally easier to attain but is generally viewed as less fair. This paper presents a general reduction scheme that, given a utilitarian solver, produces a distribution over outcomes that is leximin in expectation. Importantly, the scheme is robust in the sense that, given an approximate utilitarian solver, it produces an outcome that is approximately-leximin (in expectation) - with the same approximation factor. We apply our scheme to several social choice problems: stochastic allocations of indivisible goods, giveaway lotteries, and fair lotteries for participatory budgeting.
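For concreteness, the leximin objective itself can be sketched as a lexicographic comparison of sorted utility vectors (this illustrates only the objective; the paper's contribution, reducing leximin to repeated utilitarian calls, is more involved):

```python
# Sketch of the leximin order: compare utility vectors by their sorted
# values, lexicographically, so the worst-off agent is compared first,
# then the second-worst-off, and so on.
def leximin_better(u, v) -> bool:
    """True if utility vector u is strictly leximin-preferred to v."""
    su, sv = sorted(u), sorted(v)
    for a, b in zip(su, sv):
        if a != b:
            return a > b
    return False  # equal sorted vectors: not strictly better
```

Note the contrast with utilitarianism: [2, 10] has the larger sum, but [3, 3] is leximin-preferred because its worst-off agent is better off.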
"Reducing Leximin Fairness to Utilitarian Optimization". arXiv:2409.10395, published 2024-09-16.
To deploy safe and agile robots in cluttered environments, there is a need to develop fully decentralized controllers that guarantee safety, respect actuation limits, prevent deadlocks, and scale to thousands of agents. Current approaches fall short of meeting all these goals: optimization-based methods ensure safety but lack scalability, while learning-based methods scale but do not guarantee safety. We propose a novel algorithm to achieve safe and scalable control for multiple agents under limited actuation. Specifically, our approach includes: (i) learning a decentralized neural Integral Control Barrier Function (neural ICBF) for scalable, input-constrained control; (ii) embedding a lightweight decentralized Model Predictive Control-based Integral Control Barrier Function (MPC-ICBF) into the neural network policy to ensure safety while maintaining scalability; and (iii) a novel method, based on gradient-based optimization techniques from machine learning, for escaping the local minima that cause deadlocks. Our numerical simulations show that this approach outperforms state-of-the-art multi-agent control algorithms in terms of safety, input constraint satisfaction, and deadlock minimization. Additionally, we demonstrate strong generalization across scenarios with varying agent counts, scaling up to 1000 agents.
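As a toy illustration of the input-constrained control-barrier-function idea, for a 1-D single integrator x' = u with safe set h(x) = x >= 0 (the paper's neural ICBF and MPC-ICBF machinery is far more general than this sketch):

```python
import numpy as np

# Toy CBF safety filter: for x' = u and barrier h(x) = x, the CBF condition
# h_dot >= -alpha * h reduces to u >= -alpha * x. The filter minimally
# modifies the nominal input to satisfy it, then applies actuation limits.
# All constants here are illustrative.
def cbf_filter(x: float, u_nom: float,
               alpha: float = 1.0, u_max: float = 1.0) -> float:
    u_min_safe = -alpha * x            # CBF lower bound on the input
    u = max(u_nom, u_min_safe)         # smallest change to the nominal input
    return float(np.clip(u, -u_max, u_max))
```

The clip at the end is where actuation limits can clash with the safety bound, which is precisely the tension the paper's input-constrained formulation addresses.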
"Decentralized Safe and Scalable Multi-Agent Control under Limited Actuation" by Vrushabh Zinage, Abhishek Jha, Rohan Chandra, Efstathios Bakolas. arXiv:2409.09573, published 2024-09-15.
The public goods game describes a social dilemma in which a large proportion of agents act as conditional cooperators (CCs): they act cooperatively only if they see others doing so, because they satisfice with the social norm of being in line with what others are doing rather than optimizing cooperation. CCs follow aspiration-based reinforcement learning, guided by past experiences of interactions with others and by satisficing aspirations. In many real-world settings, such reinforcing social norms do not emerge. In this paper, we propose that an optimizing reinforcement learning agent can facilitate cooperation through nudges, i.e. indirect mechanisms that make cooperation happen. The agent's goal is to motivate CCs into cooperation through its own actions, creating social norms that signal that others are cooperating. We introduce a multi-agent reinforcement learning model for public goods games, with 3 CC learning agents using aspirational reinforcement learning and 1 nudging agent using deep reinforcement learning (DRL) to learn nudges that optimize cooperation. For the nudging agent, we model two distinct reward functions: one maximizing the total game return (sum DRL) and one maximizing the number of cooperative contributions above a proportional threshold (prop DRL). Our results show that our aspiration-based RL model for CC agents is consistent with empirically observed CC behavior. Games combining 3 CC RL agents and one nudging RL agent outperform the baseline consisting of 4 CC RL agents only. The sum DRL nudging agent increases the total sum of contributions by 8.22% and the proportion of cooperative contributions by 12.42%, while the prop DRL nudging agent increases them by 8.85% and 14.87%, respectively. Our findings advance the literature on public goods games and reinforcement learning.
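A minimal sketch of an aspiration-based update of a conditional cooperator's propensity to cooperate (the tanh stimulus and parameter values are illustrative, not the paper's exact model):

```python
import numpy as np

# Sketch of aspiration-based (satisficing) reinforcement: the propensity to
# cooperate rises when the realized payoff exceeds the aspiration level and
# falls otherwise, with updates damped near the probability bounds.
def update_propensity(p_coop: float, payoff: float,
                      aspiration: float, lr: float = 0.1) -> float:
    stimulus = np.tanh(payoff - aspiration)   # satisficing signal in [-1, 1]
    if stimulus >= 0:
        p_coop += lr * stimulus * (1 - p_coop)
    else:
        p_coop += lr * stimulus * p_coop
    return float(np.clip(p_coop, 0.0, 1.0))
```

A nudging agent influences this loop only indirectly: by contributing itself, it raises the payoffs CCs observe, pushing their experience above aspiration and their propensity upward.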
"Learning Nudges for Conditional Cooperation: A Multi-Agent Reinforcement Learning Model" by Shatayu Kulkarni, Sabine Brunswicker. arXiv:2409.09509, published 2024-09-14.