
arXiv - CS - Multiagent Systems: Latest Publications

Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning
Pub Date : 2024-09-18 DOI: arxiv-2409.12001
Claude Formanek, Louise Beyers, Callum Rhys Tilbury, Jonathan P. Shock, Arnu Pretorius
Offline multi-agent reinforcement learning (MARL) is an exciting direction of research that uses static datasets to find optimal control policies for multi-agent systems. Though the field is by definition data-driven, efforts have thus far neglected data in their drive to achieve state-of-the-art results. We first substantiate this claim by surveying the literature, showing how the majority of works generate their own datasets without consistent methodology and provide sparse information about the characteristics of these datasets. We then show why neglecting the nature of the data is problematic, through salient examples of how tightly algorithmic performance is coupled to the dataset used, necessitating a common foundation for experiments in the field. In response, we take a big step towards improving data usage and data awareness in offline MARL, with three key contributions: (1) a clear guideline for generating novel datasets; (2) a standardisation of over 80 existing datasets, hosted in a publicly available repository, using a consistent storage format and easy-to-use API; and (3) a suite of analysis tools that allow us to understand these datasets better, aiding further development.
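To make the idea of a consistent storage format and an easy-to-use API concrete, here is a minimal Python sketch. The record fields, the `.npz` layout, and the `load_dataset`/`summarise` helpers are illustrative assumptions for this listing, not the authors' actual repository API.

```python
# Hypothetical sketch of a standardised offline MARL dataset record and loader.
# Field names and the .npz layout are assumptions for illustration only; they
# do not reflect the paper's actual repository or API.
from dataclasses import dataclass
import numpy as np


@dataclass
class OfflineMARLDataset:
    observations: np.ndarray  # shape: (transitions, n_agents, obs_dim)
    actions: np.ndarray       # shape: (transitions, n_agents)
    rewards: np.ndarray       # shape: (transitions, n_agents)
    terminals: np.ndarray     # shape: (transitions,) episode-end flags


def load_dataset(path: str) -> OfflineMARLDataset:
    """Load one standardised dataset file (assumed .npz layout)."""
    data = np.load(path)
    return OfflineMARLDataset(
        observations=data["observations"],
        actions=data["actions"],
        rewards=data["rewards"],
        terminals=data["terminals"],
    )


def summarise(ds: OfflineMARLDataset) -> dict:
    """Tiny analysis tool: the kind of per-dataset statistics a profiler might report."""
    episode_ends = np.flatnonzero(ds.terminals)
    team_reward = ds.rewards.sum(axis=1)  # team reward per transition
    episode_returns = np.add.reduceat(team_reward, np.r_[0, episode_ends[:-1] + 1])
    return {
        "n_transitions": len(team_reward),
        "n_episodes": len(episode_ends),
        "mean_episode_return": float(episode_returns.mean()) if len(episode_ends) else 0.0,
    }
```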
{"title":"Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning","authors":"Claude Formanek, Louise Beyers, Callum Rhys Tilbury, Jonathan P. Shock, Arnu Pretorius","doi":"arxiv-2409.12001","DOIUrl":"https://doi.org/arxiv-2409.12001","url":null,"abstract":"Offline multi-agent reinforcement learning (MARL) is an exciting direction of\u0000research that uses static datasets to find optimal control policies for\u0000multi-agent systems. Though the field is by definition data-driven, efforts\u0000have thus far neglected data in their drive to achieve state-of-the-art\u0000results. We first substantiate this claim by surveying the literature, showing\u0000how the majority of works generate their own datasets without consistent\u0000methodology and provide sparse information about the characteristics of these\u0000datasets. We then show why neglecting the nature of the data is problematic,\u0000through salient examples of how tightly algorithmic performance is coupled to\u0000the dataset used, necessitating a common foundation for experiments in the\u0000field. In response, we take a big step towards improving data usage and data\u0000awareness in offline MARL, with three key contributions: (1) a clear guideline\u0000for generating novel datasets; (2) a standardisation of over 80 existing\u0000datasets, hosted in a publicly available repository, using a consistent storage\u0000format and easy-to-use API; and (3) a suite of analysis tools that allow us to\u0000understand these datasets better, aiding further development.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HARP: Human-Assisted Regrouping with Permutation Invariant Critic for Multi-Agent Reinforcement Learning
Pub Date : 2024-09-18 DOI: arxiv-2409.11741
Huawen Hu, Enze Shi, Chenxi Yue, Shuocun Yang, Zihao Wu, Yiwei Li, Tianyang Zhong, Tuo Zhang, Tianming Liu, Shu Zhang
Human-in-the-loop reinforcement learning integrates human expertise to accelerate agent learning and provide critical guidance and feedback in complex fields. However, many existing approaches focus on single-agent tasks and require continuous human involvement during the training process, significantly increasing the human workload and limiting scalability. In this paper, we propose HARP (Human-Assisted Regrouping with Permutation Invariant Critic), a multi-agent reinforcement learning framework designed for group-oriented tasks. HARP integrates automatic agent regrouping with strategic human assistance during deployment, enabling non-experts to offer effective guidance with minimal intervention. During training, agents dynamically adjust their groupings to optimize collaborative task completion. When deployed, they actively seek human assistance and utilize the Permutation Invariant Group Critic to evaluate and refine human-proposed groupings, allowing non-expert users to contribute valuable suggestions. In multiple collaboration scenarios, our approach is able to leverage limited guidance from non-experts and enhance performance. The project can be found at https://github.com/huawen-hu/HARP.
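A permutation-invariant critic is typically obtained by encoding each agent with a shared network and aggregating with an order-insensitive pooling operation. The PyTorch sketch below illustrates that generic idea with made-up dimensions; it is not the HARP implementation.

```python
# Generic permutation-invariant critic sketch (PyTorch). Hyperparameters and the
# mean-pooling choice are illustrative assumptions, not the HARP architecture.
import torch
import torch.nn as nn


class PermutationInvariantCritic(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        # Shared per-agent encoder: applying the same network to every agent and
        # pooling makes the output invariant to the agent ordering.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value_head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim); permuting agents along dim=1 leaves the output unchanged.
        encoded = self.encoder(obs)      # (batch, n_agents, hidden)
        pooled = encoded.mean(dim=1)     # order-insensitive aggregation
        return self.value_head(pooled)   # (batch, 1) group value estimate


# Quick check of permutation invariance on random data.
critic = PermutationInvariantCritic(obs_dim=8)
x = torch.randn(4, 5, 8)
perm = torch.randperm(5)
assert torch.allclose(critic(x), critic(x[:, perm, :]), atol=1e-5)
```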
{"title":"HARP: Human-Assisted Regrouping with Permutation Invariant Critic for Multi-Agent Reinforcement Learning","authors":"Huawen Hu, Enze Shi, Chenxi Yue, Shuocun Yang, Zihao Wu, Yiwei Li, Tianyang Zhong, Tuo Zhang, Tianming Liu, Shu Zhang","doi":"arxiv-2409.11741","DOIUrl":"https://doi.org/arxiv-2409.11741","url":null,"abstract":"Human-in-the-loop reinforcement learning integrates human expertise to\u0000accelerate agent learning and provide critical guidance and feedback in complex\u0000fields. However, many existing approaches focus on single-agent tasks and\u0000require continuous human involvement during the training process, significantly\u0000increasing the human workload and limiting scalability. In this paper, we\u0000propose HARP (Human-Assisted Regrouping with Permutation Invariant Critic), a\u0000multi-agent reinforcement learning framework designed for group-oriented tasks.\u0000HARP integrates automatic agent regrouping with strategic human assistance\u0000during deployment, enabling and allowing non-experts to offer effective\u0000guidance with minimal intervention. During training, agents dynamically adjust\u0000their groupings to optimize collaborative task completion. When deployed, they\u0000actively seek human assistance and utilize the Permutation Invariant Group\u0000Critic to evaluate and refine human-proposed groupings, allowing non-expert\u0000users to contribute valuable suggestions. In multiple collaboration scenarios,\u0000our approach is able to leverage limited guidance from non-experts and enhance\u0000performance. The project can be found at https://github.com/huawen-hu/HARP.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration
Pub Date : 2024-09-17 DOI: arxiv-2409.11058
Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub
Unmanned aerial vehicles (UAVs) have become increasingly popular in various fields, including precision agriculture, search and rescue, and remote sensing. However, exploring unknown environments remains a significant challenge. This study aims to address this challenge by utilizing on-policy Reinforcement Learning (RL) with Proximal Policy Optimization (PPO) to explore the two-dimensional area of interest with multiple UAVs. The UAVs will avoid collision with obstacles and each other and do the exploration in a distributed manner. The proposed solution includes actor-critic networks using deep convolutional neural networks (CNN) and long short-term memory (LSTM) for identifying the UAVs and areas that have already been covered. Compared to other RL techniques, such as policy gradient (PG) and asynchronous advantage actor-critic (A3C), the simulation results demonstrate the superiority of the proposed PPO approach. Also, the results show that combining LSTM with CNN in the critic can improve exploration. Since the proposed exploration has to work in unknown environments, the results showed that the proposed setup can complete the coverage when we have new maps that differ from the trained maps. Finally, we showed how tuning hyperparameters may affect the overall performance.
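As a rough illustration of coupling a CNN encoder with an LSTM inside an actor-critic network for map-based exploration, here is a PyTorch sketch. The map resolution, channel counts, and discrete action space are assumptions made for the example; the paper's exact architecture is not reproduced here.

```python
# Illustrative CNN + LSTM actor-critic for grid-map exploration (PyTorch).
# All sizes (map resolution, channels, number of discrete actions) are assumed
# for the sketch and do not reproduce the paper's exact network.
import torch
import torch.nn as nn


class CnnLstmActorCritic(nn.Module):
    def __init__(self, n_actions: int = 5, hidden: int = 128):
        super().__init__()
        # CNN encodes the local coverage/obstacle map observed by one UAV.
        self.cnn = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.lstm = nn.LSTM(input_size=32 * 8 * 8, hidden_size=hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)           # critic: state value

    def forward(self, maps: torch.Tensor, state=None):
        # maps: (batch, time, 2, 32, 32), e.g. obstacle and coverage channels.
        b, t = maps.shape[:2]
        feats = self.cnn(maps.reshape(b * t, 2, 32, 32)).reshape(b, t, -1)
        out, state = self.lstm(feats, state)              # memory of areas already covered
        return self.policy_head(out), self.value_head(out), state


logits, values, _ = CnnLstmActorCritic()(torch.randn(4, 6, 2, 32, 32))
```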
{"title":"On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration","authors":"Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub","doi":"arxiv-2409.11058","DOIUrl":"https://doi.org/arxiv-2409.11058","url":null,"abstract":"Unmanned aerial vehicles (UAVs) have become increasingly popular in various\u0000fields, including precision agriculture, search and rescue, and remote sensing.\u0000However, exploring unknown environments remains a significant challenge. This\u0000study aims to address this challenge by utilizing on-policy Reinforcement\u0000Learning (RL) with Proximal Policy Optimization (PPO) to explore the {two\u0000dimensional} area of interest with multiple UAVs. The UAVs will avoid collision\u0000with obstacles and each other and do the exploration in a distributed manner.\u0000The proposed solution includes actor-critic networks using deep convolutional\u0000neural networks {(CNN)} and long short-term memory (LSTM) for identifying the\u0000UAVs and areas that have already been covered. Compared to other RL techniques,\u0000such as policy gradient (PG) and asynchronous advantage actor-critic (A3C), the\u0000simulation results demonstrate the superiority of the proposed PPO approach.\u0000Also, the results show that combining LSTM with CNN in critic can improve\u0000exploration. Since the proposed exploration has to work in unknown\u0000environments, the results showed that the proposed setup can complete the\u0000coverage when we have new maps that differ from the trained maps. Finally, we\u0000showed how tuning hyper parameters may affect the overall performance.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark
Pub Date : 2024-09-17 DOI: arxiv-2409.11363
Zachary S. Siegel, Sayash Kapoor, Nitya Nagdir, Benedikt Stroebl, Arvind Narayanan
AI agents have the potential to aid users on a variety of consequential tasks, including conducting scientific research. To spur the development of useful agents, we need benchmarks that are challenging, but more crucially, directly correspond to real-world tasks of interest. This paper introduces such a benchmark, designed to measure the accuracy of AI agents in tackling a crucial yet surprisingly challenging aspect of scientific research: computational reproducibility. This task, fundamental to the scientific process, involves reproducing the results of a study using the provided code and data. We introduce CORE-Bench (Computational Reproducibility Agent Benchmark), a benchmark consisting of 270 tasks based on 90 scientific papers across three disciplines (computer science, social science, and medicine). Tasks in CORE-Bench consist of three difficulty levels and include both language-only and vision-language tasks. We provide an evaluation system to measure the accuracy of agents in a fast and parallelizable way, saving days of evaluation time for each run compared to a sequential implementation. We evaluated two baseline agents: the general-purpose AutoGPT and a task-specific agent called CORE-Agent. We tested both variants using two underlying language models: GPT-4o and GPT-4o-mini. The best agent achieved an accuracy of 21% on the hardest task, showing the vast scope for improvement in automating routine scientific tasks. Having agents that can reproduce existing work is a necessary step towards building agents that can conduct novel research and could verify and improve the performance of other research agents. We hope that CORE-Bench can improve the state of reproducibility and spur the development of future research agents.
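The parallel evaluation idea can be illustrated with a short, generic harness: independent tasks are graded in separate worker processes, which is where the wall-clock savings over a sequential run come from. The task structure and grading rule below are hypothetical and are not the CORE-Bench evaluation code.

```python
# Generic sketch of parallel benchmark evaluation; the Task fields and grading
# logic are hypothetical and not CORE-Bench's actual evaluation system.
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    expected_answer: str


def run_agent_on_task(task: Task) -> bool:
    """Placeholder: run an agent on one reproducibility task and grade the result."""
    agent_answer = "stub"  # in practice: launch the agent, capture its reported result
    return agent_answer == task.expected_answer


def evaluate(tasks: list[Task], workers: int = 8) -> float:
    # Tasks are independent, so grading them in parallel scales with worker count.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_agent_on_task, tasks))
    return sum(results) / len(results) if tasks else 0.0
```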
{"title":"CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark","authors":"Zachary S. Siegel, Sayash Kapoor, Nitya Nagdir, Benedikt Stroebl, Arvind Narayanan","doi":"arxiv-2409.11363","DOIUrl":"https://doi.org/arxiv-2409.11363","url":null,"abstract":"AI agents have the potential to aid users on a variety of consequential\u0000tasks, including conducting scientific research. To spur the development of\u0000useful agents, we need benchmarks that are challenging, but more crucially,\u0000directly correspond to real-world tasks of interest. This paper introduces such\u0000a benchmark, designed to measure the accuracy of AI agents in tackling a\u0000crucial yet surprisingly challenging aspect of scientific research:\u0000computational reproducibility. This task, fundamental to the scientific\u0000process, involves reproducing the results of a study using the provided code\u0000and data. We introduce CORE-Bench (Computational Reproducibility Agent\u0000Benchmark), a benchmark consisting of 270 tasks based on 90 scientific papers\u0000across three disciplines (computer science, social science, and medicine).\u0000Tasks in CORE-Bench consist of three difficulty levels and include both\u0000language-only and vision-language tasks. We provide an evaluation system to\u0000measure the accuracy of agents in a fast and parallelizable way, saving days of\u0000evaluation time for each run compared to a sequential implementation. We\u0000evaluated two baseline agents: the general-purpose AutoGPT and a task-specific\u0000agent called CORE-Agent. We tested both variants using two underlying language\u0000models: GPT-4o and GPT-4o-mini. The best agent achieved an accuracy of 21% on\u0000the hardest task, showing the vast scope for improvement in automating routine\u0000scientific tasks. Having agents that can reproduce existing work is a necessary\u0000step towards building agents that can conduct novel research and could verify\u0000and improve the performance of other research agents. We hope that CORE-Bench\u0000can improve the state of reproducibility and spur the development of future\u0000research agents.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Bearing-Distance Based Flocking with Zone-Based Interactions
Pub Date : 2024-09-16 DOI: arxiv-2409.10047
Hossein B. Jond
This paper presents a novel zone-based flocking control approach suitable for dynamic multi-agent systems (MAS). Inspired by Reynolds behavioral rules for boids, flocking behavioral rules with the zones of repulsion, conflict, attraction, and surveillance are introduced. For each agent, using only bearing and distance measurements, behavioral deviation vectors quantify the deviations from the local separation, local and global flock velocity alignment, local cohesion, obstacle avoidance and boundary conditions, and strategic separation for avoiding alien agents. The control strategy uses the local perception-based behavioral deviation vectors to guide each agent's motion. Additionally, the control strategy incorporates a directionally-aware obstacle avoidance mechanism that prioritizes obstacles in the agent's forward path. Simulation results validate the effectiveness of this approach in creating flexible, adaptable, and scalable flocking behavior.
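A minimal numeric sketch of how bearing-and-distance measurements can be turned into zone-based behavioural vectors is given below. The zone radii, gains, and velocity update are invented values implementing a simplified Reynolds-style rule, not the paper's full control law.

```python
# Simplified zone-based flocking step (NumPy). Zone radii, gains, and the
# velocity update are illustrative; this is not the paper's exact control law.
import numpy as np

R_REPULSION, R_ATTRACTION = 1.0, 5.0  # assumed zone radii


def flocking_step(pos, vel, dt=0.1, k_sep=1.5, k_coh=0.5, k_ali=0.5):
    """pos, vel: (n_agents, 2) arrays; returns updated positions and velocities."""
    n = len(pos)
    new_vel = vel.copy()
    for i in range(n):
        offsets = pos - pos[i]                    # bearing * distance to every other agent
        dist = np.linalg.norm(offsets, axis=1)
        sep = coh = ali = np.zeros(2)
        near = (dist > 0) & (dist < R_REPULSION)  # zone of repulsion: move away
        if near.any():
            sep = -(offsets[near] / dist[near, None] ** 2).sum(axis=0)
        zone = (dist > 0) & (dist < R_ATTRACTION)  # zone of attraction: cohere and align
        if zone.any():
            coh = offsets[zone].mean(axis=0)
            ali = vel[zone].mean(axis=0) - vel[i]
        new_vel[i] += dt * (k_sep * sep + k_coh * coh + k_ali * ali)
    return pos + dt * new_vel, new_vel


pos, vel = np.random.rand(10, 2) * 5, np.zeros((10, 2))
pos, vel = flocking_step(pos, vel)
```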
{"title":"Bearing-Distance Based Flocking with Zone-Based Interactions","authors":"Hossein B. Jond","doi":"arxiv-2409.10047","DOIUrl":"https://doi.org/arxiv-2409.10047","url":null,"abstract":"This paper presents a novel zone-based flocking control approach suitable for\u0000dynamic multi-agent systems (MAS). Inspired by Reynolds behavioral rules for\u0000$boids$, flocking behavioral rules with the zones of repulsion, conflict,\u0000attraction, and surveillance are introduced. For each agent, using only bearing\u0000and distance measurements, behavioral deviation vectors quantify the deviations\u0000from the local separation, local and global flock velocity alignment, local\u0000cohesion, obstacle avoidance and boundary conditions, and strategic separation\u0000for avoiding alien agents. The control strategy uses the local perception-based\u0000behavioral deviation vectors to guide each agent's motion. Additionally, the\u0000control strategy incorporates a directionally-aware obstacle avoidance\u0000mechanism that prioritizes obstacles in the agent's forward path. Simulation\u0000results validate the effectiveness of this approach in creating flexible,\u0000adaptable, and scalable flocking behavior.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Context-aware Advertisement Modeling and Applications in Rapid Transit Systems
Pub Date : 2024-09-16 DOI: arxiv-2409.09956
Afzal Ahmed, Muhammad Raees
In today's businesses, marketing has been a central trend for growth. Marketing quality is equally important as product quality and relevant metrics. Quality of marketing depends on targeting the right person. Technology adaptations have been slow in many fields but have captured some aspects of human life to make an impact. For instance, in marketing, recent developments have provided a significant shift toward data-driven approaches. In this paper, we present an advertisement model using behavioral and tracking analysis. We extract users' behavioral data upholding their privacy principle and perform data manipulations and pattern mining for effective analysis. We present a model using the agent-based modeling (ABM) technique, with the target audience of rapid transit system users to target the right person for advertisement applications. We also outline the Overview, Design, and Details concept of ABM.
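As a loose illustration of the agent-based modeling (ABM) approach for ad targeting among transit riders, here is a toy pure-Python model. The agent attributes, interest categories, and the baseline ad policy are invented for the example and are not taken from the paper.

```python
# Toy agent-based model of ad exposure for transit riders. Attributes,
# categories, and the matching rule are invented for illustration only.
import random

CATEGORIES = ["food", "electronics", "fashion"]  # assumed ad categories


class Rider:
    def __init__(self):
        self.interest = random.choice(CATEGORIES)  # stands in for mined behavioural data
        self.impressions = 0

    def step(self, station_ad: str) -> bool:
        """One simulation tick: the rider sees the station's ad; count a hit if it matches."""
        self.impressions += 1
        return station_ad == self.interest


def simulate(n_riders: int = 1000, ticks: int = 50, seed: int = 0) -> float:
    random.seed(seed)
    riders = [Rider() for _ in range(n_riders)]
    hits = 0
    for _ in range(ticks):
        ad = random.choice(CATEGORIES)  # context-unaware baseline policy
        hits += sum(r.step(ad) for r in riders)
    return hits / (n_riders * ticks)    # hit rate an ABM study might compare across policies


print(f"baseline hit rate: {simulate():.2f}")
```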
{"title":"Context-aware Advertisement Modeling and Applications in Rapid Transit Systems","authors":"Afzal Ahmed, Muhammad Raees","doi":"arxiv-2409.09956","DOIUrl":"https://doi.org/arxiv-2409.09956","url":null,"abstract":"In today's businesses, marketing has been a central trend for growth.\u0000Marketing quality is equally important as product quality and relevant metrics.\u0000Quality of Marketing depends on targeting the right person. Technology\u0000adaptations have been slow in many fields but have captured some aspects of\u0000human life to make an impact. For instance, in marketing, recent developments\u0000have provided a significant shift toward data-driven approaches. In this paper,\u0000we present an advertisement model using behavioral and tracking analysis. We\u0000extract users' behavioral data upholding their privacy principle and perform\u0000data manipulations and pattern mining for effective analysis. We present a\u0000model using the agent-based modeling (ABM) technique, with the target audience\u0000of rapid transit system users to target the right person for advertisement\u0000applications. We also outline the Overview, Design, and Details concept of ABM.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-agent Path Finding in Continuous Environment
Pub Date : 2024-09-16 DOI: arxiv-2409.10680
Kristýna Janovská, Pavel Surynek
We address a variant of multi-agent path finding in continuous environment (CE-MAPF), where agents move along sets of smooth curves. Collisions between agents are resolved via avoidance in the space domain. A new Continuous Environment Conflict-Based Search (CE-CBS) algorithm is proposed in this work. CE-CBS combines conflict-based search (CBS) for the high-level search framework with RRT* for low-level path planning. The CE-CBS algorithm is tested under various settings on diverse CE-MAPF instances. Experimental results show that CE-CBS is competitive with respect to other algorithms that consider the continuous aspect in MAPF, such as MAPF with continuous time.
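Conflict-based search maintains a tree of constraint sets: when two agents' paths collide, the current node is split into children that each forbid one of the agents from the colliding position-time pair, and each child replans only that agent. The skeleton below shows that high-level loop with the low-level planner left as a pluggable callable; it is a generic CBS outline on a discretised timeline with vertex conflicts only, not the CE-CBS algorithm with RRT* over smooth curves described in the paper.

```python
# Generic high-level conflict-based search skeleton. The low-level planner is a
# pluggable callable (the paper uses RRT* over smooth curves; a grid-style
# planner is an assumption here). Only vertex conflicts on a discrete timeline
# are detected, for brevity.
import heapq
import itertools


def first_conflict(paths):
    """Return (agent_i, agent_j, position, t) for the earliest vertex conflict, or None."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        seen = {}
        for i, path in enumerate(paths):
            pos = path[min(t, len(path) - 1)]  # agents wait at their goals
            if pos in seen:
                return seen[pos], i, pos, t
            seen[pos] = i
    return None


def cbs(n_agents, low_level):
    """High-level CBS loop. low_level(agent, constraints) -> list of positions
    respecting constraints (a set of (agent, position, t) tuples), or None."""
    counter = itertools.count()  # unique tie-breaker so heapq never compares sets
    paths = [low_level(a, set()) for a in range(n_agents)]
    if any(p is None for p in paths):
        return None
    open_list = [(sum(len(p) for p in paths), next(counter), set(), paths)]
    while open_list:
        _, _, constraints, paths = heapq.heappop(open_list)
        conflict = first_conflict(paths)
        if conflict is None:
            return paths  # conflict-free joint plan
        i, j, pos, t = conflict
        for agent in (i, j):  # branch: forbid each conflicting agent in turn
            child_constraints = constraints | {(agent, pos, t)}
            new_path = low_level(agent, child_constraints)
            if new_path is None:
                continue
            child_paths = list(paths)
            child_paths[agent] = new_path
            heapq.heappush(open_list, (sum(len(p) for p in child_paths),
                                       next(counter), child_constraints, child_paths))
    return None
```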
{"title":"Multi-agent Path Finding in Continuous Environment","authors":"Kristýna Janovská, Pavel Surynek","doi":"arxiv-2409.10680","DOIUrl":"https://doi.org/arxiv-2409.10680","url":null,"abstract":"We address a variant of multi-agent path finding in continuous environment\u0000(CE-MAPF), where agents move along sets of smooth curves. Collisions between\u0000agents are resolved via avoidance in the space domain. A new Continuous\u0000Environment Conflict-Based Search (CE-CBS) algorithm is proposed in this work.\u0000CE-CBS combines conflict-based search (CBS) for the high-level search framework\u0000with RRT* for low-level path planning. The CE-CBS algorithm is tested under\u0000various settings on diverse CE-MAPF instances. Experimental results show that\u0000CE-CBS is competitive w.r.t. to other algorithms that consider continuous\u0000aspect in MAPF such as MAPF with continuous time.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reducing Leximin Fairness to Utilitarian Optimization
Pub Date : 2024-09-16 DOI: arxiv-2409.10395
Eden Hartman, Yonatan Aumann, Avinatan Hassidim, Erel Segal-Halevi
Two prominent objectives in social choice are utilitarian - maximizing the sum of agents' utilities, and leximin - maximizing the smallest agent's utility, then the second-smallest, etc. Utilitarianism is typically computationally easier to attain but is generally viewed as less fair. This paper presents a general reduction scheme that, given a utilitarian solver, produces a distribution over outcomes that is leximin in expectation. Importantly, the scheme is robust in the sense that, given an approximate utilitarian solver, it produces an outcome that is approximately-leximin (in expectation) - with the same approximation factor. We apply our scheme to several social choice problems: stochastic allocations of indivisible goods, giveaway lotteries, and fair lotteries for participatory budgeting.
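To make the two objectives concrete, the toy sketch below compares the utilitarian and leximin criteria over an explicitly enumerated set of outcomes. It only illustrates the orderings themselves; it is not the paper's reduction scheme, which works with a black-box utilitarian solver and randomised outcomes.

```python
# Toy comparison of the utilitarian and leximin orderings over enumerated
# outcomes; illustrative only, not the paper's reduction scheme.
def utilitarian_best(outcomes):
    # Maximise the sum of agents' utilities.
    return max(outcomes, key=sum)


def leximin_best(outcomes):
    # Maximise the smallest utility, then the second-smallest, and so on,
    # i.e. compare sorted utility vectors lexicographically.
    return max(outcomes, key=lambda u: sorted(u))


outcomes = [(9, 1, 1), (4, 4, 2), (5, 5, 0)]  # utility vectors for three agents
print(utilitarian_best(outcomes))  # (9, 1, 1): largest sum (11)
print(leximin_best(outcomes))      # (4, 4, 2): largest minimum utility
```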
{"title":"Reducing Leximin Fairness to Utilitarian Optimization","authors":"Eden Hartman, Yonatan Aumann, Avinatan Hassidim, Erel Segal-Halevi","doi":"arxiv-2409.10395","DOIUrl":"https://doi.org/arxiv-2409.10395","url":null,"abstract":"Two prominent objectives in social choice are utilitarian - maximizing the\u0000sum of agents' utilities, and leximin - maximizing the smallest agent's\u0000utility, then the second-smallest, etc. Utilitarianism is typically\u0000computationally easier to attain but is generally viewed as less fair. This\u0000paper presents a general reduction scheme that, given a utilitarian solver,\u0000produces a distribution over outcomes that is leximin in expectation.\u0000Importantly, the scheme is robust in the sense that, given an approximate\u0000utilitarian solver, it produces an outcome that is approximately-leximin (in\u0000expectation) - with the same approximation factor. We apply our scheme to\u0000several social choice problems: stochastic allocations of indivisible goods,\u0000giveaway lotteries, and fair lotteries for participatory budgeting.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Decentralized Safe and Scalable Multi-Agent Control under Limited Actuation
Pub Date : 2024-09-15 DOI: arxiv-2409.09573
Vrushabh Zinage, Abhishek Jha, Rohan Chandra, Efstathios Bakolas
To deploy safe and agile robots in cluttered environments, there is a need to develop fully decentralized controllers that guarantee safety, respect actuation limits, prevent deadlocks, and scale to thousands of agents. Current approaches fall short of meeting all these goals: optimization-based methods ensure safety but lack scalability, while learning-based methods scale but do not guarantee safety. We propose a novel algorithm to achieve safe and scalable control for multiple agents under limited actuation. Specifically, our approach includes: (i) learning a decentralized neural Integral Control Barrier function (neural ICBF) for scalable, input-constrained control, (ii) embedding a lightweight decentralized Model Predictive Control-based Integral Control Barrier Function (MPC-ICBF) into the neural network policy to ensure safety while maintaining scalability, and (iii) introducing a novel method to minimize deadlocks based on gradient-based optimization techniques from machine learning to address local minima in deadlocks. Our numerical simulations show that this approach outperforms state-of-the-art multi-agent control algorithms in terms of safety, input constraint satisfaction, and minimizing deadlocks. Additionally, we demonstrate strong generalization across scenarios with varying agent counts, scaling up to 1000 agents.
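The safety idea behind control barrier functions can be shown with a minimal single-integrator example: given a nominal control, the filter adds the smallest correction so that the barrier condition grad_h(x).u + alpha*h(x) >= 0 holds around a circular obstacle. This is a plain single-agent CBF sketch with made-up parameters, not the paper's neural ICBF or its MPC-based variant.

```python
# Minimal control-barrier-function safety filter for a single integrator
# x_dot = u avoiding one circular obstacle. Parameters are illustrative;
# this is not the paper's neural ICBF / MPC-ICBF.
import numpy as np


def cbf_filter(x, u_nom, obstacle, radius, alpha=1.0):
    """Return the minimum-deviation control satisfying grad_h(x) . u + alpha * h(x) >= 0."""
    h = np.dot(x - obstacle, x - obstacle) - radius ** 2  # barrier: positive outside the obstacle
    grad_h = 2.0 * (x - obstacle)
    violation = grad_h @ u_nom + alpha * h
    if violation >= 0.0:
        return u_nom  # nominal control already satisfies the barrier condition
    # Closed-form projection onto the half-space {u : grad_h . u >= -alpha * h}.
    return u_nom - violation * grad_h / (grad_h @ grad_h)


x = np.array([1.5, 0.0])  # robot heading straight at an obstacle centred at the origin
u_safe = cbf_filter(x, u_nom=np.array([-1.0, 0.0]), obstacle=np.zeros(2), radius=1.0)
print(u_safe)             # the filtered control slows the approach so the condition holds with equality
```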
{"title":"Decentralized Safe and Scalable Multi-Agent Control under Limited Actuation","authors":"Vrushabh Zinage, Abhishek Jha, Rohan Chandra, Efstathios Bakolas","doi":"arxiv-2409.09573","DOIUrl":"https://doi.org/arxiv-2409.09573","url":null,"abstract":"To deploy safe and agile robots in cluttered environments, there is a need to\u0000develop fully decentralized controllers that guarantee safety, respect\u0000actuation limits, prevent deadlocks, and scale to thousands of agents. Current\u0000approaches fall short of meeting all these goals: optimization-based methods\u0000ensure safety but lack scalability, while learning-based methods scale but do\u0000not guarantee safety. We propose a novel algorithm to achieve safe and scalable\u0000control for multiple agents under limited actuation. Specifically, our approach\u0000includes: $(i)$ learning a decentralized neural Integral Control Barrier\u0000function (neural ICBF) for scalable, input-constrained control, $(ii)$\u0000embedding a lightweight decentralized Model Predictive Control-based Integral\u0000Control Barrier Function (MPC-ICBF) into the neural network policy to ensure\u0000safety while maintaining scalability, and $(iii)$ introducing a novel method to\u0000minimize deadlocks based on gradient-based optimization techniques from machine\u0000learning to address local minima in deadlocks. Our numerical simulations show\u0000that this approach outperforms state-of-the-art multi-agent control algorithms\u0000in terms of safety, input constraint satisfaction, and minimizing deadlocks.\u0000Additionally, we demonstrate strong generalization across scenarios with\u0000varying agent counts, scaling up to 1000 agents.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning Nudges for Conditional Cooperation: A Multi-Agent Reinforcement Learning Model
Pub Date : 2024-09-14 DOI: arxiv-2409.09509
Shatayu Kulkarni, Sabine Brunswicker
The public goods game describes a social dilemma in which a large proportion of agents act as conditional cooperators (CC): they only act cooperatively if they see others acting cooperatively, because they satisfice with the social norm to be in line with what others are doing instead of optimizing cooperation. CCs are guided by aspiration-based reinforcement learning, guided by past experiences of interactions with others and satisficing aspirations. In many real-world settings, reinforcing social norms do not emerge. In this paper, we propose that an optimizing reinforcement agent can facilitate cooperation through nudges, i.e. indirect mechanisms for cooperation to happen. The agent's goal is to motivate CCs into cooperation through its own actions to create social norms that signal that others are cooperating. We introduce a multi-agent reinforcement learning model for public goods games, with 3 CC learning agents using aspirational reinforcement learning and 1 nudging agent using deep reinforcement learning to learn nudges that optimize cooperation. For our nudging agent, we model two distinct reward functions, one maximizing the total game return (sum DRL) and one maximizing the number of cooperative contributions higher than a proportional threshold (prop DRL). Our results show that our aspiration-based RL model for CC agents is consistent with empirically observed CC behavior. Games combining 3 CC RL agents and one nudging RL agent outperform the baseline consisting of 4 CC RL agents only. The sum DRL nudging agent increases the total sum of contributions by 8.22% and the total proportion of cooperative contributions by 12.42%, while the prop nudging DRL increases the total sum of contributions by 8.85% and the total proportion of cooperative contributions by 14.87%. Our findings advance the literature on public goods games and reinforcement learning.
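The underlying payoff structure is the standard linear public goods game: each agent keeps its un-contributed endowment and receives an equal share of the multiplied common pool. A small sketch follows; the endowment, multiplier, and contribution vector are example values, and the paper's RL agents and nudging mechanism are not modelled here.

```python
# Payoffs in a linear public goods game. Endowment, multiplier, and the example
# contributions are illustrative; the paper's RL agents and nudges are not modelled.
def public_goods_payoffs(contributions, endowment=10.0, multiplier=1.6):
    """Each agent keeps (endowment - c_i) and gets an equal share of the multiplied pool."""
    n = len(contributions)
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]


# Three conditional cooperators contributing, plus one free-rider.
print(public_goods_payoffs([5.0, 5.0, 5.0, 0.0]))
# -> [11.0, 11.0, 11.0, 16.0]: the free-rider earns most, which is the social dilemma.
```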
{"title":"Learning Nudges for Conditional Cooperation: A Multi-Agent Reinforcement Learning Model","authors":"Shatayu Kulkarni, Sabine Brunswicker","doi":"arxiv-2409.09509","DOIUrl":"https://doi.org/arxiv-2409.09509","url":null,"abstract":"The public goods game describes a social dilemma in which a large proportion\u0000of agents act as conditional cooperators (CC): they only act cooperatively if\u0000they see others acting cooperatively because they satisfice with the social\u0000norm to be in line with what others are doing instead of optimizing\u0000cooperation. CCs are guided by aspiration-based reinforcement learning guided\u0000by past experiences of interactions with others and satisficing aspirations. In\u0000many real-world settings, reinforcing social norms do not emerge. In this\u0000paper, we propose that an optimizing reinforcement agent can facilitate\u0000cooperation through nudges, i.e. indirect mechanisms for cooperation to happen.\u0000The agent's goal is to motivate CCs into cooperation through its own actions to\u0000create social norms that signal that others are cooperating. We introduce a\u0000multi-agent reinforcement learning model for public goods games, with 3 CC\u0000learning agents using aspirational reinforcement learning and 1 nudging agent\u0000using deep reinforcement learning to learn nudges that optimize cooperation.\u0000For our nudging agent, we model two distinct reward functions, one maximizing\u0000the total game return (sum DRL) and one maximizing the number of cooperative\u0000contributions contributions higher than a proportional threshold (prop DRL).\u0000Our results show that our aspiration-based RL model for CC agents is consistent\u0000with empirically observed CC behavior. Games combining 3 CC RL agents and one\u0000nudging RL agent outperform the baseline consisting of 4 CC RL agents only. The\u0000sum DRL nudging agent increases the total sum of contributions by 8.22% and the\u0000total proportion of cooperative contributions by 12.42%, while the prop nudging\u0000DRL increases the total sum of contributions by 8.85% and the total proportion\u0000of cooperative contributions by 14.87%. Our findings advance the literature on\u0000public goods games and reinforcement learning.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"208 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0