
arXiv - CS - Multiagent Systems: Latest Publications

Improving the Prediction of Individual Engagement in Recommendations Using Cognitive Models
Pub Date : 2024-08-28 DOI: arxiv-2408.16147
Roderick Seow, Yunfan Zhao, Duncan Wood, Milind Tambe, Cleotilde Gonzalez
For public health programs with limited resources, the ability to predict how behaviors change over time and in response to interventions is crucial for deciding when and to whom interventions should be allocated. Using data from a real-world maternal health program, we demonstrate how a cognitive model based on Instance-Based Learning (IBL) Theory can augment existing purely computational approaches. Our findings show that, compared to general time-series forecasters (e.g., LSTMs), IBL models, which reflect human decision-making processes, better predict the dynamics of individuals' states. Additionally, IBL provides estimates of the volatility in individuals' states and their sensitivity to interventions, which can improve the efficiency of training of other time series models.
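The IBL mechanism referenced here has a standard mathematical core: activation driven by recency and frequency, softmax retrieval, and a blended prediction. The Python sketch below is a minimal illustration of that mechanism under assumed parameter values and a simplified one-instance-per-observation memory; it is not the authors' model.

    import numpy as np

    def ibl_blended_prediction(outcomes, timestamps, t_now,
                               decay=0.5, noise=0.25, seed=0):
        """Blended IBL prediction of the next engagement state.

        outcomes   : past binary engagement observations (1 = engaged)
        timestamps : times at which those outcomes were observed
        """
        rng = np.random.default_rng(seed)
        outcomes = np.asarray(outcomes, dtype=float)
        timestamps = np.asarray(timestamps, dtype=float)

        # Activation: power-law recency (ACT-R style) plus logistic noise.
        u = np.clip(rng.uniform(size=len(outcomes)), 1e-6, 1 - 1e-6)
        activation = -decay * np.log(t_now - timestamps) + noise * np.log((1 - u) / u)

        # Boltzmann retrieval probabilities and the blended (expected) value.
        tau = noise * np.sqrt(2.0)
        retrieval_p = np.exp(activation / tau)
        retrieval_p /= retrieval_p.sum()
        return float(np.dot(retrieval_p, outcomes))

    # Example: a beneficiary who disengaged early in the program but engaged recently.
    print(ibl_blended_prediction([0, 0, 1, 1], [1, 3, 6, 8], t_now=10))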
Citations: 0
TrafficGamer: Reliable and Flexible Traffic Simulation for Safety-Critical Scenarios with Game-Theoretic Oracles
Pub Date : 2024-08-28 DOI: arxiv-2408.15538
Guanren Qiao, Guorui Quan, Jiawei Yu, Shujun Jia, Guiliang Liu
While modern Autonomous Vehicle (AV) systems can develop reliable driving policies under regular traffic conditions, they frequently struggle with safety-critical traffic scenarios. This difficulty primarily arises from the rarity of such scenarios in driving datasets and the complexities associated with predictive modeling among multiple vehicles. To support the testing and refinement of AV policies, simulating safety-critical traffic events is an essential challenge to be addressed. In this work, we introduce TrafficGamer, which facilitates game-theoretic traffic simulation by viewing common road driving as a multi-agent game. In evaluating the empirical performance across various real-world datasets, TrafficGamer ensures both fidelity and exploitability of the simulated scenarios, guaranteeing that they not only statically align with real-world traffic distribution but also efficiently capture equilibriums for representing safety-critical scenarios involving multiple agents. Additionally, the results demonstrate that TrafficGamer exhibits highly flexible simulation across various contexts. Specifically, we demonstrate that the generated scenarios can dynamically adapt to equilibriums of varying tightness by configuring risk-sensitive constraints during optimization. To the best of our knowledge, TrafficGamer is the first simulator capable of generating diverse traffic scenarios involving multiple agents. We have provided a demo webpage for the project at https://qiaoguanren.github.io/trafficgamer-demo/.
Citations: 0
Graph Attention Inference of Network Topology in Multi-Agent Systems
Pub Date : 2024-08-27 DOI: arxiv-2408.15449
Akshay Kolli, Reza Azadeh, Kshitj Jerath
Accurately identifying the underlying graph structures of multi-agent systems remains a difficult challenge. Our work introduces a novel machine learning-based solution that leverages the attention mechanism to predict future states of multi-agent systems by learning node representations. The graph structure is then inferred from the strength of the attention values. This approach is applied to both linear consensus dynamics and the non-linear dynamics of Kuramoto oscillators, resulting in the graph being learned implicitly through good agent representations. Our results demonstrate that the presented data-driven graph attention machine learning model can identify the network topology in multi-agent systems, even when the underlying dynamic model is not known, as evidenced by the F1 scores achieved in link prediction.
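As a concrete illustration of the attention-to-topology idea described above, the hedged sketch below trains learnable node embeddings whose pairwise attention weights predict next states of a simulated linear consensus system, then thresholds the learned attention matrix to recover edges and scores them with F1. The network size, dynamics, training loop, and threshold are assumptions for illustration, not the authors' architecture.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n_agents, T, dim = 8, 300, 16

    # Ground-truth random undirected graph and a linear consensus rollout.
    A_true = (torch.rand(n_agents, n_agents) < 0.3).float()
    A_true = torch.triu(A_true, diagonal=1)
    A_true = A_true + A_true.T
    states = [torch.randn(n_agents)]
    for _ in range(T):
        x = states[-1]
        states.append(x + 0.05 * (A_true @ x - A_true.sum(1) * x))
    X = torch.stack(states)                              # (T + 1, n_agents)

    class GraphAttentionInference(nn.Module):
        def __init__(self, n, d):
            super().__init__()
            self.emb = nn.Parameter(torch.randn(n, d))   # learned node representations
            self.gain = nn.Parameter(torch.tensor(0.1))
            self.scale = d ** 0.5
        def forward(self, x):                            # x: (batch, n_agents)
            attn = torch.softmax(self.emb @ self.emb.T / self.scale, dim=-1)
            neighbor_avg = x @ attn.T                    # attention-weighted neighbor states
            return x + self.gain * (neighbor_avg - x), attn

    model = GraphAttentionInference(n_agents, dim)
    opt = torch.optim.Adam(model.parameters(), lr=5e-2)
    for _ in range(500):
        pred, attn = model(X[:-1])
        loss = nn.functional.mse_loss(pred, X[1:])
        opt.zero_grad(); loss.backward(); opt.step()

    # Declare an edge wherever attention exceeds its mean, then score with F1.
    A_hat = (attn.detach() > attn.detach().mean()).float()
    A_hat.fill_diagonal_(0)
    tp = (A_hat * A_true).sum()
    precision = tp / A_hat.sum().clamp(min=1)
    recall = tp / A_true.sum().clamp(min=1)
    print("link-prediction F1:", float(2 * precision * recall / (precision + recall + 1e-9)))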
Citations: 0
Decentralized Unlabeled Multi-agent Pathfinding Via Target And Priority Swapping (With Supplementary)
Pub Date : 2024-08-27 DOI: arxiv-2408.14948
Stepan Dergachev, Konstantin Yakovlev
In this paper we study a challenging variant of the multi-agent pathfinding problem (MAPF), when a set of agents must reach a set of goal locations, but it does not matter which agent reaches a specific goal - Anonymous MAPF (AMAPF). Current optimal and suboptimal AMAPF solvers rely on the existence of a centralized controller which is in charge of both target assignment and pathfinding. We extend the state of the art and present the first AMAPF solver capable of solving the problem at hand in a fully decentralized fashion, when each agent makes decisions individually and relies only on the local communication with the others. The core of our method is a priority and target swapping procedure tailored to produce consistent goal assignments (i.e. making sure that no two agents are heading towards the same goal). Coupled with an established rule-based path planning, we end up with TP-SWAP, an efficient and flexible approach to solve decentralized AMAPF. On the theoretical side, we prove that TP-SWAP is complete (i.e. TP-SWAP guarantees that each target will be reached by some agent). Empirically, we evaluate TP-SWAP across a wide range of setups and compare it to both centralized and decentralized baselines. Indeed, TP-SWAP outperforms the fully-decentralized competitor and can even outperform the semi-decentralized one (i.e. the one relying on the initial consistent goal assignment) in terms of flowtime (a widespread cost objective in MAPF).
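To make the swapping idea concrete, here is a small hedged sketch of a pairwise rule in the spirit described above: when two agents are within communication range, a shared goal is resolved in favour of the higher-priority agent, and goals are exchanged whenever that lowers the combined (Manhattan) travel estimate. The exact rules, priority handling, and path planner used by TP-SWAP are not reproduced here.

    from dataclasses import dataclass

    def manhattan(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    @dataclass
    class Agent:
        name: str
        pos: tuple
        priority: int
        target: tuple

    def local_swap(a, b, free_targets):
        """Pairwise exchange performed when agents a and b are within communication range."""
        # 1. Consistency: if both head to the same goal, the lower-priority agent
        #    re-targets to the nearest currently unclaimed goal.
        if a.target == b.target and free_targets:
            loser = a if a.priority < b.priority else b
            loser.target = min(free_targets, key=lambda t: manhattan(loser.pos, t))
            free_targets.remove(loser.target)
        # 2. Efficiency: exchange targets if that reduces the combined travel estimate.
        current = manhattan(a.pos, a.target) + manhattan(b.pos, b.target)
        swapped = manhattan(a.pos, b.target) + manhattan(b.pos, a.target)
        if swapped < current:
            a.target, b.target = b.target, a.target

    # Example: two agents that initially claim the same goal.
    a1 = Agent("a1", (0, 0), priority=2, target=(5, 5))
    a2 = Agent("a2", (4, 4), priority=1, target=(5, 5))
    free = [(1, 0), (6, 6)]
    local_swap(a1, a2, free)
    print(a1.target, a2.target)   # a2 yields and re-targets; no distance-reducing swap remains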
Citations: 0
Multi-Agent Target Assignment and Path Finding for Intelligent Warehouse: A Cooperative Multi-Agent Deep Reinforcement Learning Perspective
Pub Date : 2024-08-25 DOI: arxiv-2408.13750
Qi Liu, Jianqi Gao, Dongjie Zhu, Xizheng Pang, Pengbin Chen, Jingxiang Guo, Yanjie Li
Multi-agent target assignment and path planning (TAPF) are two key problems in intelligent warehouse. However, most literature only addresses one of these two problems separately. In this study, we propose a method to simultaneously solve target assignment and path planning from a perspective of cooperative multi-agent deep reinforcement learning (RL). To the best of our knowledge, this is the first work to model the TAPF problem for intelligent warehouse to cooperative multi-agent deep RL, and the first to simultaneously address TAPF based on multi-agent deep RL. Furthermore, previous literature rarely considers the physical dynamics of agents. In this study, the physical dynamics of the agents is considered. Experimental results show that our method performs well in various task settings, which means that the target assignment is solved reasonably well and the planned path is almost shortest. Moreover, our method is more time-efficient than baselines.
Citations: 0
Reaching New Heights in Multi-Agent Collective Construction
Pub Date : 2024-08-24 DOI: arxiv-2408.13615
Martin Rameš, Pavel Surynek
We propose a new approach for multi-agent collective construction, based on the idea of reversible ramps. Our ReRamp algorithm utilizes reversible side-ramps to generate construction plans for ramped block structures higher and larger than was previously possible using state-of-the-art planning algorithms, given the same building area. We compare the ReRamp algorithm to similar state-of-the-art algorithms on a set of benchmark instances, where we demonstrate its superior computational speed. We also establish in our experiments that the ReRamp algorithm is capable of generating plans for a single-story house, an important milestone on the road to real-world multi-agent construction applications.
Citations: 0
Hybrid Training for Enhanced Multi-task Generalization in Multi-agent Reinforcement Learning
Pub Date : 2024-08-24 DOI: arxiv-2408.13567
Mingliang Zhang, Sichang Su, Chengyang He, Guillaume Sartoretti
In multi-agent reinforcement learning (MARL), achieving multi-task generalization to diverse agents and objectives presents significant challenges. Existing online MARL algorithms primarily focus on single-task performance, but their lack of multi-task generalization capabilities typically results in substantial computational waste and limited real-life applicability. Meanwhile, existing offline multi-task MARL approaches are heavily dependent on data quality, often resulting in poor performance on unseen tasks. In this paper, we introduce HyGen, a novel hybrid MARL framework, Hybrid Training for Enhanced Multi-Task Generalization, which integrates online and offline learning to ensure both multi-task generalization and training efficiency. Specifically, our framework extracts potential general skills from offline multi-task datasets. We then train policies to select the optimal skills under the centralized training and decentralized execution paradigm (CTDE). During this stage, we utilize a replay buffer that integrates both offline data and online interactions. We empirically demonstrate that our framework effectively extracts and refines general skills, yielding impressive generalization to unseen tasks. Comparative analyses on the StarCraft multi-agent challenge show that HyGen outperforms a wide range of existing solely online and offline methods.
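The replay-buffer idea mentioned in this abstract can be sketched as follows: a buffer seeded with offline multi-task transitions that mixes them with freshly collected online interactions at a configurable ratio when batches are sampled. The ratio, capacities, and transition format below are illustrative assumptions, not HyGen's actual implementation.

    import random
    from collections import deque

    class HybridReplayBuffer:
        """Mixes a fixed offline multi-task dataset with a rolling online buffer."""
        def __init__(self, offline_transitions, online_capacity=10_000, offline_fraction=0.5):
            self.offline = list(offline_transitions)       # fixed offline dataset
            self.online = deque(maxlen=online_capacity)    # rolling online interactions
            self.offline_fraction = offline_fraction

        def add(self, transition):
            """Store a transition collected during online (CTDE) training."""
            self.online.append(transition)

        def sample(self, batch_size):
            n_offline = int(batch_size * self.offline_fraction)
            batch = random.sample(self.offline, min(n_offline, len(self.offline)))
            if self.online:
                batch += random.choices(list(self.online), k=batch_size - len(batch))
            random.shuffle(batch)
            return batch

    # Usage: seed with offline data, then interleave online experience as training proceeds.
    buffer = HybridReplayBuffer([{"task": 0, "reward": 1.0}] * 100)
    buffer.add({"task": 1, "reward": 0.5})
    print(len(buffer.sample(8)))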
Citations: 0
Optimizing Collaboration of LLM based Agents for Finite Element Analysis
Pub Date : 2024-08-23 DOI: arxiv-2408.13406
Chuan Tian, Yilei Zhang
This paper investigates the interactions between multiple agents within Large Language Models (LLMs) in the context of programming and coding tasks. We utilize the AutoGen framework to facilitate communication among agents, evaluating different configurations based on the success rates from 40 random runs for each setup. The study focuses on developing a flexible automation framework for applying the Finite Element Method (FEM) to solve linear elastic problems. Our findings emphasize the importance of optimizing agent roles and clearly defining their responsibilities, rather than merely increasing the number of agents. Effective collaboration among agents is shown to be crucial for addressing general FEM challenges. This research demonstrates the potential of LLM multi-agent systems to enhance computational automation in simulation methodologies, paving the way for future advancements in engineering and artificial intelligence.
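For readers unfamiliar with AutoGen, the hedged sketch below shows one way such an agent team could be wired up with the pyautogen (v0.2-style) GroupChat API. The agent roles, prompts, model settings, and FEM task are illustrative assumptions and do not reproduce the configurations evaluated in the paper.

    import autogen

    llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

    planner = autogen.AssistantAgent(
        name="planner",
        system_message="Break the linear elasticity problem into FEM solution steps.",
        llm_config=llm_config,
    )
    coder = autogen.AssistantAgent(
        name="coder",
        system_message="Write Python code that assembles and solves the FEM system.",
        llm_config=llm_config,
    )
    executor = autogen.UserProxyAgent(
        name="executor",
        human_input_mode="NEVER",
        code_execution_config={"work_dir": "fem_runs", "use_docker": False},
    )

    group_chat = autogen.GroupChat(agents=[planner, coder, executor], messages=[], max_round=12)
    manager = autogen.GroupChatManager(groupchat=group_chat, llm_config=llm_config)

    executor.initiate_chat(
        manager,
        message="Solve a 2D cantilever linear elasticity problem with FEM and "
                "report the maximum displacement.",
    )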
Citations: 0
MEDCO: Medical Education Copilots Based on A Multi-Agent Framework
Pub Date : 2024-08-22 DOI: arxiv-2408.12496
Hao Wei, Jianing Qiu, Haibao Yu, Wu Yuan
Large language models (LLMs) have had a significant impact on diverse research domains, including medicine and healthcare. However, the potential of LLMs as copilots in medical education remains underexplored. Current AI-assisted educational tools are limited by their solitary learning approach and inability to simulate the multi-disciplinary and interactive nature of actual medical training. To address these limitations, we propose MEDCO (Medical EDucation COpilots), a novel multi-agent-based copilot system specially developed to emulate real-world medical training environments. MEDCO incorporates three primary agents: an agentic patient, an expert doctor, and a radiologist, facilitating a multi-modal and interactive learning environment. Our framework emphasizes the learning of proficient question-asking skills, multi-disciplinary collaboration, and peer discussions between students. Our experiments show that simulated virtual students who underwent training with MEDCO not only achieved substantial performance enhancements comparable to those of advanced models, but also demonstrated human-like learning behaviors and improvements, coupled with an increase in the number of learning samples. This work contributes to medical education by introducing a copilot that implements an interactive and collaborative learning approach. It also provides valuable insights into the effectiveness of AI-integrated training paradigms.
Citations: 0
Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards
Pub Date : 2024-08-22 DOI: arxiv-2408.12112
Shresth Verma, Niclas Boehmer, Lingkai Kong, Milind Tambe
LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the presence of multiple agents, altering the reward function based on human preferences can impact subpopulations very differently, leading to complex tradeoffs and a multi-objective resource allocation problem. We are the first to present a principled method termed Social Choice Language Model for dealing with these tradeoffs for LLM-designed rewards for multiagent planners in general and restless bandits in particular. The novel part of our model is a transparent and configurable selection component, called an adjudicator, external to the LLM that controls complex tradeoffs via a user-selected social welfare function. Our experiments demonstrate that our model reliably selects more effective, aligned, and balanced reward functions compared to purely LLM-based approaches.
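The adjudicator component described above can be illustrated with a small hedged sketch: candidate reward functions (stand-ins for LLM proposals) are scored by the per-subpopulation utilities they induce, and a user-selected social welfare function decides which candidate wins. The candidate names, utility numbers, and welfare functions below are made up for illustration and are not taken from the paper.

    import statistics

    # Estimated utility of each subpopulation under each candidate reward function
    # (hypothetical stand-ins for LLM-proposed rewards).
    candidates = {
        "prioritize_low_engagement": {"young_mothers": 0.62, "rural": 0.55, "urban": 0.40},
        "prioritize_high_risk":      {"young_mothers": 0.50, "rural": 0.70, "urban": 0.35},
        "uniform_allocation":        {"young_mothers": 0.48, "rural": 0.47, "urban": 0.49},
    }

    # User-selected social welfare functions the adjudicator can apply.
    welfare_functions = {
        "utilitarian": lambda u: sum(u) / len(u),         # average utility
        "egalitarian": lambda u: min(u),                  # Rawlsian worst-case utility
        "nash": lambda u: statistics.geometric_mean(u),   # balances level and equity
    }

    def adjudicate(candidate_utilities, welfare):
        """Return the candidate reward function that maximizes the chosen welfare."""
        return max(candidate_utilities,
                   key=lambda name: welfare(list(candidate_utilities[name].values())))

    for name, welfare in welfare_functions.items():
        print(name, "->", adjudicate(candidates, welfare))

Running the loop shows that different welfare functions can select different reward candidates, which is exactly the tradeoff the abstract highlights.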
Citations: 0