XP-MARL: Auxiliary Prioritization in Multi-Agent Reinforcement Learning to Address Non-Stationarity

arXiv - CS - Robotics, Pub Date: 2024-09-18, DOI: arxiv-2409.11852

Jianye Xu, Omar Sobhy, Bassam Alrifaee



Abstract

Non-stationarity poses a fundamental challenge in Multi-Agent Reinforcement Learning (MARL), arising from agents simultaneously learning and altering their policies. This creates a non-stationary environment from the perspective of each individual agent, often leading to suboptimal or even unconverged learning outcomes. We propose an open-source framework named XP-MARL, which augments MARL with auxiliary prioritization to address this challenge in cooperative settings. XP-MARL is 1) founded upon our hypothesis that prioritizing agents and letting higher-priority agents establish their actions first would stabilize the learning process and thus mitigate non-stationarity and 2) enabled by our proposed mechanism called action propagation, where higher-priority agents act first and communicate their actions, providing a more stationary environment for others. Moreover, instead of using a predefined or heuristic priority assignment, XP-MARL learns priority-assignment policies with an auxiliary MARL problem, leading to a joint learning scheme. Experiments in a motion-planning scenario involving Connected and Automated Vehicles (CAVs) demonstrate that XP-MARL improves the safety of a baseline model by 84.4% and outperforms a state-of-the-art approach, which improves the baseline by only 12.8%. Code: github.com/cas-lab-munich/sigmarl
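
The action-propagation idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the XP-MARL code base: the function names, the policy signature, and the toy policy are hypothetical, and the priority scores are passed in as fixed numbers, whereas XP-MARL learns them with an auxiliary MARL problem. The sketch shows only the ordering step: agents act in descending priority, and each agent sees the actions already chosen by higher-priority agents.

```python
from typing import Callable, Dict, Sequence

# Hypothetical policy signature (an assumption for illustration, not the
# XP-MARL API): (own observation, actions already chosen by higher-priority
# agents) -> action.
Policy = Callable[[float, Dict[int, float]], float]


def propagate_actions(
    observations: Sequence[float],
    priority_scores: Sequence[float],
    policies: Sequence[Policy],
) -> Dict[int, float]:
    """Let agents act one at a time in descending priority order."""
    # Higher score means higher priority; ties are broken by agent index.
    order = sorted(
        range(len(observations)),
        key=lambda i: (-priority_scores[i], i),
    )
    chosen: Dict[int, float] = {}
    for agent in order:
        # Each agent receives a copy of the actions communicated so far,
        # i.e. only those of agents that acted before it.
        chosen[agent] = policies[agent](observations[agent], dict(chosen))
    return chosen


def toy_policy(obs: float, higher_priority_actions: Dict[int, float]) -> float:
    # Toy rule (assumption): start from the own observation and yield by the
    # sum of the actions already committed by higher-priority agents.
    return obs - sum(higher_priority_actions.values())


if __name__ == "__main__":
    actions = propagate_actions(
        observations=[1.0, 2.0, 3.0],
        priority_scores=[0.2, 0.9, 0.5],  # fixed here; learned in XP-MARL
        policies=[toy_policy] * 3,
    )
    # Agent 1 acts first, then agent 2, then agent 0.
    print(actions)
```

Because lower-priority agents condition on actions that are already fixed, part of what they would otherwise have to predict about the other agents becomes an observed input. That is the sense in which the abstract speaks of a more stationary environment for them.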


