GDT: Multi-agent reinforcement learning framework based on adaptive grouping dynamic topological space

Journal: Information Sciences (IF 8.1; Zone 1, Computer Science; JCR category: Computer Science, Information Systems)
Pub Date: 2024-11-15
DOI: 10.1016/j.ins.2024.121646

Licheng Sun, Hongbin Ma, Zhentao Guo

Information Sciences, volume 691, Article 121646
Source: https://www.sciencedirect.com/science/article/pii/S0020025524015603

Citations: 0

Abstract




In many real-world scenarios, tasks involve coordinating multiple agents, such as managing robot clusters, drone swarms, and autonomous vehicles. These tasks are commonly addressed using Multi-Agent Reinforcement Learning (MARL). However, existing MARL algorithms often lack foresight regarding the number and types of agents involved, requiring agents to generalize across various task configurations. This may lead to suboptimal performance due to underestimated action values and the selection of less effective joint policies. To address these challenges, we propose a novel multi-agent deep reinforcement learning framework, called multi-agent reinforcement learning framework based on adaptive grouping dynamic topological space (GDT). GDT utilizes a group mesh topology to interconnect the local action value functions of each agent, enabling effective coordination and knowledge sharing among agents. By computing three different interpretations of action value functions, GDT overcomes monotonicity constraints and derives more effective overall action value functions. Additionally, GDT groups agents with high similarity to facilitate parameter sharing, thereby enhancing knowledge transfer and generalization across different scenarios. Furthermore, GDT introduces a strategy regularization method for optimal exploration of multiple action spaces. This method assigns each agent an independent entropy temperature during exploration, enabling agents to efficiently explore potential actions and approximate total state values. Experimental results demonstrate that our approach, termed GDT, significantly outperforms state-of-the-art algorithms on Google Research Football (GRF) and the StarCraft Multi-Agent Challenge (SMAC). Particularly in SMAC tasks, GDT achieves a success rate of nearly 100% across almost all Hard Map and Super Hard Map scenarios. Additionally, we validate the effectiveness of our algorithm on Non-monotonic Matrix Games.
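The monotonicity constraint mentioned in the abstract can be made concrete with a small matrix game. The sketch below is not GDT and is not taken from the paper: it fits a simple additive value decomposition (in the style of VDN, swapped in here only as a familiar baseline) to an illustrative non-monotonic payoff matrix and shows that per-agent greedy action selection can miss the optimal joint action. The payoff values and the names `PAYOFF`, `additive_fit`, and `greedy_joint_action` are assumptions chosen for illustration.

```python
# Illustrative two-agent, three-action cooperative matrix game.
# The payoff values are an assumption for this sketch, not data from the paper.
PAYOFF = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]


def additive_fit(payoff):
    """Least-squares fit of payoff[i][j] ~ q1[i] + q2[j] under uniform weighting.

    For a full grid the solution is row mean + column mean - grand mean;
    the grand mean is split evenly between the two agents here.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    grand_mean = sum(sum(row) for row in payoff) / (n_rows * n_cols)
    q1 = [sum(row) / n_cols - grand_mean / 2 for row in payoff]
    q2 = [
        sum(payoff[i][j] for i in range(n_rows)) / n_rows - grand_mean / 2
        for j in range(n_cols)
    ]
    return q1, q2


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def greedy_joint_action(payoff):
    """Each agent independently maximises its own fitted utility."""
    q1, q2 = additive_fit(payoff)
    return argmax(q1), argmax(q2)


def optimal_joint_action(payoff):
    """Exhaustive search over the true joint payoff."""
    cells = [(i, j) for i in range(len(payoff)) for j in range(len(payoff[0]))]
    return max(cells, key=lambda ij: payoff[ij[0]][ij[1]])


greedy = greedy_joint_action(PAYOFF)
best = optimal_joint_action(PAYOFF)
print("greedy joint action:", greedy, "payoff:", PAYOFF[greedy[0]][greedy[1]])
print("optimal joint action:", best, "payoff:", PAYOFF[best[0]][best[1]])
```

In this game the heavy penalties around the best cell drag down the fitted utility of the optimal action for each agent, so the decentralised greedy choice settles on a lower-payoff joint action. Non-monotonic matrix games of this kind are the setting in which the abstract reports validating GDT.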


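Source journal

Information Sciences (Engineering & Technology - Computer Science: Information Systems)

CiteScore: 14.00
Self-citation rate: 17.30%
Articles published: 1322
Review time: 10.4 months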

Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.
