High-Sample-Efficient Multiagent Reinforcement Learning for Navigation and Collision Avoidance of UAV Swarms in Multitask Environments

IEEE Internet of Things Journal · Pub Date: 2024-11-07 · DOI: 10.1109/JIOT.2024.3409169
IF 8.2 · Zone 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
IEEE Xplore: https://ieeexplore.ieee.org/document/10747043/
Authors: Jiaming Cheng, Ni Li, Ban Wang, Shuhui Bu, Ming Zhou
Citations: 0

Abstract

Multiagent reinforcement learning (MARL) algorithms have shown promise for Internet of Things (IoT) devices such as unmanned aerial vehicle (UAV) swarms. However, the dynamic nature of large-scale swarm systems, in which the number of agents and the number of observed neighbors change constantly, makes it difficult to adapt MARL to them. Existing approaches struggle to extract meaningful features and require a substantial number of experience samples, resulting in low sample efficiency and high risk ratios. Moreover, these methods are effective only in task-specific scenarios and perform poorly in multitask settings. To overcome these challenges, this study proposes a highly sample-efficient and scalable MARL approach for UAV swarms. The approach incorporates a hypernetwork-based embedding attention (HEA) mechanism for the state representation of the policy network and a multiencoder gated transformer with multilayer attention (MEGTrMA) for the value function. HEA automatically generates weights for each agent so that the policy adapts to dynamic scenarios, enhancing representation ability and adaptability while reducing the cost of trial and error and thereby improving learning efficiency. MEGTrMA captures the contribution of each agent to the global observation and establishes long-term dependencies among agents, facilitating stable policy learning in multitask scenarios. Simulation results demonstrate that the proposed method is scalable, generalizable, and highly sample-efficient. By progressively increasing the number of UAVs and their corresponding neighbors, the method reduces training time to less than one-fifth of that required when learning from scratch. Additionally, the average number of collisions is reduced by an order of magnitude for large-scale UAV swarms.
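The abstract describes HEA only at a high level: per-agent weights are generated automatically so that the state representation copes with a changing number of neighbors. As a rough illustration of that general idea, the NumPy sketch below uses a small hypernetwork to generate an agent's neighbor-embedding weights from its own state, then pools a variable number of neighbor embeddings with softmax attention. It is not the authors' HEA. The class name `HypernetEmbeddingAttention`, the single linear hypernetwork, the tanh embeddings, the dot-product query computed from the agent's own state, and all dimensions are assumptions made for illustration.

```python
import numpy as np


class HypernetEmbeddingAttention:
    """Hypothetical hypernetwork + attention encoder (not the paper's HEA).

    A linear hypernetwork maps the agent's own state to the weight matrix of
    the neighbor-embedding layer; neighbor embeddings are then pooled with
    softmax attention, so the output length is independent of neighbor count.
    """

    def __init__(self, self_dim, nb_dim, emb_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.nb_dim, self.emb_dim = nb_dim, emb_dim
        # Hypernetwork: own state -> flattened (emb_dim x nb_dim) weight matrix.
        self.hyper_w = rng.normal(0.0, 0.1, size=(emb_dim * nb_dim, self_dim))
        self.hyper_b = rng.normal(0.0, 0.1, size=emb_dim * nb_dim)
        # Fixed projections for the agent's own embedding and the attention query.
        self.self_proj = rng.normal(0.0, 0.1, size=(emb_dim, self_dim))
        self.query_proj = rng.normal(0.0, 0.1, size=(emb_dim, self_dim))

    def __call__(self, own_state, neighbor_states):
        own_state = np.asarray(own_state, dtype=float)
        own_emb = np.tanh(self.self_proj @ own_state)
        neighbors = np.asarray(neighbor_states, dtype=float).reshape(-1, self.nb_dim)
        if neighbors.shape[0] == 0:
            # No neighbors observed: pool to zeros so the output shape is unchanged.
            return np.concatenate([own_emb, np.zeros(self.emb_dim)])
        # Agent-specific embedding weights generated by the hypernetwork.
        w_emb = (self.hyper_w @ own_state + self.hyper_b).reshape(self.emb_dim, self.nb_dim)
        nb_emb = np.tanh(neighbors @ w_emb.T)            # (n_neighbors, emb_dim)
        query = self.query_proj @ own_state              # (emb_dim,)
        scores = nb_emb @ query / np.sqrt(self.emb_dim)  # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                         # softmax over neighbors
        pooled = weights @ nb_emb                        # (emb_dim,)
        return np.concatenate([own_emb, pooled])


if __name__ == "__main__":
    enc = HypernetEmbeddingAttention(self_dim=4, nb_dim=3, emb_dim=8)
    rng = np.random.default_rng(1)
    own = rng.normal(size=4)
    for n in (0, 2, 7):
        # Prints the same shape, (16,), for each neighbor count n.
        print(n, enc(own, rng.normal(size=(n, 3))).shape)
```

Because the neighbor embeddings are combined by a softmax-weighted sum, the encoder returns a fixed-length vector (2 × `emb_dim`) for any number of neighbors, including none, and the result does not depend on neighbor ordering. This is the kind of property a shared policy network needs when the swarm size and the set of observed neighbors keep changing.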
Source journal: IEEE Internet of Things Journal (Computer Science-Information Systems)
CiteScore: 17.60
Self-citation rate: 13.20%
Articles published: 1982
Journal description: The IEEE Internet of Things (IoT) Journal publishes research and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impact on sensor technologies, big data management, and future Internet design for applications such as smart cities and smart homes. Fields of interest include IoT architectures, such as things-centric, data-centric, and service-oriented architectures; IoT enabling technologies and systematic integration, such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and testbeds, such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials and experiments; and IoT standardization activities and technology development in standard development organizations (SDOs) such as IEEE, IETF, ITU, 3GPP, and ETSI.
Latest articles in this journal
Reputation-Driven Asynchronous Federated Learning for Enhanced Trajectory Prediction With Blockchain
An Information Theoretic Approach to Distributed Detection for Mobile Wireless Sensor Networks Under Byzantine Attack in Entirely Unknown or Complicated Environment: Design, Analysis, and Evaluation of the Attack Strategy
Enabling Distributed Generative Artificial Intelligence in 6G: Mobile Edge Generation
OFDM Reference Signal Pattern Design Criteria for Integrated Communication and Sensing
Budget-Constrained Resource Allocation and Pricing in VEC: A MSMLMF Stackelberg Game With Contract Incentive Mechanism