Title: High-Sample-Efficient Multiagent Reinforcement Learning for Navigation and Collision Avoidance of UAV Swarms in Multitask Environments
Authors: Jiaming Cheng; Ni Li; Ban Wang; Shuhui Bu; Ming Zhou
Journal: IEEE Internet of Things Journal (impact factor 8.2; JCR Q1, Computer Science, Information Systems)
Publication date: 2024-11-07
DOI: 10.1109/JIOT.2024.3409169
URL: https://ieeexplore.ieee.org/document/10747043/
Citations: 0
Abstract
Multiagent reinforcement learning (MARL) algorithms have shown promise for Internet of Things devices such as unmanned aerial vehicle (UAV) swarms. However, the dynamic nature of large-scale swarm systems, in which the number of agents and of observed neighbors changes constantly, makes MARL difficult to adapt. Existing approaches struggle to extract meaningful features and require a large number of experience samples, resulting in low sample efficiency and high risk ratios. Moreover, these methods are effective only in task-specific scenarios and perform poorly in multitask settings. To overcome these challenges, this study proposes a highly sample-efficient and scalable MARL approach for UAV swarms. The proposed approach incorporates a hypernetwork-based embedding attention (HEA) mechanism for the state representation of the policy network and a multiencoder gated transformer with a multilayer attention (MEGTrMA) mechanism for the value function. The HEA automatically generates weights for each agent to adapt to dynamic scenarios, which improves representation ability and adaptability and reduces the cost of trial and error, and so improves learning efficiency. The MEGTrMA captures the contribution of each agent to the global observation, establishing long-term dependencies among agents and facilitating stable policy learning in multitask scenarios. Simulation results demonstrate that the proposed method is scalable, generalizable, and highly sample-efficient. Compared with learning from scratch, progressively increasing the number of UAVs and their corresponding neighbors during training reduces training time to less than one-fifth. Additionally, the average number of collisions is reduced by an order of magnitude for large-scale UAV swarms.
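The abstract does not give the HEA architecture, so the following is only a generic sketch of the underlying idea: a small hypernetwork turns the agent's own state into the weights used to embed each neighbor observation, and attention pooling collapses any number of neighbors into a fixed-size vector for the policy. The class name `HyperAttentionPool`, all dimensions, the random untrained weights, and the `tanh` and scaled dot-product choices are assumptions for illustration, not the authors' design.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class HyperAttentionPool:
    """Hypothetical illustration, not the paper's HEA module."""

    def __init__(self, state_dim, obs_dim, embed_dim, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.obs_dim = obs_dim
        self.embed_dim = embed_dim
        # Hypernetwork: maps the own state to a flattened embedding matrix.
        self.hyper = rand(embed_dim * obs_dim, state_dim)
        # Query projection used to score each neighbor embedding.
        self.query = rand(embed_dim, state_dim)

    def __call__(self, own_state, neighbor_obs):
        if not neighbor_obs:
            # No neighbors observed: return a zero vector of fixed size.
            return [0.0] * self.embed_dim
        # Generate the embedding weights from the agent's own state.
        flat = matvec(self.hyper, own_state)
        weights = [flat[i * self.obs_dim:(i + 1) * self.obs_dim]
                   for i in range(self.embed_dim)]
        # Embed every neighbor with the generated weights.
        embeds = [[math.tanh(x) for x in matvec(weights, obs)]
                  for obs in neighbor_obs]
        # Scaled dot-product attention of the own-state query over neighbors.
        q = matvec(self.query, own_state)
        scale = math.sqrt(self.embed_dim)
        scores = [sum(a * b for a, b in zip(q, e)) / scale for e in embeds]
        attn = softmax(scores)
        # Weighted sum: output size does not depend on the neighbor count.
        return [sum(a * e[k] for a, e in zip(attn, embeds))
                for k in range(self.embed_dim)]


pool = HyperAttentionPool(state_dim=4, obs_dim=3, embed_dim=8)
state = [0.1, -0.2, 0.3, 0.0]
few = [[1.0, 0.0, 0.5], [0.2, -0.4, 0.1]]
many = few + [[-0.3, 0.8, 0.0], [0.5, 0.5, -0.5], [0.0, 0.1, 0.9]]
print(len(pool(state, few)), len(pool(state, many)))  # prints "8 8"
```

Because the pooled vector has the same length for two or five neighbors and does not depend on neighbor order, a policy network with a fixed input size could in principle be reused as the swarm grows, which is the kind of scalability the abstract describes.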
About the journal:
The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impacts on sensor technologies, big data management, and future internet design for applications like smart cities and smart homes. Fields of interest include IoT architecture such as things-centric, data-centric, and service-oriented IoT architectures; IoT enabling technologies and systematic integration such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds such as IoT service middleware, IoT application programming interface (API), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in different standards development organizations (SDOs) such as IEEE, IETF, ITU, 3GPP, and ETSI.