Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations for Flocking Control

Yunbo Qiu, Yuzhu Zhan, Yue Jin, Jian Wang, Xudong Zhang
Published in: 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall), 2022-09-01
DOI: 10.1109/VTC2022-Fall57202.2022.10012835
Citations: 3

Abstract

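Flocking control is an important problem in multi-agent systems, such as teams of unmanned aerial vehicles (UAVs) or autonomous underwater vehicles (AUVs), because it improves the cooperation and safety of the agents. Compared with traditional methods, multi-agent reinforcement learning (MARL) solves flocking control more flexibly. However, MARL-based methods are sample-inefficient: they require a large number of experiences collected from interactions between the agents and the environment. We propose Pretraining with Demonstrations for MARL (PwD-MARL), a novel method that pretrains agents on non-expert demonstrations collected in advance with traditional methods. During pretraining, agents learn policies from the demonstrations through MARL and behavior cloning simultaneously, while being prevented from overfitting to the demonstrations. Pretraining on non-expert demonstrations gives online MARL a warm start, which improves its sample efficiency. Experiments on flocking control show that PwD-MARL improves both sample efficiency and policy performance, even when the demonstrations are few or of poor quality.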
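The abstract does not give the pretraining objective, so the following is only one plausible reading of "MARL and behavior cloning simultaneously, while being prevented from overfitting," sketched under stated assumptions. It assumes a deterministic actor-critic learner (in the style of DDPG/MADDPG), a behavior-cloning penalty weighted by a hypothetical `bc_weight`, and a Q-filter, a technique from prior work on reinforcement learning with demonstrations that is not confirmed to be the mechanism PwD-MARL uses: the cloning term is dropped wherever the critic rates the demonstrated action below the policy's own action. All names (`pretrain_actor_loss`, `squared_error`, `bc_weight`, the toy critic and policy) are hypothetical. The sketch computes only the scalar loss for one agent with a local critic; a real implementation would differentiate it with respect to the actor's parameters using an autodiff framework, and a MADDPG-style critic would condition on all agents' observations and actions.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def squared_error(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two action vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pretrain_actor_loss(
    states: Sequence[Vector],
    demo_actions: Sequence[Vector],
    policy: Callable[[Vector], Vector],
    critic: Callable[[Vector, Vector], float],
    bc_weight: float = 1.0,
) -> float:
    """Hypothetical per-agent actor loss: RL term plus Q-filtered behavior cloning.

    RL term: mean of -Q(s, pi(s)), as in deterministic actor-critic methods.
    BC term: mean of ||pi(s) - a_demo||^2, counted only where the critic
             rates the demonstrated action above the policy's own action.
    """
    if not states or len(states) != len(demo_actions):
        raise ValueError("need a non-empty batch of matching states and demo actions")
    rl_term = 0.0
    bc_term = 0.0
    for s, a_demo in zip(states, demo_actions):
        a_pi = policy(s)
        q_pi = critic(s, a_pi)
        rl_term -= q_pi  # maximizing Q is the same as minimizing -Q
        # Q-filter (assumed): imitate only where the demonstration looks better
        # than the current policy, so poor non-expert demos are not copied blindly.
        if critic(s, a_demo) > q_pi:
            bc_term += squared_error(a_pi, a_demo)
    n = len(states)
    return rl_term / n + bc_weight * bc_term / n


if __name__ == "__main__":
    # Toy 1-D example: the critic prefers actions close to 1.0.
    def toy_critic(s: Vector, a: Vector) -> float:
        return -(a[0] - 1.0) ** 2

    def untrained_policy(s: Vector) -> Vector:
        return [0.0]

    states = [[0.0], [1.0]]
    demos = [[0.8], [-2.0]]  # one helpful demonstration, one poor one
    loss = pretrain_actor_loss(states, demos, untrained_policy, toy_critic, bc_weight=0.5)
    print(round(loss, 6))  # 1.16: the poor demo (-2.0) is filtered out of the BC term
```

In the toy run, the demonstration at 0.8 adds a cloning penalty because the critic scores it above the untrained policy's output, while the demonstration at -2.0 is ignored. Raising `bc_weight` pulls the pretrained policy harder toward the demonstrations; in this sketch, the Q-filter is what stops that pull from locking in bad demonstrations.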