Study on learning algorithm of transfer reinforcement for multi-agent formation control

Q3 Engineering 西北工业大学学报 (Journal of Northwestern Polytechnical University) Pub Date: 2023-04-01 DOI: 10.1051/jnwpu/20234120389
Penglin Hu, Q. Pan, Yaning Guo, Chunhui Zhao
Citations: 0

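Abstract

To address obstacle avoidance and collision avoidance for multi-agent cooperative formation in multi-obstacle environments, a formation control algorithm based on transfer learning and reinforcement learning is proposed. First, in the source-task learning stage, a value function approximation method replaces the Q-table, which reduces the storage requirement and improves the solving speed of the algorithm. Second, in the target-task learning stage, a Gaussian clustering algorithm is used to classify the source tasks. Based on the distance between each cluster center and the target task, the optimal source-task class is selected for target-task learning, which avoids negative transfer and improves the generalization ability and convergence speed of the reinforcement learning algorithm. Finally, simulation results show that the method can form and maintain the formation configuration of a multi-agent system in a complex environment with obstacles while achieving obstacle avoidance and collision avoidance at the same time.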
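
The first step in the abstract, replacing a Q-table with value function approximation, can be illustrated with a minimal sketch. The abstract does not say which approximator the authors use, so the linear form below, the class name `LinearQ`, the feature vector `phi`, and the values of `alpha` and `gamma` are all assumptions chosen for illustration. The point it shows is that storage scales with the number of features times the number of actions, not with the number of states.

```python
import numpy as np


class LinearQ:
    """Hypothetical linear approximator: Q(s, a) ~= w[a] . phi(s).

    Storage is n_actions * n_features weights, independent of how many
    states the environment has (a Q-table would need one row per state).
    """

    def __init__(self, n_features, n_actions, alpha=0.1, gamma=0.9):
        self.w = np.zeros((n_actions, n_features))  # one weight vector per action
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)

    def q_values(self, phi):
        """Return the estimated Q-value of every action for features phi."""
        return self.w @ phi

    def act(self, phi, epsilon, rng):
        """Epsilon-greedy action selection over the approximated Q-values."""
        if rng.random() < epsilon:
            return int(rng.integers(self.w.shape[0]))
        return int(np.argmax(self.q_values(phi)))

    def update(self, phi, action, reward, phi_next, done):
        """One semi-gradient Q-learning step; returns the TD error."""
        if done:
            target = reward
        else:
            target = reward + self.gamma * np.max(self.q_values(phi_next))
        td_error = target - self.q_values(phi)[action]
        # For a linear model the gradient of Q(s, a) w.r.t. w[a] is phi itself.
        self.w[action] += self.alpha * td_error * phi
        return td_error


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 4-dimensional state features (e.g. offsets to the goal and
    # to the nearest obstacle) and 5 discrete motion actions.
    agent = LinearQ(n_features=4, n_actions=5)
    phi = np.array([1.0, 0.2, -0.3, 0.5])
    a = agent.act(phi, epsilon=0.1, rng=rng)
    agent.update(phi, a, reward=1.0, phi_next=phi, done=True)
    print("stored weights:", agent.w.size)
```

A nonlinear approximator such as a neural network would follow the same update pattern with the gradient of the network in place of `phi`.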
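
The source-task selection step described in the abstract (cluster the source tasks with a Gaussian model, then pick the class whose center is closest to the target task) might look roughly like the sketch below. Everything specific here is an assumption and not taken from the paper: each task is summarized by a hypothetical numeric descriptor vector, the Gaussian mixture uses diagonal covariances fitted by EM with a deterministic farthest-point initialization, and "distance" is taken to be Euclidean.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic initial means: start at X[0], then repeatedly add the
    point farthest from the means chosen so far."""
    idx = [0]
    for _ in range(1, k):
        d = np.min(
            np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2), axis=1
        )
        idx.append(int(np.argmax(d)))
    return X[idx].copy()


def fit_gmm(X, k, n_iter=100, reg=1e-6):
    """Fit a diagonal-covariance Gaussian mixture by EM.

    Returns the cluster means and a hard label per row of X, taken from
    the responsibilities of the last E-step.
    """
    n, _ = X.shape
    means = farthest_point_init(X, k)
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        diff = X[:, None, :] - means[None, :, :]
        log_pdf = -0.5 * np.sum(
            diff**2 / variances + np.log(2 * np.pi * variances), axis=2
        )
        log_r = np.log(weights) + log_pdf
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff**2) / nk[:, None] + reg
    return means, resp.argmax(axis=1)


def select_source_class(source_desc, target_desc, k):
    """Pick the non-empty cluster whose mean is nearest to the target task."""
    means, labels = fit_gmm(source_desc, k)
    dists = np.linalg.norm(means - target_desc, axis=1)
    counts = np.bincount(labels, minlength=k)
    dists = np.where(counts > 0, dists, np.inf)  # ignore empty clusters
    best = int(np.argmin(dists))
    return best, np.flatnonzero(labels == best), dists


if __name__ == "__main__":
    # Hypothetical 2-D task descriptors for eight source tasks.
    sources = np.array(
        [[0, 0], [0, 1], [1, 0], [1, 1],
         [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float
    )
    target = np.array([9.0, 9.5])
    best, members, dists = select_source_class(sources, target, k=2)
    print("selected class:", best, "source tasks:", members.tolist())
```

Only the source tasks in the selected class would then be used to initialize or guide target-task learning, which is how the abstract motivates avoiding negative transfer from dissimilar tasks.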
Source journal
西北工业大学学报 (Journal of Northwestern Polytechnical University), Engineering (all)
CiteScore: 1.30
Self-citation rate: 0.00%
Articles published: 6201
Review time: 12 weeks