On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration

Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub
Journal: arXiv - CS - Multiagent Systems
Published: 2024-09-17
DOI: https://doi.org/arxiv-2409.11058
Citations: 0

Abstract

Unmanned aerial vehicles (UAVs) have become increasingly popular in fields such as precision agriculture, search and rescue, and remote sensing. However, exploring unknown environments remains a significant challenge. This study addresses that challenge by using on-policy Reinforcement Learning (RL) with Proximal Policy Optimization (PPO) to explore a two-dimensional area of interest with multiple UAVs. The UAVs avoid collisions with obstacles and with each other, and perform the exploration in a distributed manner. The proposed solution includes actor-critic networks that use deep convolutional neural networks (CNNs) and long short-term memory (LSTM) to identify the UAVs and the areas that have already been covered. Simulation results demonstrate the superiority of the proposed PPO approach over other RL techniques such as policy gradient (PG) and asynchronous advantage actor-critic (A3C). The results also show that combining LSTM with CNN in the critic can improve exploration. Since the proposed exploration must work in unknown environments, the results show that the setup can complete coverage on new maps that differ from the training maps. Finally, we show how tuning hyperparameters may affect overall performance.
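The abstract's method centers on PPO, whose defining piece is the clipped surrogate objective that keeps on-policy updates close to the data-collecting policy. A minimal illustrative sketch of that loss is below; the function name, the batch-of-scalars interface, and the default clip range of 0.2 are assumptions for illustration, not details taken from the paper.

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate loss (to be minimized), averaged over samples.

    logp_new/logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: estimated A(s, a).
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # pi_new / pi_old
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))  # clip to [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)         # pessimistic bound
    return -total / len(advantages)

# When the policies agree (ratio = 1), the loss is just -mean(advantage):
print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # → -2.0
```

The clip makes the objective indifferent to ratio changes beyond 1 ± eps in the improving direction, which is what lets PPO take several on-policy gradient steps per batch without the policy drifting far from the one that collected the data.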