On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration

Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub
Journal: arXiv - CS - Multiagent Systems
Published: 2024-09-17
DOI: https://doi.org/arxiv-2409.11058
Citations: 0

Abstract

Unmanned aerial vehicles (UAVs) have become increasingly popular in fields such as precision agriculture, search and rescue, and remote sensing. However, exploring unknown environments remains a significant challenge. This study addresses that challenge by using on-policy Reinforcement Learning (RL) with Proximal Policy Optimization (PPO) to explore a two-dimensional area of interest with multiple UAVs. The UAVs avoid collisions with obstacles and with each other, and perform the exploration in a distributed manner. The proposed solution includes actor-critic networks that use deep convolutional neural networks (CNNs) and long short-term memory (LSTM) to identify the UAVs and the areas that have already been covered. Simulation results demonstrate the superiority of the proposed PPO approach over other RL techniques such as policy gradient (PG) and asynchronous advantage actor-critic (A3C). The results also show that combining LSTM with CNN in the critic can improve exploration. Since the proposed exploration must work in unknown environments, the results show that the setup can complete coverage on new maps that differ from the training maps. Finally, we show how tuning hyperparameters may affect overall performance.
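The abstract's method centers on PPO, whose defining piece is the clipped surrogate objective that keeps on-policy updates close to the data-collecting policy. A minimal illustrative sketch of that loss is below; the function name, the batch-of-scalars interface, and the default clip range of 0.2 are assumptions for illustration, not details taken from the paper.

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate loss (to be minimized), averaged over samples.

    logp_new/logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: estimated A(s, a).
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # pi_new / pi_old
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))  # clip to [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)         # pessimistic bound
    return -total / len(advantages)

# When the policies agree (ratio = 1), the loss is just -mean(advantage):
print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # → -2.0
```

The clip makes the objective indifferent to ratio changes beyond 1 ± eps in the improving direction, which is what lets PPO take several on-policy gradient steps per batch without the policy drifting far from the one that collected the data.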