Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub
{"title":"用于多无人飞行器探索的政策上行动者批判强化学习","authors":"Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub","doi":"arxiv-2409.11058","DOIUrl":null,"url":null,"abstract":"Unmanned aerial vehicles (UAVs) have become increasingly popular in various\nfields, including precision agriculture, search and rescue, and remote sensing.\nHowever, exploring unknown environments remains a significant challenge. This\nstudy aims to address this challenge by utilizing on-policy Reinforcement\nLearning (RL) with Proximal Policy Optimization (PPO) to explore the {two\ndimensional} area of interest with multiple UAVs. The UAVs will avoid collision\nwith obstacles and each other and do the exploration in a distributed manner.\nThe proposed solution includes actor-critic networks using deep convolutional\nneural networks {(CNN)} and long short-term memory (LSTM) for identifying the\nUAVs and areas that have already been covered. Compared to other RL techniques,\nsuch as policy gradient (PG) and asynchronous advantage actor-critic (A3C), the\nsimulation results demonstrate the superiority of the proposed PPO approach.\nAlso, the results show that combining LSTM with CNN in critic can improve\nexploration. Since the proposed exploration has to work in unknown\nenvironments, the results showed that the proposed setup can complete the\ncoverage when we have new maps that differ from the trained maps. 
Finally, we\nshowed how tuning hyper parameters may affect the overall performance.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"19 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration\",\"authors\":\"Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub\",\"doi\":\"arxiv-2409.11058\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Unmanned aerial vehicles (UAVs) have become increasingly popular in various\\nfields, including precision agriculture, search and rescue, and remote sensing.\\nHowever, exploring unknown environments remains a significant challenge. This\\nstudy aims to address this challenge by utilizing on-policy Reinforcement\\nLearning (RL) with Proximal Policy Optimization (PPO) to explore the {two\\ndimensional} area of interest with multiple UAVs. The UAVs will avoid collision\\nwith obstacles and each other and do the exploration in a distributed manner.\\nThe proposed solution includes actor-critic networks using deep convolutional\\nneural networks {(CNN)} and long short-term memory (LSTM) for identifying the\\nUAVs and areas that have already been covered. Compared to other RL techniques,\\nsuch as policy gradient (PG) and asynchronous advantage actor-critic (A3C), the\\nsimulation results demonstrate the superiority of the proposed PPO approach.\\nAlso, the results show that combining LSTM with CNN in critic can improve\\nexploration. Since the proposed exploration has to work in unknown\\nenvironments, the results showed that the proposed setup can complete the\\ncoverage when we have new maps that differ from the trained maps. 
Finally, we\\nshowed how tuning hyper parameters may affect the overall performance.\",\"PeriodicalId\":501315,\"journal\":{\"name\":\"arXiv - CS - Multiagent Systems\",\"volume\":\"19 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multiagent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11058\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11058","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration
Unmanned aerial vehicles (UAVs) have become increasingly popular in various
fields, including precision agriculture, search and rescue, and remote sensing.
However, exploring unknown environments remains a significant challenge. This
study addresses this challenge by using on-policy Reinforcement Learning (RL)
with Proximal Policy Optimization (PPO) to explore a two-dimensional area of
interest with multiple UAVs. The UAVs avoid collisions with obstacles and with
each other, and carry out the exploration in a distributed manner. The proposed
solution comprises actor-critic networks that use deep convolutional neural
networks (CNNs) and long short-term memory (LSTM) to identify the UAVs and the
areas that have already been covered. Compared with other RL techniques, such
as policy gradient (PG) and asynchronous advantage actor-critic (A3C),
simulation results demonstrate the superiority of the proposed PPO approach.
The results also show that combining LSTM with CNN in the critic can improve
exploration. Since the proposed exploration must work in unknown environments,
the results show that the proposed setup can complete coverage on new maps that
differ from the training maps. Finally, we show how tuning hyperparameters may
affect overall performance.