UAV Maneuvering Decision-Making Algorithm Based on Deep Reinforcement Learning Under the Guidance of Expert Experience
Authors: Guang Zhan, Kun Zhang, Ke Li, Haiyin Piao
Journal: Journal of Systems Engineering and Electronics (JCR Q3, Automation & Control Systems; impact factor 1.9)
Published: 2024-04-23
DOI: 10.23919/jsee.2024.000022 (https://doi.org/10.23919/jsee.2024.000022)
Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in an interactive environment, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks for the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs). In addition, we present a reward shaping method for these two guidance tasks using potential-based functions and expert-guided advice. The proposed algorithm can accelerate the convergence of the maneuvering decision-making policy and increase the stability of the policy output during the later stage of the training process. The effectiveness of the proposed maneuvering decision-making policy is illustrated by the curves of training parameters and by extensive experimental results from testing the trained policy.
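To make the reward shaping idea concrete, the following is a minimal sketch of the standard potential-based shaping term F(s, s') = γΦ(s') − Φ(s) applied to a guidance-towards-specific-point task. The abstract does not give the potential function, the state representation, or the discount factor, so the names `potential` and `shaped_reward`, the negative-distance potential, and the value of `GAMMA` are all illustrative assumptions rather than the authors' design.

```python
import math

GAMMA = 0.99  # discount factor; assumed value, not taken from the paper


def potential(position, target):
    """Hypothetical potential: negative Euclidean distance to the target point."""
    return -math.dist(position, target)


def shaped_reward(env_reward, position, next_position, target, gamma=GAMMA):
    """Add the potential-based shaping term F(s, s') = gamma * phi(s') - phi(s)
    to the environment reward."""
    shaping = gamma * potential(next_position, target) - potential(position, target)
    return env_reward + shaping


# Moving towards the target earns a positive bonus; moving away is penalized.
target = (0.0, 0.0)
closer = shaped_reward(0.0, (100.0, 0.0), (90.0, 0.0), target)
farther = shaped_reward(0.0, (100.0, 0.0), (110.0, 0.0), target)
```

Shaping terms of this form are known to leave the optimal policy of the underlying MDP unchanged, which is one plausible reason to combine them with expert advice when the goal is faster convergence rather than a different task.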