UAV maneuvering decision-making algorithm based on Twin Delayed Deep Deterministic Policy Gradient Algorithm

Journal of Artificial Intelligence and Technology Pub Date: 2021-12-07 DOI: 10.37965/jait.2021.12003

Shuangxia Bai, Shaomei Song, Shiyang Liang, Jianmei Wang, Bo Li, E. Neretin


Citations: 11

Abstract

This paper proposes a novel maneuvering decision method, based on deep reinforcement learning, for intelligent UAV decision-making from situation information in air combat. The autonomous maneuvering of the UAV is modeled as a Markov decision process. The model is trained with two deep reinforcement learning algorithms, Twin Delayed Deep Deterministic Policy Gradient (TD3) and Deep Deterministic Policy Gradient (DDPG), and the experimental results of the two are analyzed and compared. The simulation results show that, compared with DDPG, TD3 has stronger decision-making performance and faster convergence, and is better suited to solving combat problems. The proposed algorithm enables UAVs to make maneuvering decisions autonomously from situation information such as position, speed, and relative azimuth, and to adjust their actions to approach and successfully strike the enemy, providing a new method for intelligent UAV maneuvering decisions in air combat.
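The abstract does not say why TD3 might outperform DDPG, so the following minimal sketch contrasts how the two algorithms form the critic's learning target, using the standard published definitions of both. It is not the paper's implementation. All names (`ddpg_target`, `td3_target`, `should_update_actor`), the scalar action, and the default hyperparameters are assumptions made for illustration, and the target critics are passed in as plain callables that already have the next state bound.

```python
import random


def clip(x, low, high):
    """Clamp x to the interval [low, high]."""
    return max(low, min(high, x))


def ddpg_target(reward, done, next_action, q_target, gamma=0.99):
    """DDPG target: bootstrap from a single target critic."""
    return reward + gamma * (1.0 - done) * q_target(next_action)


def td3_target(reward, done, next_action, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, action_limit=1.0):
    """TD3 target for a scalar action (hypothetical helper).

    Target policy smoothing: add clipped Gaussian noise to the target action.
    Clipped double-Q: bootstrap from the smaller of two target critics,
    which counteracts the overestimation a single critic tends to produce.
    """
    noise = clip(random.gauss(0.0, noise_std), -noise_clip, noise_clip)
    smoothed = clip(next_action + noise, -action_limit, action_limit)
    q_min = min(q1_target(smoothed), q2_target(smoothed))
    return reward + gamma * (1.0 - done) * q_min


def should_update_actor(step, policy_delay=2):
    """Delayed policy update: refresh the actor every policy_delay critic steps."""
    return step % policy_delay == 0


# Toy target critics (assumed for illustration only).
q1 = lambda a: 2.0 + a
q2 = lambda a: 1.0 - a

y_ddpg = ddpg_target(1.0, 0.0, 0.5, q1, gamma=0.9)
y_td3 = td3_target(1.0, 0.0, 0.5, q1, q2, gamma=0.9, noise_std=0.0)
```

With the smoothing noise switched off (`noise_std=0.0`), the TD3 target here is never larger than the DDPG target computed from `q1` alone, because it takes the minimum of the two critics. This pessimistic target, together with the delayed actor updates, is the usual explanation for TD3 training more stably than DDPG. Whether that accounts for the results reported in this paper cannot be determined from the abstract.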




Source journal

Journal of Artificial Intelligence and Technology

CiteScore

8.70

Self-citation rate

0.00%

Articles published