Intercept Guidance of Maneuvering Targets with Deep Reinforcement Learning

IF 1.1 | Zone 4, Engineering & Technology | Q3, ENGINEERING, AEROSPACE | International Journal of Aerospace Engineering | Pub Date: 2023-09-13 | DOI: 10.1155/2023/7924190
Zhe Hu, Liang Xiao, Jun Guan, Wenjun Yi, Hongqiao Yin
{"title":"基于深度强化学习的机动目标拦截制导","authors":"Zhe Hu, Liang Xiao, Jun Guan, Wenjun Yi, Hongqiao Yin","doi":"10.1155/2023/7924190","DOIUrl":null,"url":null,"abstract":"In this paper, a novel guidance law based on a reinforcement learning (RL) algorithm is presented to deal with the maneuvering target interception problem using a deep deterministic policy gradient descent neural network. We take the missile’s line-of-sight (LOS) rate as the observation of the RL algorithm and propose a novel reward function, which is constructed with the miss distance and LOS rate to train the neural network off-line. In the guidance process, the trained neural network has the capacity of mapping the missile’s LOS rate to the normal acceleration of the missile directly, so as to generate guidance commands in real time. Under the actor-critic (AC) framework, we adopt the twin-delayed deep deterministic policy gradient (TD3) algorithm by taking the minimum value between a pair of critics to reduce overestimation. Simulation results show that the proposed TD3-based RL guidance law outperforms the current state of the RL guidance law, has better performance to cope with continuous action and state space, and also has a faster convergence speed and higher reward. Furthermore, the proposed RL guidance law has better accuracy and robustness when intercepting a maneuvering target, and the LOS rate is converged.","PeriodicalId":13748,"journal":{"name":"International Journal of Aerospace Engineering","volume":"48 1","pages":"0"},"PeriodicalIF":1.1000,"publicationDate":"2023-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Intercept Guidance of Maneuvering Targets with Deep Reinforcement Learning\",\"authors\":\"Zhe Hu, Liang Xiao, Jun Guan, Wenjun Yi, Hongqiao Yin\",\"doi\":\"10.1155/2023/7924190\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, a novel guidance law based on a reinforcement learning (RL) algorithm is presented to deal with the maneuvering target interception problem using a deep deterministic policy gradient descent neural network. We take the missile’s line-of-sight (LOS) rate as the observation of the RL algorithm and propose a novel reward function, which is constructed with the miss distance and LOS rate to train the neural network off-line. In the guidance process, the trained neural network has the capacity of mapping the missile’s LOS rate to the normal acceleration of the missile directly, so as to generate guidance commands in real time. Under the actor-critic (AC) framework, we adopt the twin-delayed deep deterministic policy gradient (TD3) algorithm by taking the minimum value between a pair of critics to reduce overestimation. Simulation results show that the proposed TD3-based RL guidance law outperforms the current state of the RL guidance law, has better performance to cope with continuous action and state space, and also has a faster convergence speed and higher reward. 
Furthermore, the proposed RL guidance law has better accuracy and robustness when intercepting a maneuvering target, and the LOS rate is converged.\",\"PeriodicalId\":13748,\"journal\":{\"name\":\"International Journal of Aerospace Engineering\",\"volume\":\"48 1\",\"pages\":\"0\"},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2023-09-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Aerospace Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1155/2023/7924190\",\"RegionNum\":4,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, AEROSPACE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Aerospace Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1155/2023/7924190","RegionNum":4,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, AEROSPACE","Score":null,"Total":0}
Citations: 0

Abstract

In this paper, a novel guidance law based on a reinforcement learning (RL) algorithm is presented to address the maneuvering-target interception problem using a deep deterministic policy gradient neural network. We take the missile's line-of-sight (LOS) rate as the observation of the RL algorithm and propose a novel reward function, constructed from the miss distance and the LOS rate, to train the neural network offline. During guidance, the trained neural network maps the missile's LOS rate directly to the missile's normal acceleration, so that guidance commands are generated in real time. Under the actor-critic (AC) framework, we adopt the twin-delayed deep deterministic policy gradient (TD3) algorithm, which takes the minimum value between a pair of critics to reduce overestimation. Simulation results show that the proposed TD3-based RL guidance law outperforms existing RL guidance laws, handles continuous action and state spaces better, and converges faster with a higher reward. Furthermore, the proposed RL guidance law achieves better accuracy and robustness when intercepting a maneuvering target, and the LOS rate converges.
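The abstract names two concrete mechanisms: a reward shaped from the miss distance and the LOS rate, and TD3's clipped double-Q trick of taking the minimum of twin critics to curb overestimation. The sketch below illustrates both; it is not the authors' code, and the weight values, network sizes, hyperparameters, and the use of PyTorch are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of (1) a guidance reward built from the LOS rate and terminal
# miss distance and (2) the TD3 clipped double-Q target. All numeric values
# and layer sizes here are assumptions, not values from the paper.

import torch
import torch.nn as nn


def guidance_reward(los_rate, miss_distance=None,
                    los_rate_weight=1.0, miss_distance_weight=10.0):
    """Penalize a large LOS rate at every step and, at the terminal step,
    penalize the miss distance (weights are illustrative assumptions)."""
    reward = -los_rate_weight * abs(los_rate)
    if miss_distance is not None:  # terminal step only
        reward -= miss_distance_weight * miss_distance
    return reward


class Critic(nn.Module):
    """Q(s, a): the state is the LOS-rate observation and the action is the
    commanded normal acceleration."""
    def __init__(self, state_dim=1, action_dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def td3_target(critic1_t, critic2_t, actor_t, reward, next_state, done,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    """Clipped double-Q target: smooth the target action with clipped noise,
    then take the element-wise minimum of the two target critics so the
    bootstrapped value is not overestimated."""
    with torch.no_grad():
        next_action = actor_t(next_state)
        noise = (torch.randn_like(next_action) * policy_noise).clamp(
            -noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)
        q1 = critic1_t(next_state, next_action)
        q2 = critic2_t(next_state, next_action)
        return reward + gamma * (1.0 - done) * torch.min(q1, q2)
```

At deployment only the trained actor is needed: it maps the LOS-rate observation directly to a normal-acceleration command, which is what allows the guidance command to be produced in real time, as the abstract describes.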
Source Journal
CiteScore: 2.70
Self-citation rate: 7.10%
Articles published: 195
Review time: 22 weeks
Journal Description
International Journal of Aerospace Engineering aims to serve the international aerospace engineering community through dissemination of scientific knowledge on practical engineering and design methodologies pertaining to aircraft and space vehicles. Original unpublished manuscripts are solicited on all areas of aerospace engineering, including but not limited to:
- Mechanics of materials and structures
- Aerodynamics and fluid mechanics
- Dynamics and control
- Aeroacoustics
- Aeroelasticity
- Propulsion and combustion
- Avionics and systems
- Flight simulation and mechanics
- Unmanned air vehicles (UAVs)
Review articles on any of the above topics are also welcome.
Latest Articles in This Journal
- Comparative Study and Airspeed Sensitivity Analysis of Full-Wing Solar-Powered UAVs Using Rigid-Body, Multibody, and Rigid-Flexible Combo Models
- Enhanced Multi-UAV Path Planning in Complex Environments With Voronoi-Based Obstacle Modelling and Q-Learning
- Multi-UAV Cooperative Air Combat Target Assignment Method Based on VNS-IBPSO in Complex Dynamic Environment
- A Sparse CoSaMP Channel Estimation Algorithm With Adaptive Variable Step Size for an OFDM System
- Mechanism and Application of Attitude and Orbit Coupling Dynamics for Spacecraft Proximity Relative Motion