An Intelligent Maneuver Decision-Making Approach for Air Combat Based on Deep Reinforcement Learning and Transformer Networks.

IF 2 · Region 3 (Physics and Astrophysics) · Q2 PHYSICS, MULTIDISCIPLINARY · Entropy · Pub Date: 2024-11-29 · DOI: 10.3390/e26121036

Wentao Li, Feng Fang, Dongliang Peng, Shuning Han

Entropy, vol. 26, no. 12 (2024). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11727636/pdf/ · https://doi.org/10.3390/e26121036

Citations: 0

Abstract

Traditional maneuver decision-making approaches depend heavily on accurate and complete situation information, and their decision quality degrades when opponent information is intermittently missing in complex electromagnetic environments. To address this problem, an autonomous maneuver decision-making approach is developed based on a deep reinforcement learning (DRL) architecture. A Transformer network is integrated into the actor and critic networks to capture potential dependencies in the time-series trajectory data. Exploiting these dependencies partially compensates for the information loss, which makes the maneuver decisions more accurate. Introducing the Transformer network into DRL, however, raises issues of limited experience samples, low sampling efficiency, and unstable agent training. To address these issues, an effective decision-making reward, a prioritized sampling method, and a dynamic learning rate adjustment mechanism are proposed. Extensive simulation results show that the proposed approach outperforms traditional DRL algorithms, achieving a higher win rate when opponent information is lost.
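The abstract names a prioritized sampling method but does not describe it. As a rough illustration only, the sketch below shows standard proportional prioritized experience replay, which is an assumption about the general family of technique and not the authors' method. The class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta`, and `eps`, and the use of absolute TD error as the priority are all hypothetical choices made for this example.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative, not the paper's method).

    Transition i is drawn with probability p_i**alpha / sum_k p_k**alpha.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully prioritized
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # likely to be replayed soon after being stored.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform
        # sampling; they are normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # Assumed priority signal: absolute TD error from the critic.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop of this kind, the learner would call `sample`, scale each sample's loss by its returned weight, and then pass the new TD errors to `update_priorities`. How the paper combines this with its reward design and learning rate adjustment is not stated in the abstract.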




Source journal

Entropy (PHYSICS, MULTIDISCIPLINARY)

CiteScore: 4.90

Self-citation rate: 11.10%

Number of publications: 1580

Review time: 21.05 days

Journal description: Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers, and short notes. Its aim is to encourage scientists to publish their theoretical and experimental details as fully as possible. There is no restriction on the length of papers. If a paper involves computations or experiments, the details must be provided so that the results can be reproduced.