Enhanced LSTM-DQN algorithm for a two-player zero-sum game in three-dimensional space

Impact factor: 2.3 · Zone 4 (Computer Science) · JCR Q2 (Automation & Control Systems) · IET Control Theory and Applications · Pub Date: 2024-05-14 · DOI: 10.1049/cth2.12677

Bo Lu, Le Ru, Maolong Lv, Shiguang Hu, Hongguo Zhang, Zilong Zhao

IET Control Theory and Applications, volume 18, issue 18, pages 2798-2812. Publication type: journal article. Open access: no. Source platform: Semantic Scholar.
Article page: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cth2.12677
Full text (PDF): https://onlinelibrary.wiley.com/doi/epdf/10.1049/cth2.12677

Citations: 0

Abstract



To tackle the challenges posed by the two-player zero-sum game (TZSG) in three-dimensional space, this study introduces an enhanced deep Q-network (DQN) algorithm that uses a long short-term memory (LSTM) network. The primary objective of this algorithm is to better capture the temporal correlation of the TZSG in three-dimensional space. Additionally, it incorporates the hindsight experience replay (HER) mechanism to improve the learning efficiency of the network and to mitigate the “sparse reward” problem that arises during prolonged training of the agent on the TZSG in three-dimensional space. The method also improves the convergence and stability of the overall solution. To verify the proposed approach's effectiveness, an agent training environment centred on an airborne agent and its mutual-pursuit interaction scenario was designed. The training and comparison results show that the LSTM-DQN-HER algorithm outperforms similar algorithms in solving the TZSG in three-dimensional space. In summary, the proposed DQN algorithm, built on LSTM and HER, improves the solution's temporal correlation, learning efficiency, convergence, and stability, and the simulation results confirm its superior performance.
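The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind hindsight experience replay, not the authors' code. It assumes the common "final" relabelling strategy (the last state reached in an episode is treated as if it had been the goal) and a 0 / -1 sparse-reward convention. The names `Transition`, `sparse_reward`, and `her_relabel_final`, and the 3-D tuple state, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

State = Tuple[float, float, float]  # hypothetical 3-D position of the agent


@dataclass(frozen=True)
class Transition:
    state: State
    action: int
    next_state: State
    goal: State
    reward: float
    done: bool


def sparse_reward(achieved: State, goal: State, tol: float = 1e-6) -> float:
    """Assumed convention: 0 when the goal is reached, -1 otherwise."""
    dist_sq = sum((a - g) ** 2 for a, g in zip(achieved, goal))
    return 0.0 if dist_sq <= tol ** 2 else -1.0


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """Copy an episode, substituting the final achieved state as the goal."""
    if not episode:
        return []
    new_goal = episode[-1].next_state
    relabelled = []
    for t in episode:
        # Recompute the reward as if new_goal had been the target all along.
        r = sparse_reward(t.next_state, new_goal)
        relabelled.append(replace(t, goal=new_goal, reward=r, done=(r == 0.0)))
    return relabelled


# A failed episode: the original goal (5, 5, 5) is never reached,
# so every stored reward is -1 and carries little learning signal.
original_goal = (5.0, 5.0, 5.0)
episode = [
    Transition((0.0, 0.0, 0.0), 1, (1.0, 0.0, 0.0), original_goal, -1.0, False),
    Transition((1.0, 0.0, 0.0), 2, (1.0, 1.0, 0.0), original_goal, -1.0, False),
    Transition((1.0, 1.0, 0.0), 0, (1.0, 1.0, 1.0), original_goal, -1.0, False),
]

# Store both the original and the relabelled transitions.
replay_buffer = episode + her_relabel_final(episode)
```

After relabelling, the last transition carries a reward of 0 for the substituted goal, so even an unsuccessful episode contributes at least one informative sample to the replay buffer. This is the sense in which HER is usually said to ease the sparse-reward problem.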


Source journal

IET Control Theory and Applications (Engineering & Technology / Engineering: Electrical & Electronic)

CiteScore: 5.70

Self-citation rate: 7.70%

Papers published: 167

Review time: 5.1 months

Journal description: IET Control Theory & Applications is devoted to control systems in the broadest sense, covering new theoretical results and the applications of new and established control methods. Among the topics of interest are system modelling, identification and simulation, the analysis and design of control systems (including computer-aided design), and practical implementation. The scope encompasses technological, economic, physiological (biomedical) and other systems, including man-machine interfaces. Most of the papers published deal with original work from industrial and government laboratories and universities, but subject reviews and tutorial expositions of current methods are welcomed. Correspondence discussing published papers is also welcomed. Applications papers need not necessarily involve new theory. Papers which describe new realisations of established methods, or control techniques applied in a novel situation, or practical studies which compare various designs, would be of interest. Of particular value are theoretical papers which discuss the applicability of new work or applications which engender new theoretical applications.