Enhanced LSTM-DQN algorithm for a two-player zero-sum game in three-dimensional space

IET Control Theory and Applications · Impact Factor 2.3 · JCR Q2 (Automation & Control Systems) · CAS Tier 4 (Computer Science) · Pub Date: 2024-05-14 · DOI: 10.1049/cth2.12677
Bo Lu, Le Ru, Maolong Lv, Shiguang Hu, Hongguo Zhang, Zilong Zhao
IET Control Theory and Applications, vol. 18, no. 18, pp. 2798–2812. Journal Article. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cth2.12677
Citations: 0

Abstract

To tackle the challenges presented by the two-player zero-sum game (TZSG) in three-dimensional space, this study introduces an enhanced deep Q-learning (DQN) algorithm that utilizes a long short-term memory (LSTM) network. The primary objective of this algorithm is to strengthen the temporal correlation of the TZSG solution in three-dimensional space. Additionally, it incorporates the hindsight experience replay (HER) mechanism to improve the network's learning efficiency and to mitigate the "sparse reward" issue that arises from prolonged agent training when solving the TZSG in three-dimensional space. This method also improves the convergence and stability of the overall solution. An intelligent training environment centred on an airborne agent in a mutual-pursuit interaction scenario was designed to validate the proposed approach's effectiveness. The training and comparison results show that the LSTM-DQN-HER algorithm outperforms similar algorithms in solving the TZSG in three-dimensional space. In conclusion, this paper presents an improved DQN algorithm based on LSTM that incorporates the HER mechanism to address the challenges posed by the TZSG in three-dimensional space. The proposed algorithm enhances the solution's temporal correlation, learning efficiency, convergence, and stability. The simulation results confirm its superior performance in solving the TZSG in three-dimensional space.
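The HER mechanism mentioned in the abstract combats sparse rewards by relabelling stored transitions with goals the agent actually reached, so that some experiences carry a success signal even when the original goal was missed. The sketch below illustrates the standard "final" HER relabelling strategy in a goal-conditioned sparse-reward setting; the names (`Transition`, `relabel_with_her`) and the exact reward convention are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of hindsight experience replay (HER) goal relabelling
# using the "final" strategy: the state achieved at the end of an episode
# is substituted as the goal for every transition in that episode.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transition:
    state: Tuple[float, ...]       # e.g. agent position in 3-D space
    action: int
    next_state: Tuple[float, ...]
    goal: Tuple[float, ...]        # desired target position
    reward: float


def sparse_reward(achieved: Tuple[float, ...],
                  goal: Tuple[float, ...],
                  tol: float = 1e-6) -> float:
    """Sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if all(abs(a - g) <= tol for a, g in zip(achieved, goal)) else -1.0


def relabel_with_her(episode: List[Transition]) -> List[Transition]:
    """Return the episode plus hindsight copies of each transition in which
    the goal is replaced by the state actually achieved at episode end,
    with rewards recomputed against that substituted goal."""
    if not episode:
        return []
    achieved_goal = episode[-1].next_state
    relabelled = [
        Transition(
            state=t.state,
            action=t.action,
            next_state=t.next_state,
            goal=achieved_goal,
            reward=sparse_reward(t.next_state, achieved_goal),
        )
        for t in episode
    ]
    return episode + relabelled
```

In training, both the original transitions (mostly reward −1) and the relabelled copies (whose final transition earns reward 0) would be stored in the replay buffer, so the Q-network sees successful outcomes far more often than the sparse environment reward alone would provide.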

Source journal: IET Control Theory and Applications (Engineering Technology – Electrical & Electronic Engineering)
CiteScore: 5.70
Self-citation rate: 7.70%
Articles per year: 167
Review time: 5.1 months
Journal description: IET Control Theory & Applications is devoted to control systems in the broadest sense, covering new theoretical results and the applications of new and established control methods. Among the topics of interest are system modelling, identification and simulation, the analysis and design of control systems (including computer-aided design), and practical implementation. The scope encompasses technological, economic, physiological (biomedical) and other systems, including man-machine interfaces. Most of the papers published deal with original work from industrial and government laboratories and universities, but subject reviews and tutorial expositions of current methods are welcomed. Correspondence discussing published papers is also welcomed. Applications papers need not necessarily involve new theory. Papers which describe new realisations of established methods, or control techniques applied in a novel situation, or practical studies which compare various designs, would be of interest. Of particular value are theoretical papers which discuss the applicability of new work, or applications which engender new theoretical applications.