A Q-Learning Algorithm to Solve the Two-Player Zero-Sum Game Problem for Nonlinear Systems

IF 3.8 | Region 4, Computer Science | JCR Q2, AUTOMATION & CONTROL SYSTEMS | International Journal of Adaptive Control and Signal Processing | Pub Date: 2025-01-06 | DOI: 10.1002/acs.3958
Afreen Islam, Anthony Siming Chen, Guido Herrmann
International Journal of Adaptive Control and Signal Processing, Volume 39, Issue 3, pp. 566-581. Journal Article. Publisher page: https://onlinelibrary.wiley.com/doi/10.1002/acs.3958
Citations: 0

Abstract

This paper deals with the two-player zero-sum game problem, which is a bounded $L_2$-gain robust control problem. Finding an analytical solution to the Hamilton-Jacobi-Isaacs (HJI) equation is a challenging task. Hence, a novel Q-learning algorithm for unknown continuous-time (CT) affine-in-inputs nonlinear systems is proposed for generating an approximate solution to the HJI equation. The solution is valid only in a local domain because a local approximator, namely a neural network (NN) structure, is used. The approach is model-free: it requires no knowledge of the system drift dynamics or of the input and disturbance gains. The algorithm learns online from real-time measurements of the state variables. To generate the local approximate solution of the HJI equation, the proposed non-iterative algorithm requires only a single critic NN instead of the commonly used triple-NN approximator structure. A persistence of excitation condition is required to guarantee uniform ultimate boundedness (UUB) and convergence to the optimal solution. The effectiveness of the proposed Q-learning approach is demonstrated via simulations of a linear F-16 aircraft plant and a highly complex nonlinear system. Closed-loop system stability is proved using Lyapunov analysis, and convergence of the approximate solution to the true saddle-point solution is guaranteed in the UUB sense.
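
For orientation, the linear-quadratic special case of the zero-sum game has a well-known model-based solution that a learned approximation can be compared against. The sketch below is not the paper's model-free Q-learning algorithm. It assumes a linear plant $\dot{x} = Ax + Bu + Dd$ with cost weights $Q$, $R$ and attenuation level $\gamma$, and solves the game algebraic Riccati equation $A^\top P + PA + Q - P(BR^{-1}B^\top - \gamma^{-2}DD^\top)P = 0$ from the stable invariant subspace of the associated Hamiltonian matrix. The function name `solve_game_are` and all matrix values are illustrative assumptions; they are not taken from the paper and are not the F-16 model.

```python
import numpy as np


def solve_game_are(A, B, D, Q, R, gamma):
    """Stabilizing solution P of the game algebraic Riccati equation
    A'P + PA + Q - P (B R^-1 B' - gamma^-2 D D') P = 0
    (hypothetical helper, standard Hamiltonian eigenvector method)."""
    n = A.shape[0]
    # Combined quadratic term: control effort minus scaled disturbance effort.
    S = B @ np.linalg.solve(R, B.T) - (D @ D.T) / gamma**2
    # Hamiltonian matrix associated with the Riccati equation.
    H = np.block([[A, -S], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    # Keep the eigenvectors that span the stable invariant subspace.
    stable = eigvecs[:, eigvals.real < 0]
    if stable.shape[1] != n:
        raise ValueError("no stabilizing solution found for this gamma")
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))
    return 0.5 * (P + P.T)  # symmetrize against rounding error


# Illustrative second-order plant (assumed values, not from the paper).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])   # control input gain
D = np.array([[0.0], [1.0]])   # disturbance input gain
Q = np.eye(2)
R = np.eye(1)
gamma = 2.0

P = solve_game_are(A, B, D, Q, R, gamma)

# Saddle-point policies for the linear-quadratic game:
# u* = -R^-1 B' P x (minimizer), d* = gamma^-2 D' P x (maximizer).
K_u = -np.linalg.solve(R, B.T @ P)
K_d = (D.T @ P) / gamma**2

# Residual of the Riccati equation and the closed-loop matrix.
S = B @ np.linalg.solve(R, B.T) - (D @ D.T) / gamma**2
residual = A.T @ P + P @ A + Q - P @ S @ P
A_cl = A + B @ K_u + D @ K_d

print("P =\n", P)
print("Riccati residual norm:", np.linalg.norm(residual))
print("closed-loop eigenvalues:", np.linalg.eigvals(A_cl))
```

In this linear setting the value function is $V(x) = x^\top P x$, so `P` plays the role that the critic NN weights play in the nonlinear case. A model-free method would have to recover the same policies from state measurements alone, without access to `A`, `B`, or `D`.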

Source journal metrics: CiteScore 5.30; self-citation rate 16.10%; articles published 163; review time 5 months.
About the journal: The International Journal of Adaptive Control and Signal Processing is concerned with the design, synthesis and application of estimators or controllers where adaptive features are needed to cope with uncertainties. Papers on signal processing should also have some relevance to adaptive systems. The journal focus is on model based control design approaches rather than heuristic or rule based control design methods. All papers will be expected to include significant novel material. Both the theory and application of adaptive systems and system identification are areas of interest. Papers on applications can include problems in the implementation of algorithms for real time signal processing and control. The stability, convergence, robustness and numerical aspects of adaptive algorithms are also suitable topics. The related subjects of controller tuning, filtering, networks and switching theory are also of interest. Principal areas to be addressed include: Auto-Tuning, Self-Tuning and Model Reference Adaptive Controllers; Nonlinear, Robust and Intelligent Adaptive Controllers; Linear and Nonlinear Multivariable System Identification and Estimation; Identification of Linear Parameter Varying, Distributed and Hybrid Systems; Multiple Model Adaptive Control; Adaptive Signal Processing Theory and Algorithms; Adaptation in Multi-Agent Systems; Condition Monitoring Systems; Fault Detection and Isolation Methods; Fault-Tolerant Control (system supervision and diagnosis); Learning Systems and Adaptive Modelling; Real Time Algorithms for Adaptive Signal Processing and Control; Adaptive Signal Processing and Control Applications; Adaptive Cloud Architectures and Networking; Adaptive Mechanisms for Internet of Things; Adaptive Sliding Mode Control.