Policy Algebraic Equation for the Discrete-Time Linear Quadratic Regulator Problem

Impact Factor: 7.0 · JCR Q1 (Automation & Control Systems) · IEEE Transactions on Automatic Control · Published: 2024-09-20 · DOI: 10.1109/TAC.2024.3465566
Mario Sassano
Mario Sassano, "Policy Algebraic Equation for the Discrete-Time Linear Quadratic Regulator Problem," IEEE Transactions on Automatic Control, vol. 70, no. 4, pp. 2106–2121. DOI: 10.1109/TAC.2024.3465566. Full text: https://ieeexplore.ieee.org/document/10685125/
Citations: 0

Abstract

The discrete-time, infinite-horizon linear quadratic regulator (LQR) is studied with the objective of establishing a unified perspective on the problem by relying simultaneously on Dynamic Programming and the discrete Minimum Principle. While it is well known that the two strategies independently yield the optimal solution, it is shown here that their combination provides much deeper insight into the nature of the optimal solution and into the strategies by means of which it can be computed. More precisely, the optimal cost, captured by the matrix P, and the feedback gain matrix K are jointly related via the observability matrix of the underlying state/costate (Hamiltonian) dynamics when the state alone is measured. This abstract property then proves instrumental in deriving alternative characterizations of the optimal solution. First, an algebraic equation, referred to as the policy algebraic equation, is established in the variable K alone, with dimension typically much smaller than that of the classic ARE arising in discrete-time LQR, although comprising polynomial equations of higher degree. This equation permits the direct construction of the optimal feedback gain (i.e., the actor) without the need for the simultaneous computation of the optimal cost (i.e., the critic). The structure of the policy algebraic equation naturally lends itself to an iterative approach towards its solution, which is restricted to the space of policies alone and which does not require the explicit solution of any intermediate (linear) equation at each step. Furthermore, as a consequence of the above properties, it is possible to derive a Riccati equation in P, although with coefficients defined by polynomial functions of K, with the property that the constant and quadratic terms are symmetric and sign-definite. This aspect is remarkably different from the classic ARE associated with the discrete-time LQR and more akin to the continuous-time counterpart.
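For context, the classical route the abstract contrasts with computes the critic P first (via the discrete-time algebraic Riccati equation) and only then recovers the actor K, while classic policy iteration (Hewer's algorithm) iterates on K but must solve a linear (Lyapunov) equation at every step. The following is a minimal NumPy sketch of both baselines, not the paper's policy algebraic equation; the system matrices (a discretized double integrator) and the initial gain K0 are illustrative choices, not taken from the paper.

```python
import numpy as np

def lqr_via_dare(A, B, Q, R, iters=500, tol=1e-12):
    """Critic-first route: iterate the Riccati recursion to a fixed point P,
    then recover the optimal gain K = (R + B'PB)^{-1} B'PA (control u = -Kx)."""
    P = Q.copy()
    for _ in range(iters):
        BtPB = B.T @ P @ B
        BtPA = B.T @ P @ A
        # P <- A'PA - A'PB (R + B'PB)^{-1} B'PA + Q
        P_next = A.T @ P @ A - BtPA.T @ np.linalg.solve(R + BtPB, BtPA) + Q
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

def hewer_policy_iteration(A, B, Q, R, K0, iters=30):
    """Classic policy iteration: evaluate each stabilizing gain K by solving a
    discrete Lyapunov equation (the intermediate linear equation that the
    paper's policy-space iteration avoids), then improve the gain."""
    n = A.shape[0]
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        # Solve Acl' P Acl - P = -(Q + K'RK) via vectorization (P symmetric).
        M = np.kron(Acl.T, Acl.T) - np.eye(n * n)
        P = np.linalg.solve(M, -(Q + K.T @ R @ K).reshape(-1)).reshape(n, n)
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

# Illustrative system: double integrator discretized with step 0.1.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = lqr_via_dare(A, B, Q, R)
K0 = np.array([[1.0, 1.5]])          # a stabilizing (not optimal) initial gain
P_pi, K_pi = hewer_policy_iteration(A, B, Q, R, K0)
# Both routes should agree, and A - BK should be Schur stable.
```

Both baselines require either the full P iteration or a Lyapunov solve per step; the abstract's point is that the policy algebraic equation characterizes K directly, without either.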
Source journal

IEEE Transactions on Automatic Control (Engineering: Electrical & Electronic)
CiteScore: 11.30
Self-citation rate: 5.90%
Articles per year: 824
Review time: 9 months
About the journal: In the IEEE Transactions on Automatic Control, the IEEE Control Systems Society publishes high-quality papers on the theory, design, and applications of control engineering. Two types of contributions are regularly considered: 1) Papers: presentation of significant research, development, or application of control concepts. 2) Technical Notes and Correspondence: brief technical notes, comments on published areas or established control topics, and corrections to papers and notes published in the Transactions. In addition, special papers (tutorials, surveys, and perspectives on the theory and applications of control systems topics) are solicited.
Latest articles in this journal

- Least-squares model-reference adaptive control: extension to higher relative degree plants
- Algorithmic Feedback Synthesis for Robust Strong Invariance of Continuous Control Systems
- Model Reference Adaptive Control of Almost Periodic Piecewise Linear Systems with Variable Periods and Disturbance Input
- Model Predictive Control of Hybrid Dynamical Systems
- Ring-patterned control for hyperexponential consensus of multi-agent systems with increasing scales