Policy Algebraic Equation for the Discrete-Time Linear Quadratic Regulator Problem
Mario Sassano
IEEE Transactions on Automatic Control, vol. 70, no. 4, pp. 2106-2121
DOI: 10.1109/TAC.2024.3465566
Published: 2024-09-20
Citations: 0
Abstract
The discrete-time, infinite-horizon linear quadratic regulator (LQR) is studied with the objective of establishing a unified perspective on the problem by relying simultaneously on Dynamic Programming and the discrete Minimum Principle. While it is well known that the two strategies independently yield the optimal solution, it is shown here that their combination provides much deeper insight into the nature of the optimal solution and into the strategies by means of which it can be computed. More precisely, the optimal cost, captured by the matrix P, and the feedback gain matrix K are jointly related via the observability matrix of the underlying state/costate (Hamiltonian) dynamics when the state alone is measured. Such an abstract property is then instrumental for deriving alternative characterizations of the optimal solution. First, an algebraic equation, referred to as the policy algebraic equation, is established in the variable K alone, with dimension typically much smaller than that of the classic ARE arising in discrete-time LQR, although comprising polynomial equations of higher degree. This equation permits the direct construction of the optimal feedback gain (i.e., the actor) without the need for the simultaneous computation of the optimal cost (i.e., the critic). The structure of the policy algebraic equation naturally lends itself to an iterative approach towards its solution, which is restricted to the space of policies alone and which does not require the explicit solution of any intermediate (linear) equation at each step. Furthermore, as a consequence of the above properties, it is possible to derive a Riccati equation in P, although with coefficients defined by polynomial functions of K, with the property that the constant and quadratic terms are symmetric and sign-definite. This aspect is remarkably different from the classic ARE associated with the discrete-time LQR and more akin to the continuous-time counterpart.
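For context, the classical discrete-time LQR objects the abstract refers to can be sketched numerically: the cost matrix P (the "critic") solves the discrete algebraic Riccati equation, and the gain K (the "actor") is recovered from P. The sketch below uses standard fixed-point (value) iteration on the Riccati equation; the system matrices are illustrative and are not taken from the paper, which instead derives an equation in K alone.

```python
# Minimal sketch of the classical discrete-time LQR: compute the cost
# matrix P (critic) from the discrete ARE by value iteration, then the
# optimal gain K (actor). Matrices are an illustrative sampled double
# integrator, not an example from the paper.
import numpy as np

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])   # state matrix
B = np.array([[0.0],
              [0.1]])        # input matrix
Q = np.eye(2)                # state weight (positive definite)
R = np.array([[1.0]])        # input weight (positive definite)

# Discrete ARE:  P = A'PA - A'PB (R + B'PB)^{-1} B'PA + Q
P = Q.copy()
for _ in range(10000):
    gain_term = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P_next = A.T @ P @ A - A.T @ P @ B @ gain_term + Q
    if np.max(np.abs(P_next - P)) < 1e-12:
        P = P_next
        break
    P = P_next

# Optimal feedback gain (actor), constructed from the critic P.
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# The optimal closed loop x+ = (A - BK) x is Schur stable.
rho = max(abs(np.linalg.eigvals(A - B @ K)))
print("spectral radius of A - BK:", rho)
```

Note that in this classical route K is obtained only after P is available; the policy algebraic equation of the paper removes that dependence by characterizing K directly.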
Journal description:
In the IEEE Transactions on Automatic Control, the IEEE Control Systems Society publishes high-quality papers on the theory, design, and applications of control engineering. Two types of contributions are regularly considered:
1) Papers: Presentation of significant research, development, or application of control concepts.
2) Technical Notes and Correspondence: Brief technical notes, comments on published areas or established control topics, corrections to papers and notes published in the Transactions.
In addition, special papers (tutorials, surveys, and perspectives on the theory and applications of control systems topics) are solicited.