Policy Algebraic Equation for the Discrete-Time Linear Quadratic Regulator Problem
Mario Sassano
IEEE Transactions on Automatic Control, vol. 70, no. 4, pp. 2106-2121
DOI: 10.1109/TAC.2024.3465566
Published: 2024-09-20
Citations: 0
Abstract
The discrete-time, infinite-horizon linear quadratic regulator (LQR) is studied with the objective of establishing a unified perspective on the problem by relying simultaneously on Dynamic Programming and the discrete Minimum Principle. While it is well known that the two strategies independently yield the optimal solution, it is shown here that their combination provides much deeper insight into the nature of the optimal solution and into the strategies by means of which it can be computed. More precisely, the optimal cost, captured by the matrix P, and the feedback gain matrix K are jointly related via the observability matrix of the underlying state/costate (Hamiltonian) dynamics when the state alone is measured. Such an abstract property is then instrumental for deriving alternative characterizations of the optimal solution. First, an algebraic equation, referred to as the policy algebraic equation, is established in the variable K alone, with dimension typically much smaller than that of the classic ARE arising in discrete-time LQR, although comprising polynomial equations of higher degree. This equation permits the direct construction of the optimal feedback gain (i.e., the actor) without the need for the simultaneous computation of the optimal cost (i.e., the critic). The structure of the policy algebraic equation naturally lends itself to an iterative approach towards its solution, which is restricted to the space of policies alone and which does not require the explicit solution of any intermediate (linear) equation at each step. Furthermore, as a consequence of the above properties, it is possible to derive a Riccati equation in P, although with coefficients defined by polynomial functions of K, with the property that the constant and quadratic terms are symmetric and sign-definite. This aspect is remarkably different from the classic ARE associated with the discrete-time LQR and more akin to the continuous-time counterpart.
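For context, the classical discrete-time LQR objects the abstract refers to can be sketched numerically: the cost matrix P (the "critic") solves the discrete algebraic Riccati equation, and the gain K (the "actor") is recovered from P. The sketch below uses standard fixed-point (value) iteration on the Riccati equation; the system matrices are illustrative and are not taken from the paper, which instead derives an equation in K alone.

```python
# Minimal sketch of the classical discrete-time LQR: compute the cost
# matrix P (critic) from the discrete ARE by value iteration, then the
# optimal gain K (actor). Matrices are an illustrative sampled double
# integrator, not an example from the paper.
import numpy as np

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])   # state matrix
B = np.array([[0.0],
              [0.1]])        # input matrix
Q = np.eye(2)                # state weight (positive definite)
R = np.array([[1.0]])        # input weight (positive definite)

# Discrete ARE:  P = A'PA - A'PB (R + B'PB)^{-1} B'PA + Q
P = Q.copy()
for _ in range(10000):
    gain_term = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P_next = A.T @ P @ A - A.T @ P @ B @ gain_term + Q
    if np.max(np.abs(P_next - P)) < 1e-12:
        P = P_next
        break
    P = P_next

# Optimal feedback gain (actor), constructed from the critic P.
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# The optimal closed loop x+ = (A - BK) x is Schur stable.
rho = max(abs(np.linalg.eigvals(A - B @ K)))
print("spectral radius of A - BK:", rho)
```

Note that in this classical route K is obtained only after P is available; the policy algebraic equation of the paper removes that dependence by characterizing K directly.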
Journal description:
In the IEEE Transactions on Automatic Control, the IEEE Control Systems Society publishes high-quality papers on the theory, design, and applications of control engineering. Two types of contributions are regularly considered:
1) Papers: Presentation of significant research, development, or application of control concepts.
2) Technical Notes and Correspondence: Brief technical notes, comments on published areas or established control topics, corrections to papers and notes published in the Transactions.
In addition, special papers (tutorials, surveys, and perspectives on the theory and applications of control systems topics) are solicited.