Convergence of Policy Gradient Methods for Finite-Horizon Exploratory Linear-Quadratic Control Problems

SIAM Journal on Control and Optimization | IF 2.2 | CAS Zone 2 (Mathematics) | JCR Q2 (Automation & Control Systems) | Pub Date: 2024-03-22 | DOI: 10.1137/22m1533517
Michael Giegrich, Christoph Reisinger, Yufei Zhang

Abstract

SIAM Journal on Control and Optimization, Volume 62, Issue 2, Page 1060-1092, April 2024.
Abstract. We study the global linear convergence of policy gradient (PG) methods for finite-horizon continuous-time exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularizers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. Contrary to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures–Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a priori bound, and converge globally to the optimal policy with a linear rate. We further propose a novel PG method with discrete-time policies. The algorithm leverages the continuous-time analysis, and achieves a robust linear convergence across different action frequencies. A numerical experiment confirms the convergence and robustness of the proposed algorithm.
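The abstract's key algorithmic idea, preconditioning the mean update by the Fisher geometry and the covariance update by the Bures–Wasserstein geometry, can be illustrated on a toy problem. The sketch below is a hedged illustration, not the paper's method: it optimizes a static entropy-regularized quadratic objective over a Gaussian N(μ, Σ) rather than a finite-horizon LQC problem, and the names `Q`, `tau`, and `eta` are invented for the example. The mean step multiplies the Euclidean gradient by Σ (the inverse Fisher information of μ for a Gaussian), and the covariance step uses a two-sided update that keeps Σ symmetric positive definite for small step sizes.

```python
import numpy as np

# Toy entropy-regularized quadratic objective over a Gaussian policy N(mu, Sigma):
#   J(mu, Sigma) = 0.5 mu^T Q mu + 0.5 tr(Q Sigma) - (tau/2) log det Sigma,
# whose minimizer is mu* = 0, Sigma* = tau * Q^{-1}.  (Illustrative only.)
rng = np.random.default_rng(0)
d, tau, eta = 3, 0.5, 0.05
A = rng.standard_normal((d, d))
Q = A @ A.T + d * np.eye(d)            # symmetric positive definite cost matrix

def J(mu, Sigma):
    return (0.5 * mu @ Q @ mu + 0.5 * np.trace(Q @ Sigma)
            - 0.5 * tau * np.linalg.slogdet(Sigma)[1])

mu = rng.standard_normal(d)
Sigma = np.eye(d)
vals = [J(mu, Sigma)]
for _ in range(200):
    # Fisher (natural) gradient step for the mean: precondition the Euclidean
    # gradient Q mu by Sigma, the inverse Fisher information of mu.
    mu = mu - eta * Sigma @ (Q @ mu)
    # Bures-Wasserstein gradient step for the covariance: with symmetric
    # Euclidean gradient G, the two-sided update (I - 2*eta*G) Sigma (I - 2*eta*G)
    # preserves symmetry and positive definiteness for small eta.
    G = 0.5 * Q - 0.5 * tau * np.linalg.inv(Sigma)
    M = np.eye(d) - 2 * eta * G
    Sigma = M @ Sigma @ M
    vals.append(J(mu, Sigma))

Sigma_star = tau * np.linalg.inv(Q)
print("objective decreased:", vals[-1] < vals[0])
print("covariance error:", np.linalg.norm(Sigma - Sigma_star))
```

The fixed point of the covariance update is exactly where the Euclidean gradient G vanishes, i.e. Σ = τQ⁻¹, and the multiplicative form of the step is what makes every iterate remain a valid covariance matrix, mirroring the a priori boundedness of iterates emphasized in the abstract.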