{"title":"无限视界随机线性二次最优控制问题策略梯度的收敛性","authors":"Xinpei Zhang , Guangyan Jia","doi":"10.1016/j.jmaa.2025.129264","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, there has been an increasing interest in studying theoretical properties of gradient-based methods in the context of control and reinforcement learning. This article studies the policy gradient (PG) method for infinite horizon continuous-time stochastic linear quadratic (SLQ) optimal control problems, where the drift and diffusion terms in dynamics depend on both the state and control. Within the LQ framework, the optimal controls can usually be expressed by a linear state feedback form. Thus we formulate the SLQ problem as a policy optimization (PO) problem, where the policy is linearly parameterized. A main challenge for the PO formulation is that the stability constraints are nonconvex in the policy space. We overcome this difficulty by leveraging the gradient domination condition and L-smoothness property, thereby establishing global exponential/linear convergence of the gradient flow/descent algorithm. Finally, a numerical example is given to illustrate the convergence of gradient descent method.</div></div>","PeriodicalId":50147,"journal":{"name":"Journal of Mathematical Analysis and Applications","volume":"547 1","pages":"Article 129264"},"PeriodicalIF":1.2000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Convergence of policy gradient for stochastic linear quadratic optimal control problems in infinite horizon\",\"authors\":\"Xinpei Zhang , Guangyan Jia\",\"doi\":\"10.1016/j.jmaa.2025.129264\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Recently, there has been an increasing interest in studying theoretical properties of gradient-based methods in the context of control and reinforcement learning. 
This article studies the policy gradient (PG) method for infinite horizon continuous-time stochastic linear quadratic (SLQ) optimal control problems, where the drift and diffusion terms in dynamics depend on both the state and control. Within the LQ framework, the optimal controls can usually be expressed by a linear state feedback form. Thus we formulate the SLQ problem as a policy optimization (PO) problem, where the policy is linearly parameterized. A main challenge for the PO formulation is that the stability constraints are nonconvex in the policy space. We overcome this difficulty by leveraging the gradient domination condition and L-smoothness property, thereby establishing global exponential/linear convergence of the gradient flow/descent algorithm. Finally, a numerical example is given to illustrate the convergence of gradient descent method.</div></div>\",\"PeriodicalId\":50147,\"journal\":{\"name\":\"Journal of Mathematical Analysis and Applications\",\"volume\":\"547 1\",\"pages\":\"Article 129264\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2025-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Mathematical Analysis and Applications\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0022247X25000459\",\"RegionNum\":3,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/16 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Mathematical Analysis and 
Applications","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0022247X25000459","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/16 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MATHEMATICS","Score":null,"Total":0}
Convergence of policy gradient for stochastic linear quadratic optimal control problems in infinite horizon
Recently, there has been increasing interest in the theoretical properties of gradient-based methods in the context of control and reinforcement learning. This article studies the policy gradient (PG) method for infinite-horizon continuous-time stochastic linear quadratic (SLQ) optimal control problems, where the drift and diffusion terms of the dynamics depend on both the state and the control. Within the LQ framework, the optimal control can usually be expressed in linear state-feedback form. We therefore formulate the SLQ problem as a policy optimization (PO) problem in which the policy is linearly parameterized. A main challenge of the PO formulation is that the stability constraints are nonconvex in the policy space. We overcome this difficulty by leveraging the gradient domination condition and the L-smoothness property, thereby establishing global exponential/linear convergence of the gradient flow/descent algorithm. Finally, a numerical example illustrates the convergence of the gradient descent method.
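The abstract's approach, gradient descent on a linearly parameterized feedback policy, can be illustrated on a toy problem. The sketch below is a hypothetical, much-simplified analogue (a scalar, deterministic, discrete-time LQR rather than the paper's continuous-time stochastic setting); the coefficients `a`, `b`, `q`, `r`, the step size, and the finite-difference gradient are all illustrative choices, not the paper's construction.

```python
# Hypothetical sketch: gradient descent on a linear feedback gain k for a
# scalar, deterministic, discrete-time LQR problem -- a simplified analogue
# of the paper's continuous-time stochastic SLQ setting.
# Dynamics: x_{t+1} = a x_t + b u_t with linear policy u_t = -k x_t;
# cost: sum_t (q x_t^2 + r u_t^2).
a, b, q, r = 0.9, 1.0, 1.0, 1.0

def cost(k, x0=1.0, T=200):
    """Finite-horizon surrogate for the infinite-horizon cost under u = -k x."""
    x, J = x0, 0.0
    for _ in range(T):
        u = -k * x
        J += q * x * x + r * u * u
        x = a * x + b * u
    return J

def grad(k, eps=1e-5):
    # Central finite difference stands in for the exact policy-gradient
    # formula; the paper derives the gradient analytically.
    return (cost(k + eps) - cost(k - eps)) / (2 * eps)

k = 0.1  # initial stabilizing gain: |a - b*k| < 1
for _ in range(200):
    k -= 0.05 * grad(k)  # plain gradient descent with a fixed step size

# The limit solves the scalar Riccati equation
#   p = q + a^2 p - (a b p)^2 / (r + b^2 p),  k* = a b p / (r + b^2 p),
# giving k* ~= 0.5377 for these coefficients.
```

Starting from any stabilizing gain, the iterates stay in the stabilizing region and converge to the Riccati optimum, mirroring (in this toy setting) the global linear convergence that the paper establishes via gradient domination and L-smoothness.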
Journal overview:
The Journal of Mathematical Analysis and Applications presents papers that treat mathematical analysis and its numerous applications. The journal emphasizes articles devoted to the mathematical treatment of questions arising in physics, chemistry, biology, and engineering, particularly those that stress analytical aspects and novel problems and their solutions.
Papers are sought that employ one or more of the following areas of classical analysis:
• Analytic number theory
• Functional analysis and operator theory
• Real and harmonic analysis
• Complex analysis
• Numerical analysis
• Applied mathematics
• Partial differential equations
• Dynamical systems
• Control and Optimization
• Probability
• Mathematical biology
• Combinatorics
• Mathematical physics.