Convergence of policy gradient for stochastic linear quadratic optimal control problems in infinite horizon

IF 1.2 | CAS Zone 3 (Mathematics) | Q1 MATHEMATICS | Journal of Mathematical Analysis and Applications | Pub Date: 2025-07-01 | Epub Date: 2025-01-16 | DOI: 10.1016/j.jmaa.2025.129264
Xinpei Zhang , Guangyan Jia
{"title":"无限视界随机线性二次最优控制问题策略梯度的收敛性","authors":"Xinpei Zhang ,&nbsp;Guangyan Jia","doi":"10.1016/j.jmaa.2025.129264","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, there has been an increasing interest in studying theoretical properties of gradient-based methods in the context of control and reinforcement learning. This article studies the policy gradient (PG) method for infinite horizon continuous-time stochastic linear quadratic (SLQ) optimal control problems, where the drift and diffusion terms in dynamics depend on both the state and control. Within the LQ framework, the optimal controls can usually be expressed by a linear state feedback form. Thus we formulate the SLQ problem as a policy optimization (PO) problem, where the policy is linearly parameterized. A main challenge for the PO formulation is that the stability constraints are nonconvex in the policy space. We overcome this difficulty by leveraging the gradient domination condition and L-smoothness property, thereby establishing global exponential/linear convergence of the gradient flow/descent algorithm. Finally, a numerical example is given to illustrate the convergence of gradient descent method.</div></div>","PeriodicalId":50147,"journal":{"name":"Journal of Mathematical Analysis and Applications","volume":"547 1","pages":"Article 129264"},"PeriodicalIF":1.2000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Convergence of policy gradient for stochastic linear quadratic optimal control problems in infinite horizon\",\"authors\":\"Xinpei Zhang ,&nbsp;Guangyan Jia\",\"doi\":\"10.1016/j.jmaa.2025.129264\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Recently, there has been an increasing interest in studying theoretical properties of gradient-based methods in the context of control and reinforcement learning. This article studies the policy gradient (PG) method for infinite horizon continuous-time stochastic linear quadratic (SLQ) optimal control problems, where the drift and diffusion terms in dynamics depend on both the state and control. Within the LQ framework, the optimal controls can usually be expressed by a linear state feedback form. Thus we formulate the SLQ problem as a policy optimization (PO) problem, where the policy is linearly parameterized. A main challenge for the PO formulation is that the stability constraints are nonconvex in the policy space. We overcome this difficulty by leveraging the gradient domination condition and L-smoothness property, thereby establishing global exponential/linear convergence of the gradient flow/descent algorithm. 
Finally, a numerical example is given to illustrate the convergence of gradient descent method.</div></div>\",\"PeriodicalId\":50147,\"journal\":{\"name\":\"Journal of Mathematical Analysis and Applications\",\"volume\":\"547 1\",\"pages\":\"Article 129264\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2025-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Mathematical Analysis and Applications\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0022247X25000459\",\"RegionNum\":3,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/16 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Mathematical Analysis and Applications","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0022247X25000459","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/16 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MATHEMATICS","Score":null,"Total":0}
Citations: 0

Abstract

Recently, there has been an increasing interest in studying theoretical properties of gradient-based methods in the context of control and reinforcement learning. This article studies the policy gradient (PG) method for infinite horizon continuous-time stochastic linear quadratic (SLQ) optimal control problems, where the drift and diffusion terms in dynamics depend on both the state and control. Within the LQ framework, the optimal controls can usually be expressed by a linear state feedback form. Thus we formulate the SLQ problem as a policy optimization (PO) problem, where the policy is linearly parameterized. A main challenge for the PO formulation is that the stability constraints are nonconvex in the policy space. We overcome this difficulty by leveraging the gradient domination condition and L-smoothness property, thereby establishing global exponential/linear convergence of the gradient flow/descent algorithm. Finally, a numerical example is given to illustrate the convergence of gradient descent method.
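For orientation, here is a minimal sketch of the standard infinite-horizon SLQ setting that the abstract refers to; the notation (A, B, C, D, Q, R, K, η) is assumed for illustration and the paper's precise formulation may differ.

\[
dX_t = (A X_t + B u_t)\,dt + (C X_t + D u_t)\,dW_t, \qquad X_0 = x,
\]
\[
J(u) = \mathbb{E}\int_0^{\infty} \big( X_t^{\top} Q X_t + u_t^{\top} R u_t \big)\,dt .
\]

Restricting attention to stabilizing linear feedback policies \(u_t = -K X_t\) turns the problem into an optimization over the gain matrix \(K\), and the gradient flow and gradient descent iterations discussed above take the form

\[
\frac{dK_s}{ds} = -\nabla J(K_s), \qquad K_{n+1} = K_n - \eta\, \nabla J(K_n),
\]

whose global exponential (respectively linear) convergence rests on the gradient domination and L-smoothness of \(J\) over the set of stabilizing gains.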
Source journal
CiteScore: 2.50
Self-citation rate: 7.70%
Articles published: 790
Review time: 6 months
Journal description: The Journal of Mathematical Analysis and Applications presents papers that treat mathematical analysis and its numerous applications. The journal emphasizes articles devoted to the mathematical treatment of questions arising in physics, chemistry, biology, and engineering, particularly those that stress analytical aspects and novel problems and their solutions. Papers are sought which employ one or more of the following areas of classical analysis:
• Analytic number theory
• Functional analysis and operator theory
• Real and harmonic analysis
• Complex analysis
• Numerical analysis
• Applied mathematics
• Partial differential equations
• Dynamical systems
• Control and Optimization
• Probability
• Mathematical biology
• Combinatorics
• Mathematical physics.
Latest articles from this journal
• Remarks on positive solutions to a p-Laplacian problem with a possibly singular nonlinearity
• On asymptotic expansions of resolvents for Poisson distributed random Schrödinger operators
• On the existence of optimal controls for reflected McKean–Vlasov stochastic differential equations
• Dispersive decay for the inter-critical nonlinear Schrödinger equation in R3
• A conjecture of Radu and Sellers on congruences modulo powers of 2 for broken 3-diamond partitions