{"title":"无限视界随机线性二次最优控制问题策略梯度的收敛性","authors":"Xinpei Zhang , Guangyan Jia","doi":"10.1016/j.jmaa.2025.129264","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, there has been an increasing interest in studying theoretical properties of gradient-based methods in the context of control and reinforcement learning. This article studies the policy gradient (PG) method for infinite horizon continuous-time stochastic linear quadratic (SLQ) optimal control problems, where the drift and diffusion terms in dynamics depend on both the state and control. Within the LQ framework, the optimal controls can usually be expressed by a linear state feedback form. Thus we formulate the SLQ problem as a policy optimization (PO) problem, where the policy is linearly parameterized. A main challenge for the PO formulation is that the stability constraints are nonconvex in the policy space. We overcome this difficulty by leveraging the gradient domination condition and L-smoothness property, thereby establishing global exponential/linear convergence of the gradient flow/descent algorithm. Finally, a numerical example is given to illustrate the convergence of gradient descent method.</div></div>","PeriodicalId":50147,"journal":{"name":"Journal of Mathematical Analysis and Applications","volume":"547 1","pages":"Article 129264"},"PeriodicalIF":1.2000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Convergence of policy gradient for stochastic linear quadratic optimal control problems in infinite horizon\",\"authors\":\"Xinpei Zhang , Guangyan Jia\",\"doi\":\"10.1016/j.jmaa.2025.129264\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Recently, there has been an increasing interest in studying theoretical properties of gradient-based methods in the context of control and reinforcement learning. 
This article studies the policy gradient (PG) method for infinite horizon continuous-time stochastic linear quadratic (SLQ) optimal control problems, where the drift and diffusion terms in dynamics depend on both the state and control. Within the LQ framework, the optimal controls can usually be expressed by a linear state feedback form. Thus we formulate the SLQ problem as a policy optimization (PO) problem, where the policy is linearly parameterized. A main challenge for the PO formulation is that the stability constraints are nonconvex in the policy space. We overcome this difficulty by leveraging the gradient domination condition and L-smoothness property, thereby establishing global exponential/linear convergence of the gradient flow/descent algorithm. Finally, a numerical example is given to illustrate the convergence of gradient descent method.</div></div>\",\"PeriodicalId\":50147,\"journal\":{\"name\":\"Journal of Mathematical Analysis and Applications\",\"volume\":\"547 1\",\"pages\":\"Article 129264\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2025-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Mathematical Analysis and Applications\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0022247X25000459\",\"RegionNum\":3,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/16 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Mathematical Analysis and 
Applications","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0022247X25000459","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/16 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MATHEMATICS","Score":null,"Total":0}
Convergence of policy gradient for stochastic linear quadratic optimal control problems in infinite horizon
Recently, there has been increasing interest in the theoretical properties of gradient-based methods in the context of control and reinforcement learning. This article studies the policy gradient (PG) method for infinite-horizon continuous-time stochastic linear quadratic (SLQ) optimal control problems, where the drift and diffusion terms of the dynamics depend on both the state and the control. Within the LQ framework, the optimal control can usually be expressed in linear state-feedback form. We therefore formulate the SLQ problem as a policy optimization (PO) problem in which the policy is linearly parameterized. A main challenge of the PO formulation is that the stability constraints are nonconvex in the policy space. We overcome this difficulty by leveraging the gradient domination condition and the L-smoothness property, thereby establishing global exponential/linear convergence of the gradient flow/descent algorithm. Finally, a numerical example illustrates the convergence of the gradient descent method.
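The abstract's approach, gradient descent on a linearly parameterized feedback policy, can be illustrated on a toy problem. The sketch below is a hypothetical, much-simplified analogue (a scalar, deterministic, discrete-time LQR rather than the paper's continuous-time stochastic setting); the coefficients `a`, `b`, `q`, `r`, the step size, and the finite-difference gradient are all illustrative choices, not the paper's construction.

```python
# Hypothetical sketch: gradient descent on a linear feedback gain k for a
# scalar, deterministic, discrete-time LQR problem -- a simplified analogue
# of the paper's continuous-time stochastic SLQ setting.
# Dynamics: x_{t+1} = a x_t + b u_t with linear policy u_t = -k x_t;
# cost: sum_t (q x_t^2 + r u_t^2).
a, b, q, r = 0.9, 1.0, 1.0, 1.0

def cost(k, x0=1.0, T=200):
    """Finite-horizon surrogate for the infinite-horizon cost under u = -k x."""
    x, J = x0, 0.0
    for _ in range(T):
        u = -k * x
        J += q * x * x + r * u * u
        x = a * x + b * u
    return J

def grad(k, eps=1e-5):
    # Central finite difference stands in for the exact policy-gradient
    # formula; the paper derives the gradient analytically.
    return (cost(k + eps) - cost(k - eps)) / (2 * eps)

k = 0.1  # initial stabilizing gain: |a - b*k| < 1
for _ in range(200):
    k -= 0.05 * grad(k)  # plain gradient descent with a fixed step size

# The limit solves the scalar Riccati equation
#   p = q + a^2 p - (a b p)^2 / (r + b^2 p),  k* = a b p / (r + b^2 p),
# giving k* ~= 0.5377 for these coefficients.
```

Starting from any stabilizing gain, the iterates stay in the stabilizing region and converge to the Riccati optimum, mirroring (in this toy setting) the global linear convergence that the paper establishes via gradient domination and L-smoothness.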
Journal overview:
The Journal of Mathematical Analysis and Applications presents papers that treat mathematical analysis and its numerous applications. The journal emphasizes articles devoted to the mathematical treatment of questions arising in physics, chemistry, biology, and engineering, particularly those that stress analytical aspects and novel problems and their solutions.
Papers are sought that employ one or more of the following areas of classical analysis:
• Analytic number theory
• Functional analysis and operator theory
• Real and harmonic analysis
• Complex analysis
• Numerical analysis
• Applied mathematics
• Partial differential equations
• Dynamical systems
• Control and Optimization
• Probability
• Mathematical biology
• Combinatorics
• Mathematical physics.