{"title":"Policy Iteration for Exploratory Hamilton–Jacobi–Bellman Equations","authors":"Hung Vinh Tran, Zhenhua Wang, Yuming Paul Zhang","doi":"10.1007/s00245-025-10249-3","DOIUrl":null,"url":null,"abstract":"<div><p>We study the policy iteration algorithm (PIA) for entropy-regularized stochastic control problems on an infinite time horizon with a large discount rate, focusing on two main scenarios. First, we analyze PIA with bounded coefficients where the controls applied to the diffusion term satisfy a smallness condition. We demonstrate the convergence of PIA based on a uniform <span>\\({{\\mathcal {C}}}^{2,\\alpha }\\)</span> estimate for the value sequence generated by PIA, and provide a quantitative convergence analysis for this scenario. Second, we investigate PIA with unbounded coefficients but no control over the diffusion term. In this scenario, we first provide the well-posedness of the exploratory Hamilton–Jacobi–Bellman equation with linear growth coefficients and polynomial growth reward function. By such a well-posedess result we achieve PIA’s convergence by establishing a quantitative locally uniform <span>\\({{\\mathcal {C}}}^{1,\\alpha }\\)</span> estimates for the generated value sequence.</p></div>","PeriodicalId":55566,"journal":{"name":"Applied Mathematics and Optimization","volume":"91 2","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Mathematics and Optimization","FirstCategoryId":"100","ListUrlMain":"https://link.springer.com/article/10.1007/s00245-025-10249-3","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATHEMATICS, APPLIED","Score":null,"Total":0}
引用次数: 0
Abstract
We study the policy iteration algorithm (PIA) for entropy-regularized stochastic control problems on an infinite time horizon with a large discount rate, focusing on two main scenarios. First, we analyze PIA with bounded coefficients where the controls applied to the diffusion term satisfy a smallness condition. We demonstrate the convergence of PIA based on a uniform \({{\mathcal {C}}}^{2,\alpha }\) estimate for the value sequence generated by PIA, and provide a quantitative convergence analysis for this scenario. Second, we investigate PIA with unbounded coefficients but no control over the diffusion term. In this scenario, we first provide the well-posedness of the exploratory Hamilton–Jacobi–Bellman equation with linear growth coefficients and polynomial growth reward function. By such a well-posedess result we achieve PIA’s convergence by establishing a quantitative locally uniform \({{\mathcal {C}}}^{1,\alpha }\) estimates for the generated value sequence.
期刊介绍:
The Applied Mathematics and Optimization Journal covers a broad range of mathematical methods in particular those that bridge with optimization and have some connection with applications. Core topics include calculus of variations, partial differential equations, stochastic control, optimization of deterministic or stochastic systems in discrete or continuous time, homogenization, control theory, mean field games, dynamic games and optimal transport. Algorithmic, data analytic, machine learning and numerical methods which support the modeling and analysis of optimization problems are encouraged. Of great interest are papers which show some novel idea in either the theory or model which include some connection with potential applications in science and engineering.