
SIAM Journal on Control and Optimization: Latest Articles

A Stability Dichotomy for Discrete-Time Linear Switching Systems in Dimension Two
IF 2.2 Zone 2 Mathematics Q1 Mathematics Pub Date: 2024-01-31 DOI: 10.1137/23m1551225
Ian D. Morris
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Page 400-414, February 2024.
Abstract. We prove that, for every discrete-time linear switching system in two complex variables and with finitely many switching states, either the system is Lyapunov stable or there exists a trajectory which escapes to infinity with at least linear speed. We also give a checkable algebraic criterion to distinguish these two cases. This dichotomy was previously known to hold for systems in two real variables but is known to be false in higher dimensions and for systems with infinitely many switching states.
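The two sides of this dichotomy can be illustrated numerically on small examples. The sketch below is a brute-force check, not the algebraic criterion of the paper: it enumerates every switching sequence of a given length and records the largest trajectory norm. The matrices `J`, `I2`, `R`, `D` and all helper names are illustrative assumptions that do not come from the paper.

```python
import math
from itertools import product

def step(a, x):
    """Apply a 2x2 matrix (tuple of rows) to a vector in C^2."""
    return (a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1])

def norm(x):
    """Euclidean norm on C^2."""
    return math.hypot(abs(x[0]), abs(x[1]))

def max_norm_after(mats, x0, n):
    """Largest |x_n| over all switching sequences of length n (brute force)."""
    best = 0.0
    for seq in product(mats, repeat=n):
        x = x0
        for a in seq:
            x = step(a, x)
        best = max(best, norm(x))
    return best

# Hypothetical unstable system: a Jordan block together with the identity.
J = ((1, 1), (0, 1))
I2 = ((1, 0), (0, 1))
# Hypothetical stable system: two unitary matrices, so every product is unitary.
R = ((0, -1), (1, 0))
D = ((1, 0), (0, 1j))

if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, max_norm_after([J, I2], (0, 1), n), max_norm_after([R, D], (0, 1), n))
```

In the first system the trajectory that always selects `J` starting from `(0, 1)` reaches `(n, 1)` after `n` steps, so its norm grows linearly; in the second system every trajectory keeps norm 1. The enumeration costs `len(mats) ** n` products, so it is only practical for short horizons.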
Citations: 0
Nonlinear Consensus+Innovations under Correlated Heavy-Tailed Noises: Mean Square Convergence Rate and Asymptotics
IF 2.2 Zone 2 Mathematics Q1 Mathematics Pub Date: 2024-01-29 DOI: 10.1137/22m1543197
Manojlo Vukovic, Dusan Jakovetic, Dragana Bajovic, Soummya Kar
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Page 376-399, February 2024.
Abstract. We consider distributed recursive estimation of consensus+innovations type in the presence of heavy-tailed sensing and communication noises. We allow that the sensing and communication noises are mutually correlated while independent and identically distributed in time, and that they may both have infinite moments of order higher than one (hence having infinite variances). Such heavy-tailed, infinite-variance noises are highly relevant in practice and are shown to occur, e.g., in dense internet of things deployments. We develop a consensus+innovations distributed estimator that employs a general nonlinearity in both consensus and innovations steps to combat the noise. We establish the estimator’s almost sure convergence, asymptotic normality, and mean squared error (MSE) convergence. Moreover, we establish and explicitly quantify for the estimator a sublinear MSE convergence rate. We then quantify through analytical examples the effects of the nonlinearity choices and the noises correlation on the system performance. Finally, numerical examples corroborate our findings and verify that the proposed method works in the simultaneous heavy-tail communication-sensing noise setting, while existing methods fail under the same noise conditions.
Citations: 0
Learning Optimal Policies in Potential Mean Field Games: Smoothed Policy Iteration Algorithms
IF 2.2 Zone 2 Mathematics Q1 Mathematics Pub Date: 2024-01-24 DOI: 10.1137/22m1539861
Qing Tang, Jiahao Song
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Page 351-375, February 2024.
Abstract. We introduce two smoothed policy iteration algorithms (SPIs) as rules for learning policies and methods for computing Nash equilibria in second order potential mean field games (MFGs). Global convergence is proved if the coupling term in the MFG system satisfies the Lasry–Lions monotonicity condition. Local convergence to a stable solution is proved for a system which may have multiple solutions. The convergence analysis shows close connections between SPIs and the fictitious play algorithm, which has been widely studied in the MFG literature. Numerical simulation results based on finite difference schemes are presented to supplement the theoretical analysis.
Citations: 0
Maximum Principles for Optimal Control Problems with Differential Inclusions
IF 2.2 Zone 2 Mathematics Q1 Mathematics Pub Date: 2024-01-23 DOI: 10.1137/22m1540740
A. D. Ioffe
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Page 271-296, February 2024.
Abstract. There are three different forms of adjoint inclusions that appear in the most advanced necessary optimality conditions for optimal control problems involving differential inclusions: Euler–Lagrange inclusion (with partial convexification) [A. D. Ioffe, J. Optim. Theory Appl., 182 (2019), pp. 285–309], fully convexified Hamiltonian inclusion [F. H. Clarke, Mem. Amer. Math. Soc., 173 (2005), 816], and partially convexified Hamiltonian inclusion [P. D. Loewen and R. T. Rockafellar, SIAM J. Control Optim., 34 (1996), pp. 1496–1511], [A. D. Ioffe, Trans. Amer. Math. Soc., 349 (1997), pp. 2871–2900], [R. B. Vinter, SIAM J. Control Optim., 52 (2014), pp. 1237–1250] (for convex-valued differential inclusions in the first two references). This paper addresses all three types of necessary conditions for problems with (in general) nonconvex-valued differential inclusions. The first of the two main theorems, with the Euler–Lagrange inclusion, is equivalent to the main result of [A. D. Ioffe, J. Optim. Theory Appl., 182 (2019), pp. 285–309] but proved in a substantially different and much more direct way. The second theorem contains conditions that guarantee necessity of both types of Hamiltonian conditions. It seems to be the first result of such a sort that covers differential inclusions with possibly unbounded values and contains the most recent results of [F. H. Clarke, Mem. Amer. Math. Soc., 173 (2005), 816] and [R. B. Vinter, SIAM J. Control Optim., 52 (2014), pp. 1237–1250] as particular cases. And again, the proof of the theorem is based on a substantially different approach.
Citations: 0
Sampled-Data Finite-Dimensional Observer-Based Control of 1D Stochastic Parabolic PDEs
IF 2.2 Zone 2 Mathematics Q1 Mathematics Pub Date: 2024-01-23 DOI: 10.1137/22m1538247
Pengfei Wang, Emilia Fridman
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Page 297-325, February 2024.
Abstract. Sampled-data control of PDEs has become an active research area; however, existing results are confined to deterministic PDEs. Sampled-data controller design of stochastic PDEs is a challenging open problem. In this paper we suggest a solution to this problem for 1D stochastic diffusion-reaction equations under discrete-time nonlocal measurement via the modal decomposition method, where both the considered system and the measurement are subject to nonlinear multiplicative noise. We present two methods: a direct one with sampled-data controller implemented via zero-order hold device, and a dynamic-extension-based one with sampled-data controller implemented via a generalized hold device. For both methods, we provide mean-square [math] exponential stability analysis of the full-order closed-loop system. We construct a Lyapunov functional [math] that depends on both the deterministic and stochastic parts of the finite-dimensional part of the closed-loop system. We employ corresponding Itô’s formulas for stochastic ODEs and PDEs, respectively, and further combine [math] with Halanay’s inequality with respect to the expected value of [math] to compensate for sampling in the infinite-dimensional tail. We provide linear matrix inequalities (LMIs) for finding the observer dimension and upper bounds on sampling intervals and noise intensities that preserve the mean-square exponential stability. We prove that the LMIs are always feasible for large enough observer dimension and small enough bounds on sampling intervals and noise intensities. A numerical example demonstrates the efficiency of our methods. The example shows that for the same bounds on noise intensities, the dynamic-extension-based controller allows larger sampling intervals, but this is due to its complexity (generalized hold device for sample-data implementation compared to zero-order hold for the direct method).
Citations: 0
Discrete-Time Approximation of Stochastic Optimal Control with Partial Observation
IF 2.2 Zone 2 Mathematics Q1 Mathematics Pub Date: 2024-01-23 DOI: 10.1137/23m1549018
Yunzhang Li, Xiaolu Tan, Shanjian Tang
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Page 326-350, February 2024.
Abstract. We consider a class of stochastic optimal control problems with partial observation, and study their approximation by discrete-time control problems. We establish a convergence result by using the weak convergence technique of Kushner and Dupuis [Numerical Methods for Stochastic Control Problems in Continuous Time, Springer, New York], together with the notion of relaxed control rule introduced by El Karoui, Huù Nguyen and Jeanblanc-Picqué [SIAM J. Control Optim., 26 (1988), pp. 1025–1061]. In particular, with a well chosen discrete-time control system, we obtain a first implementable numerical algorithm (with convergence) for the partially observed control problem. Moreover, our discrete-time approximation result would open the door to study convergence of more general numerical approximation methods, such as machine learning based methods. Finally, we illustrate our convergence result by numerical experiments on a partially observed control problem in a linear quadratic setting.
Citations: 0
MF-OMO: An Optimization Formulation of Mean-Field Games
IF 2.2 Zone 2 Mathematics Q1 Mathematics Pub Date: 2024-01-22 DOI: 10.1137/22m1524084
Xin Guo, Anran Hu, Junzi Zhang
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Page 243-270, February 2024.
Abstract. This paper proposes a new mathematical paradigm to analyze discrete-time mean-field games. It is shown that finding Nash equilibrium solutions for a general class of discrete-time mean-field games is equivalent to solving an optimization problem with bounded variables and simple convex constraints, called MF-OMO. This equivalence framework enables finding multiple (and possibly all) Nash equilibrium solutions of mean-field games by standard algorithms. For instance, projected gradient descent is shown to be capable of retrieving all possible Nash equilibrium solutions when there are finitely many of them, with proper initializations. Moreover, analyzing mean-field games with linear rewards and mean-field independent dynamics is reduced to solving a finite number of linear programs, hence solvable in finite time. This framework does not rely on the contractive and the monotone assumptions and the uniqueness of the Nash equilibrium.
Citations: 0
Analysis of RHC for Stabilization of Nonautonomous Parabolic Equations Under Uncertainty
IF 2.2 Zone 2 Mathematics Q1 Mathematics Pub Date: 2024-01-19 DOI: 10.1137/23m1550876
Behzad Azmi, Lukas Herrmann, Karl Kunisch
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Page 220-242, February 2024.
Abstract. Stabilization of a class of time-varying parabolic equations with uncertain input data using receding horizon control (RHC) is investigated. The diffusion coefficient and the initial function are prescribed as random fields. We consider both cases: uniform and log-normal distributions of the diffusion coefficient. The controls are chosen to be finite-dimensional and enter into the system as a linear combination of finitely many indicator functions (actuators) supported in open subsets of the spatial domain. Under suitable regularity assumptions, we study the expected (averaged) stabilizability of the RHC-controlled system with respect to the number of actuators. An upper bound is also obtained for the failure probability of RHC in relation to the choice of the number of actuators and parameters in the equation.
Citations: 0
Stochastic Fixed-Point Iterations for Nonexpansive Maps: Convergence and Error Bounds
IF 2.2 Zone 2 Mathematics Q1 Mathematics Pub Date: 2024-01-18 DOI: 10.1137/22m1515550
Mario Bravo, Roberto Cominetti
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Page 191-219, February 2024.
Abstract. We study a stochastically perturbed version of the well-known Krasnoselskii–Mann iteration for computing fixed points of nonexpansive maps in finite dimensional normed spaces. We discuss sufficient conditions on the stochastic noise and stepsizes that guarantee almost sure convergence of the iterates towards a fixed point and derive nonasymptotic error bounds and convergence rates for the fixed-point residuals. Our main results concern the case of a martingale difference noise with variances that can possibly grow unbounded. This supports an application to reinforcement learning for average reward Markov decision processes, for which we establish convergence and asymptotic rates. We also analyze in depth the case where the noise has uniformly bounded variance, obtaining error bounds with explicit computable constants.
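A stochastically perturbed Krasnoselskii–Mann iteration is commonly written as x_{n+1} = (1 - a_n) x_n + a_n (T(x_n) + U_n), where T is nonexpansive and U_n is zero-mean noise. The sketch below is a minimal illustration under assumptions that are not taken from the paper: T is a rotation of the plane (nonexpansive in the Euclidean norm, with the origin as its only fixed point), the noise is i.i.d. Gaussian, and the step sizes a_n = (n + 2)^(-3/4) are one familiar diminishing schedule. The function names are hypothetical.

```python
import math
import random

def rotate(x, theta=math.pi / 2):
    """Nonexpansive map on R^2: rotation by theta (only fixed point is the origin)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def residual(T, x):
    """Fixed-point residual |T(x) - x| in the Euclidean norm."""
    tx = T(x)
    return math.hypot(tx[0] - x[0], tx[1] - x[1])

def stochastic_km(T, x0, steps, noise_std=0.1, seed=0):
    """Run x_{n+1} = (1 - a_n) x_n + a_n (T(x_n) + U_n) with Gaussian noise U_n."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    x = x0
    for n in range(steps):
        a = (n + 2) ** -0.75  # assumed schedule: sum a_n diverges, sum a_n^2 converges
        tx = T(x)
        u = (rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        x = tuple((1 - a) * xi + a * (ti + ui) for xi, ti, ui in zip(x, tx, u))
    return x

if __name__ == "__main__":
    x = stochastic_km(rotate, (1.0, 1.0), steps=5000)
    print("final iterate:", x, "residual:", residual(rotate, x))
```

Setting `noise_std=0.0` recovers the classical deterministic iteration, which gives a convenient baseline when experimenting with other step-size schedules or noise levels.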
Citations: 0
Optimal Scheduling of Entropy Regularizer for Continuous-Time Linear-Quadratic Reinforcement Learning
IF 2.2 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2024-01-17 | DOI: 10.1137/22m1515744
Lukasz Szpruch, Tanut Treetanthiploet, Yufei Zhang
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Page 135-166, February 2024.
Abstract. This work uses the entropy-regularized relaxed stochastic control perspective as a principled framework for designing reinforcement learning (RL) algorithms. Herein, an agent interacts with the environment by generating noisy controls distributed according to the optimal relaxed policy. The noisy policies, on the one hand, explore the space and hence facilitate learning, but, on the other hand, they introduce bias by assigning a positive probability to nonoptimal actions. This exploration-exploitation trade-off is determined by the strength of entropy regularization. We study algorithms resulting from two entropy regularization formulations: the exploratory control approach, where entropy is added to the cost objective, and the proximal policy update approach, where entropy penalizes policy divergence between consecutive episodes. We focus on the finite horizon continuous-time linear-quadratic (LQ) RL problem, where a linear dynamics with unknown drift coefficients is controlled subject to quadratic costs. In this setting, both algorithms yield a Gaussian relaxed policy. We quantify the precise difference between the value functions of a Gaussian policy and its noisy evaluation and show that the execution noise must be independent across time. By tuning the frequency of sampling from relaxed policies and the parameter governing the strength of entropy regularization, we prove that the regret, for both learning algorithms, is of the order [math] (up to a logarithmic factor) over [math] episodes, matching the best known result from the literature.
Citations: 0