A Stability Dichotomy for Discrete-Time Linear Switching Systems in Dimension Two
Ian D. Morris
DOI: 10.1137/23m1551225
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 400-414, February 2024.
Abstract. We prove that, for every discrete-time linear switching system in two complex variables and with finitely many switching states, either the system is Lyapunov stable or there exists a trajectory which escapes to infinity with at least linear speed. We also give a checkable algebraic criterion to distinguish these two cases. This dichotomy was previously known to hold for systems in two real variables but is known to be false in higher dimensions and for systems with infinitely many switching states.
Nonlinear Consensus+Innovations under Correlated Heavy-Tailed Noises: Mean Square Convergence Rate and Asymptotics
Manojlo Vukovic, Dusan Jakovetic, Dragana Bajovic, Soummya Kar
DOI: 10.1137/22m1543197
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 376-399, February 2024.
Abstract. We consider distributed recursive estimation of consensus+innovations type in the presence of heavy-tailed sensing and communication noises. We allow the sensing and communication noises to be mutually correlated while independent and identically distributed in time, and both may have infinite moments of order higher than one (hence infinite variances). Such heavy-tailed, infinite-variance noises are highly relevant in practice and are shown to occur, e.g., in dense Internet of Things deployments. We develop a consensus+innovations distributed estimator that employs a general nonlinearity in both the consensus and innovations steps to combat the noise. We establish the estimator's almost sure convergence, asymptotic normality, and mean squared error (MSE) convergence, and we explicitly quantify a sublinear MSE convergence rate. We then quantify, through analytical examples, the effects of the nonlinearity choice and of the noise correlation on system performance. Finally, numerical examples corroborate our findings and verify that the proposed method works in the simultaneous heavy-tailed communication-sensing noise setting, while existing methods fail under the same noise conditions.
Learning Optimal Policies in Potential Mean Field Games: Smoothed Policy Iteration Algorithms
Qing Tang, Jiahao Song
DOI: 10.1137/22m1539861
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 351-375, February 2024.
Abstract. We introduce two smoothed policy iteration algorithms (SPIs) as rules for learning policies and methods for computing Nash equilibria in second order potential mean field games (MFGs). Global convergence is proved if the coupling term in the MFG system satisfies the Lasry–Lions monotonicity condition. Local convergence to a stable solution is proved for a system which may have multiple solutions. The convergence analysis shows close connections between SPIs and the fictitious play algorithm, which has been widely studied in the MFG literature. Numerical simulation results based on finite difference schemes are presented to supplement the theoretical analysis.
Maximum Principles for Optimal Control Problems with Differential Inclusions
A. D. Ioffe
DOI: 10.1137/22m1540740
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 271-296, February 2024.
Abstract. There are three different forms of adjoint inclusions that appear in the most advanced necessary optimality conditions for optimal control problems involving differential inclusions: the Euler–Lagrange inclusion (with partial convexification) [A. D. Ioffe, J. Optim. Theory Appl., 182 (2019), pp. 285–309], the fully convexified Hamiltonian inclusion [F. H. Clarke, Mem. Amer. Math. Soc., 173 (2005), 816], and the partially convexified Hamiltonian inclusion [P. D. Loewen and R. T. Rockafellar, SIAM J. Control Optim., 34 (1996), pp. 1496–1511], [A. D. Ioffe, Trans. Amer. Math. Soc., 349 (1997), pp. 2871–2900], [R. B. Vinter, SIAM J. Control Optim., 52 (2014), pp. 1237–1250] (for convex-valued differential inclusions in the first two references). This paper addresses all three types of necessary conditions for problems with (in general) nonconvex-valued differential inclusions. The first of the two main theorems, with the Euler–Lagrange inclusion, is equivalent to the main result of [A. D. Ioffe, J. Optim. Theory Appl., 182 (2019), pp. 285–309] but is proved in a substantially different and much more direct way. The second theorem contains conditions that guarantee the necessity of both types of Hamiltonian conditions. It appears to be the first result of this sort that covers differential inclusions with possibly unbounded values, and it contains the most recent results of [F. H. Clarke, Mem. Amer. Math. Soc., 173 (2005), 816] and [R. B. Vinter, SIAM J. Control Optim., 52 (2014), pp. 1237–1250] as particular cases. Again, the proof of this theorem is based on a substantially different approach.
Sampled-Data Finite-Dimensional Observer-Based Control of 1D Stochastic Parabolic PDEs
Pengfei Wang, Emilia Fridman
DOI: 10.1137/22m1538247
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 297-325, February 2024.
Abstract. Sampled-data control of PDEs has become an active research area; however, existing results are confined to deterministic PDEs, and sampled-data controller design for stochastic PDEs is a challenging open problem. In this paper we suggest a solution to this problem for 1D stochastic diffusion-reaction equations under discrete-time nonlocal measurement via the modal decomposition method, where both the considered system and the measurement are subject to nonlinear multiplicative noise. We present two methods: a direct one, with the sampled-data controller implemented via a zero-order hold device, and a dynamic-extension-based one, with the sampled-data controller implemented via a generalized hold device. For both methods, we provide mean-square [math] exponential stability analysis of the full-order closed-loop system. We construct a Lyapunov functional [math] that depends on both the deterministic and stochastic parts of the finite-dimensional part of the closed-loop system. We employ the corresponding Itô formulas for stochastic ODEs and PDEs, respectively, and further combine [math] with Halanay's inequality with respect to the expected value of [math] to compensate for sampling in the infinite-dimensional tail. We provide linear matrix inequalities (LMIs) for finding the observer dimension and upper bounds on the sampling intervals and noise intensities that preserve mean-square exponential stability, and we prove that the LMIs are always feasible for a large enough observer dimension and small enough bounds on the sampling intervals and noise intensities. A numerical example demonstrates the efficiency of our methods. The example shows that, for the same bounds on the noise intensities, the dynamic-extension-based controller allows larger sampling intervals, but at the price of greater complexity (a generalized hold device for the sampled-data implementation, compared to a zero-order hold for the direct method).
Discrete-Time Approximation of Stochastic Optimal Control with Partial Observation
Yunzhang Li, Xiaolu Tan, Shanjian Tang
DOI: 10.1137/23m1549018
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 326-350, February 2024.
Abstract. We consider a class of stochastic optimal control problems with partial observation and study their approximation by discrete-time control problems. We establish a convergence result by using the weak convergence technique of Kushner and Dupuis [Numerical Methods for Stochastic Control Problems in Continuous Time, Springer, New York], together with the notion of relaxed control rule introduced by El Karoui, Huù Nguyen, and Jeanblanc-Picqué [SIAM J. Control Optim., 26 (1988), pp. 1025–1061]. In particular, with a well-chosen discrete-time control system, we obtain a first implementable numerical algorithm (with convergence) for the partially observed control problem. Moreover, our discrete-time approximation result opens the door to studying the convergence of more general numerical approximation methods, such as machine learning based methods. Finally, we illustrate our convergence result by numerical experiments on a partially observed control problem in a linear-quadratic setting.
MF-OMO: An Optimization Formulation of Mean-Field Games
Xin Guo, Anran Hu, Junzi Zhang
DOI: 10.1137/22m1524084
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 243-270, February 2024.
Abstract. This paper proposes a new mathematical paradigm to analyze discrete-time mean-field games. It is shown that finding Nash equilibrium solutions for a general class of discrete-time mean-field games is equivalent to solving an optimization problem with bounded variables and simple convex constraints, called MF-OMO. This equivalence framework enables finding multiple (and possibly all) Nash equilibrium solutions of mean-field games by standard algorithms. For instance, projected gradient descent is shown to be capable of retrieving all possible Nash equilibrium solutions when there are finitely many of them, given proper initializations. Moreover, analyzing mean-field games with linear rewards and mean-field independent dynamics reduces to solving a finite number of linear programs, hence is solvable in finite time. The framework relies on neither contractivity nor monotonicity assumptions, nor on uniqueness of the Nash equilibrium.
Analysis of RHC for Stabilization of Nonautonomous Parabolic Equations Under Uncertainty
Behzad Azmi, Lukas Herrmann, Karl Kunisch
DOI: 10.1137/23m1550876
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 220-242, February 2024.
Abstract. Stabilization of a class of time-varying parabolic equations with uncertain input data using receding horizon control (RHC) is investigated. The diffusion coefficient and the initial function are prescribed as random fields. We consider both cases: uniform and log-normal distributions of the diffusion coefficient. The controls are chosen to be finite-dimensional and enter into the system as a linear combination of finitely many indicator functions (actuators) supported in open subsets of the spatial domain. Under suitable regularity assumptions, we study the expected (averaged) stabilizability of the RHC-controlled system with respect to the number of actuators. An upper bound is also obtained for the failure probability of RHC in relation to the choice of the number of actuators and parameters in the equation.
Stochastic Fixed-Point Iterations for Nonexpansive Maps: Convergence and Error Bounds
Mario Bravo, Roberto Cominetti
DOI: 10.1137/22m1515550
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 191-219, February 2024.
Abstract. We study a stochastically perturbed version of the well-known Krasnoselskii–Mann iteration for computing fixed points of nonexpansive maps in finite dimensional normed spaces. We discuss sufficient conditions on the stochastic noise and stepsizes that guarantee almost sure convergence of the iterates towards a fixed point and derive nonasymptotic error bounds and convergence rates for the fixed-point residuals. Our main results concern the case of a martingale difference noise with variances that can possibly grow unbounded. This supports an application to reinforcement learning for average reward Markov decision processes, for which we establish convergence and asymptotic rates. We also analyze in depth the case where the noise has uniformly bounded variance, obtaining error bounds with explicit computable constants.
Optimal Scheduling of Entropy Regularizer for Continuous-Time Linear-Quadratic Reinforcement Learning
Lukasz Szpruch, Tanut Treetanthiploet, Yufei Zhang
DOI: 10.1137/22m1515744
SIAM Journal on Control and Optimization, Volume 62, Issue 1, Pages 135-166, February 2024.
Abstract. This work uses the entropy-regularized relaxed stochastic control perspective as a principled framework for designing reinforcement learning (RL) algorithms. Herein, an agent interacts with the environment by generating noisy controls distributed according to the optimal relaxed policy. The noisy policies, on the one hand, explore the space and hence facilitate learning but, on the other hand, introduce bias by assigning positive probability to nonoptimal actions. This exploration-exploitation trade-off is determined by the strength of entropy regularization. We study algorithms resulting from two entropy regularization formulations: the exploratory control approach, where entropy is added to the cost objective, and the proximal policy update approach, where entropy penalizes policy divergence between consecutive episodes. We focus on the finite-horizon continuous-time linear-quadratic (LQ) RL problem, in which linear dynamics with unknown drift coefficients are controlled subject to quadratic costs. In this setting, both algorithms yield a Gaussian relaxed policy. We quantify the precise difference between the value functions of a Gaussian policy and of its noisy evaluation and show that the execution noise must be independent across time. By tuning the frequency of sampling from relaxed policies and the parameter governing the strength of entropy regularization, we prove that the regret, for both learning algorithms, is of the order [math] (up to a logarithmic factor) over [math] episodes, matching the best known result from the literature.