
IEEE Open Journal of Control Systems: Latest Articles

Risk-Aware Stochastic MPC for Chance-Constrained Linear Systems
Pub Date : 2024-07-01 DOI: 10.1109/OJCSYS.2024.3421372
Pouria Tooranjipour;Bahare Kiumarsi;Hamidreza Modares
This paper presents a fully risk-aware model predictive control (MPC) framework for chance-constrained discrete-time linear control systems with process noise. Conditional value-at-risk (CVaR), a popular coherent risk measure, is incorporated in both the constraints and the cost function of the MPC framework. This allows the system to navigate the entire spectrum of risk assessments, from worst-case to risk-neutral scenarios, ensuring both constraint satisfaction and performance optimization in stochastic environments. The recursive feasibility and risk-aware exponential stability of the resulting risk-aware MPC are demonstrated through rigorous theoretical analysis by considering the disturbance feedback policy parameterization. Finally, two numerical examples are given to illustrate the efficacy of the proposed method.
IEEE Open Journal of Control Systems, vol. 3, pp. 282-294. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10578318
Citations: 0
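As a rough illustration of the risk measure named in the abstract above, the sketch below estimates CVaR from sampled losses as the average of the worst (1 - alpha) fraction of the samples. The function name `empirical_cvar`, the sample values, and the simple tail-average estimator are assumptions made for illustration and are not taken from the paper. The point is only that alpha = 0 recovers the mean (risk-neutral), while alpha close to 1 approaches the largest sample (worst case).

```python
import math


def empirical_cvar(losses, alpha):
    """Tail-average estimate of CVaR at level alpha (hypothetical helper).

    Averages the worst (1 - alpha) fraction of the sampled losses.
    alpha = 0 gives the sample mean; alpha near 1 gives the sample maximum.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)  # largest losses first
    # Number of tail samples; the small offset guards against float round-up.
    k = max(1, math.ceil((1.0 - alpha) * len(ordered) - 1e-12))
    return sum(ordered[:k]) / k


# Illustrative samples only (not data from the paper).
SAMPLES = [1.0, 2.0, 3.0, 10.0]
RISK_NEUTRAL = empirical_cvar(SAMPLES, 0.0)    # mean of all samples
HALF_TAIL = empirical_cvar(SAMPLES, 0.5)       # mean of the worst half
NEAR_WORST = empirical_cvar(SAMPLES, 0.75)     # worst single sample
```

Sweeping `alpha` between these extremes is one way to picture the "spectrum of risk assessments" the abstract refers to.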
Leveraging the Turnpike Effect for Mean Field Games Numerics
Pub Date : 2024-06-26 DOI: 10.1109/OJCSYS.2024.3419642
René A. Carmona;Claire Zeng
Recently, a deep-learning algorithm referred to as the Deep Galerkin Method (DGM) has gained a lot of attention among those trying to numerically solve Mean Field Games with finite horizon, even though its performance seems to decrease significantly as the horizon increases. On the other hand, it has been proven that some specific classes of Mean Field Games enjoy some form of the turnpike property identified over seven decades ago by economists. The gist of this phenomenon is that the solution of an optimal control problem over a long time interval spends most of its time near the stationary solution of the ergodic version of the corresponding infinite horizon optimization problem. After reviewing the implementation of DGM for finite horizon Mean Field Games, we introduce a "turnpike-accelerated" version that incorporates the turnpike estimates in the loss function to be optimized, and we perform a comparative numerical analysis to show the advantages of this accelerated version over the baseline DGM algorithm. We demonstrate this on some of the Mean Field Game models with local couplings known to have the turnpike property, as well as on a new class of linear-quadratic models for which we derive explicit turnpike estimates.
IEEE Open Journal of Control Systems, vol. 3, pp. 389-404. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10572276
Citations: 0
Concurrent Learning of Control Policy and Unknown Safety Specifications in Reinforcement Learning
Pub Date : 2024-06-24 DOI: 10.1109/OJCSYS.2024.3418306
Lunet Yifru;Ali Baheri
Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades. Yet, deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety. Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process. However, this reliance on predefined safety constraints poses limitations in dynamic and unpredictable real-world settings where such constraints may not be available or sufficiently adaptable. Bridging this gap, we propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment. Initializing with a parametric signal temporal logic (pSTL) safety specification and a small initial labeled dataset, we frame the problem as a bilevel optimization task, intricately integrating constrained policy optimization, using a Lagrangian-variant of the twin delayed deep deterministic policy gradient (TD3) algorithm, with Bayesian optimization for optimizing parameters for the given pSTL safety specification. Through experimentation in comprehensive case studies, we validate the efficacy of this approach across varying forms of environmental constraints, consistently yielding safe RL policies with high returns. Furthermore, our findings indicate successful learning of STL safety constraint parameters, exhibiting a high degree of conformity with true environmental safety constraints. The performance of our model closely mirrors that of an ideal scenario that possesses complete prior knowledge of safety constraints, demonstrating its proficiency in accurately identifying environmental safety constraints and learning safe policies that adhere to those constraints. A Python implementation of the algorithm can be found at https://github.com/SAILRIT/Concurrent-Learning-of-Control-Policy-and-Unknown-Constraints-in-Reinforcement-Learning.git.
IEEE Open Journal of Control Systems, vol. 3, pp. 266-281. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10569078
Citations: 0
Solving Decision-Dependent Games by Learning From Feedback
Pub Date : 2024-06-19 DOI: 10.1109/OJCSYS.2024.3416768
Killian Wood;Ahmed S. Zamzam;Emiliano Dall'Anese
This paper tackles the problem of solving stochastic optimization problems with a decision-dependent distribution in the setting of stochastic strongly-monotone games and when the distributional dependence is unknown. A two-stage approach is proposed, which initially involves estimating the distributional dependence on decision variables, and subsequently optimizing over the estimated distributional map. The paper presents guarantees for the approximation of the cost of each agent. Furthermore, a stochastic gradient-based algorithm is developed and analyzed for finding the Nash equilibrium in a distributed fashion. Numerical simulations are provided for a novel electric vehicle charging market formulation using real-world data.
IEEE Open Journal of Control Systems, vol. 3, pp. 295-309. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10564130
Citations: 0
Sorta Solving the OPF by Not Solving the OPF: DAE Control Theory and the Price of Realtime Regulation
Pub Date : 2024-06-13 DOI: 10.1109/OJCSYS.2024.3414221
Muhammad Nadeem;Ahmad F. Taha
This paper presents a new approach to approximating the AC optimal power flow (ACOPF). By eliminating the need to solve the ACOPF every few minutes, the paper showcases how a realtime feedback controller can be utilized in lieu of ACOPF and its variants. By i) forming the grid dynamics as a system of differential-algebraic equations (DAE) that naturally encode the non-convex OPF power flow constraints, ii) utilizing DAE-Lyapunov theory, and iii) designing a feedback controller that captures realtime uncertainty while being uncertainty-unaware, the presented approach shows promise in obtaining solutions that are close to the OPF ones without needing to solve the OPF. The proposed controller responds in realtime to deviations in renewables generation and loads, guaranteeing improvements in system transient stability, while always yielding approximate solutions of the ACOPF with no constraint violations. As the approach studied herein yields slightly more expensive realtime generator controls, the corresponding price of realtime control and regulation is examined. Cost comparisons with the traditional ACOPF are also showcased, all via case studies on standard power networks.
IEEE Open Journal of Control Systems, vol. 3, pp. 253-265. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10556752
Citations: 0
Regional PID Control of Switched Positive Systems With Multiple Equilibrium Points
Pub Date : 2024-04-18 DOI: 10.1109/OJCSYS.2024.3391001
Pei Zhang;Junfeng Zhang;Xuan Jia
This paper investigates the regional control problem of switched positive systems with multiple equilibrium points. A proportional-integral-derivative controller is designed by combining the output, the error between the state and the equilibrium point, and the difference of output. A cone is introduced to design the final stable region. Two classes of copositive Lyapunov functions are constructed to achieve the stability and regional stability of the subsystems and the whole system, respectively. Then, a novel class of observers with multiple equilibrium points is proposed using a matrix decomposition approach. The observer-based proportional-integral-derivative control problem is thus solved and all states are driven to the designed cone region under the designed controller. All conditions are formulated in the form of linear programming. The novelties of this paper are threefold: (i) a proportional-integral-derivative control framework is introduced for the considered systems, (ii) a Luenberger observer with multiple equilibrium points is developed, and (iii) copositive Lyapunov functions and linear programming are employed for the analysis and design of the controller and observer. Finally, the effectiveness of the proposed design is verified via two examples.
IEEE Open Journal of Control Systems, vol. 3, pp. 190-201. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10504945
Citations: 0
Novel Bounds for Incremental Hessian Estimation With Application to Zeroth-Order Federated Learning
Pub Date : 2024-04-15 DOI: 10.1109/OJCSYS.2024.3388374
Alessio Maritan;Luca Schenato;Subhrakanti Dey
The Hessian matrix conveys important information about the curvature, spectrum and partial derivatives of a function, and is required in a variety of tasks. However, computing the exact Hessian is prohibitively expensive for high-dimensional input spaces, and is simply impossible in zeroth-order optimization, where the objective function is a black box for which only input-output pairs are known. In this work we address this relevant problem by providing a rigorous analysis of a Hessian estimator available in the literature, allowing it to be used as a provably accurate replacement of the true Hessian matrix. The Hessian estimator is randomized and incremental, and its computation requires only point function evaluations. We provide non-asymptotic convergence bounds on the estimation error and derive the minimum number of function queries needed to achieve a desired accuracy with arbitrarily high probability. In the second part of the paper we show a practical application of our results, introducing a novel optimization algorithm suitable for non-convex and black-box federated learning. The algorithm only requires clients to evaluate their local functions at certain input points, and builds a sufficiently accurate estimate of the global Hessian matrix in a distributed way. The algorithm exploits inexact cubic regularization to escape saddle points and guarantees convergence with optimal iteration complexity and high probability. Numerical results show that the proposed algorithm outperforms the existing zeroth-order federated algorithms in both convex and non-convex problems. Furthermore, we achieve similar performance to state-of-the-art algorithms for federated convex optimization that use exact gradients and Hessian matrices.
IEEE Open Journal of Control Systems, vol. 3, pp. 173-189. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10499850
Citations: 0
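To make the idea of building curvature information from point function evaluations alone more concrete, here is a minimal sketch of a classical central finite-difference Hessian. This is a deliberately swapped-in textbook technique, not the randomized incremental estimator analyzed in the paper. The names `fd_hessian` and `_shifted_value`, the step size `h`, and the quadratic test function are all hypothetical.

```python
def _shifted_value(f, x, i, j, si, sj, h):
    """Evaluate f at x with coordinate i moved by si*h and coordinate j by sj*h."""
    y = list(x)
    y[i] += si * h
    y[j] += sj * h
    return f(y)


def fd_hessian(f, x, h=1e-3):
    """Central finite-difference Hessian of a black-box function f at point x.

    Uses only function evaluations (no gradients). Each entry (i, j) costs
    four queries, so the total is 2 * n * (n + 1) queries for n inputs.
    """
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = (
                _shifted_value(f, x, i, j, +1, +1, h)
                - _shifted_value(f, x, i, j, +1, -1, h)
                - _shifted_value(f, x, i, j, -1, +1, h)
                + _shifted_value(f, x, i, j, -1, -1, h)
            ) / (4.0 * h * h)
            hess[i][j] = hess[j][i] = value  # the Hessian is symmetric
    return hess


def quadratic(v):
    """Toy objective with known Hessian [[2, 3], [3, 4]] (illustration only)."""
    return v[0] ** 2 + 3.0 * v[0] * v[1] + 2.0 * v[1] ** 2


H_EST = fd_hessian(quadratic, [1.0, -2.0])
```

The query count of this naive scheme grows quadratically with the input dimension, which gives some intuition for why the number of function queries is a quantity worth bounding.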
Learning Robust Output Control Barrier Functions From Safe Expert Demonstrations
Pub Date : 2024-04-04 DOI: 10.1109/OJCSYS.2024.3385348
Lars Lindemann;Alexander Robey;Lejun Jiang;Satyajeet Das;Stephen Tu;Nikolai Matni
This paper addresses learning safe output feedback control laws from partial observations of expert demonstrations. We assume that a model of the system dynamics and a state estimator are available along with corresponding error bounds, e.g., estimated from data in practice. We first propose robust output control barrier functions (ROCBFs) as a means to guarantee safety, as defined through controlled forward invariance of a safe set. We then formulate an optimization problem to learn ROCBFs from expert demonstrations that exhibit safe system behavior, e.g., data collected from a human operator or an expert controller. When the parametrization of the ROCBF is linear, then we show that, under mild assumptions, the optimization problem is convex. Along with the optimization problem, we provide verifiable conditions in terms of the density of the data, smoothness of the system model and state estimator, and the size of the error bounds that guarantee validity of the obtained ROCBF. Towards obtaining a practical control algorithm, we propose an algorithmic implementation of our theoretical framework that accounts for assumptions made in our framework in practice. We validate our algorithm in the autonomous driving simulator CARLA and demonstrate how to learn safe control laws from simulated RGB camera images.
Lars Lindemann; Alexander Robey; Lejun Jiang; Satyajeet Das; Stephen Tu; Nikolai Matni. "Learning Robust Output Control Barrier Functions From Safe Expert Demonstrations." IEEE Open Journal of Control Systems, vol. 3, pp. 158-172, 2024-04-04. DOI: 10.1109/OJCSYS.2024.3385348. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10491341
Cited by: 0
$\mathcal{H}_{2}$- and $\mathcal{H}_\infty$-Optimal Model Predictive Controllers for Robust Legged Locomotion
Pub Date : 2024-03-31 DOI: 10.1109/OJCSYS.2024.3407999
Abhishek Pandala;Aaron D. Ames;Kaveh Akbari Hamed
This paper formally develops robust optimal predictive control solutions that can accommodate disturbances and stabilize periodic legged locomotion. To this end, we build upon existing optimization-based control paradigms, particularly quadratic programming (QP)-based model predictive controllers (MPCs). We present conditions under which the closed-loop reduced-order systems (i.e., template models) with MPC are continuously differentiable on an open neighborhood of gaits. We then linearize the resulting discrete-time, closed-loop nonlinear template system around the gait to obtain a linear time-varying (LTV) system. This periodic LTV system is further transformed into a linear system with a constant state-transition matrix using the discrete-time Floquet transform. The system is then analyzed to accommodate parametric uncertainties and to synthesize robust optimal $\mathcal{H}_{2}$ and $\mathcal{H}_\infty$ feedback controllers via linear matrix inequalities (LMIs). The paper then extends the theoretical results to the single rigid body (SRB) template dynamics and numerically verifies them. The proposed robust optimal predictive controllers are used in a layered control structure, where the optimal reduced-order trajectories are provided to a full-order nonlinear whole-body controller (WBC) for tracking at the low level. The developed layered controllers are numerically and experimentally validated for the robust locomotion of the A1 quadrupedal robot subject to various disturbances and uneven terrains. Our numerical results suggest that the $\mathcal{H}_{2}$- and $\mathcal{H}_\infty$-optimal MPC controllers significantly improve the robust stability of the gaits compared to the normal MPC.
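The step from a periodic LTV system to a constant state-transition matrix can be illustrated with a small sketch. For $x_{k+1} = A_k x_k$ with $A_{k+N} = A_k$, the product of the matrices over one period (the monodromy matrix) governs stability: the origin is asymptotically stable exactly when its spectral radius is below one. The 2x2 matrices `A0` and `A1` below are hypothetical and are not taken from the paper; the helper names are likewise illustrative, and the paper's own transform and LMI synthesis are not reproduced here.

```python
import cmath

def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def monodromy(mats):
    """Return A_{N-1} ... A_1 A_0 for one period of x_{k+1} = A_k x_k."""
    phi = [[1.0, 0.0], [0.0, 1.0]]
    for a in mats:
        phi = matmul2(a, phi)  # left-multiply by the next A_k
    return phi

def spectral_radius(m):
    """Largest eigenvalue magnitude of a 2x2 matrix via its characteristic polynomial."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + disc) / 2.0), abs((tr - disc) / 2.0))

# Hypothetical period-2 system: each matrix alone has spectral radius 0.5.
A0 = [[0.5, 2.0], [0.0, 0.5]]
A1 = [[0.5, 0.0], [2.0, 0.5]]

phi = monodromy([A0, A1])
print("frozen-time spectral radii:", spectral_radius(A0), spectral_radius(A1))
print("monodromy spectral radius:", spectral_radius(phi))  # roughly 4.49, so unstable
```

The example is chosen so that each frozen-time matrix looks stable while the periodic system is not, which is why the analysis has to be carried out on the one-period map rather than on the individual $A_k$.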
Abhishek Pandala; Aaron D. Ames; Kaveh Akbari Hamed. "$\mathcal{H}_{2}$- and $\mathcal{H}_\infty$-Optimal Model Predictive Controllers for Robust Legged Locomotion." IEEE Open Journal of Control Systems, vol. 3, pp. 225-238, 2024-03-31. DOI: 10.1109/OJCSYS.2024.3407999. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10543084
Cited by: 0
An Efficient Solution to Optimal Motion Planning With Provable Safety and Convergence
Pub Date : 2024-03-18 DOI: 10.1109/OJCSYS.2024.3378055
Panagiotis Rousseas;Charalampos Bechlioulis;Kostas Kyriakopoulos
This work presents an innovative solution to the optimal motion planning problem. A novel parametrized actor structure is proposed, which guarantees safe and convergent navigation by construction. Concurrently, an efficient scheme for optimizing a mixed state and energy cost function is formulated. The proposed method inherits the positive traits of continuous methods while providing sub-optimal, but close to optimal, results significantly faster and in more complex workspaces than previous methods. The scheme is demonstrated to outperform established relevant methods while remaining competitive with respect to execution time. Extensive simulations that validate the effectiveness of the method are presented, along with technical proofs of safety and convergence.
Panagiotis Rousseas; Charalampos Bechlioulis; Kostas Kyriakopoulos. "An Efficient Solution to Optimal Motion Planning With Provable Safety and Convergence." IEEE Open Journal of Control Systems, vol. 3, pp. 143-157, 2024-03-18. DOI: 10.1109/OJCSYS.2024.3378055. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10473133
Cited by: 0