
IEEE open journal of control systems: latest publications

Stable Inverse Reinforcement Learning: Policies From Control Lyapunov Landscapes
Pub Date : 2024-08-21 DOI: 10.1109/OJCSYS.2024.3447464
Samuel Tesfazgi;Leonhard Sprandl;Armin Lederer;Sandra Hirche
Learning from expert demonstrations to flexibly program an autonomous system with complex behaviors or to predict an agent's behavior is a powerful tool, especially in collaborative control settings. A common method to solve this problem is inverse reinforcement learning (IRL), where the observed agent, e.g., a human demonstrator, is assumed to behave according to the optimization of an intrinsic cost function that reflects its intent and informs its control actions. While the framework is expressive, the inferred control policies generally lack convergence guarantees, which are critical for safe deployment in real-world settings. We therefore propose a novel, stability-certified IRL approach by reformulating the cost function inference problem as one of learning control Lyapunov functions (CLFs) from demonstration data. By additionally exploiting closed-form expressions for the associated control policies, we are able to efficiently search the space of CLFs by observing the attractor landscape of the induced dynamics. For the construction of the inverse optimal CLFs, we use a Sum of Squares approach and formulate a convex optimization problem. We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world, human-generated data.
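The sum-of-squares construction needs a dedicated polynomial-optimization toolchain, but the underlying idea of posing certificate learning as a convex program can be sketched compactly. The snippet below is a loose illustration rather than the authors' method: it fits a quadratic Lyapunov candidate V(x) = x^T P x that decreases along demonstration transitions, using cvxpy; the dynamics matrix A standing in for expert rollouts, the sample count, and the margin eps are all assumptions made for the example.

```python
# Minimal sketch (not the authors' SOS formulation): fit a quadratic Lyapunov
# candidate V(x) = x^T P x that decreases along demonstration transitions,
# posed as a convex semidefinite program with cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 2
# Hypothetical demonstration data: state pairs (x_k, x_{k+1}) generated by a
# stable discrete-time system x+ = A x, standing in for expert rollouts.
A = np.array([[0.9, 0.2], [-0.1, 0.8]])
X = rng.standard_normal((200, n))
X_next = X @ A.T

P = cp.Variable((n, n), symmetric=True)
eps = 1e-3
constraints = [P >> eps * np.eye(n)]
# Each decrease condition V(x_{k+1}) <= V(x_k) - eps*||x_k||^2 is linear in P.
for xk, xk1 in zip(X, X_next):
    constraints.append(cp.quad_form(xk1, P) <= cp.quad_form(xk, P) - eps * (xk @ xk))
problem = cp.Problem(cp.Minimize(cp.trace(P)), constraints)
problem.solve()
print("status:", problem.status)
print("learned certificate P =\n", P.value)
```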
{"title":"Stable Inverse Reinforcement Learning: Policies From Control Lyapunov Landscapes","authors":"SAMUEL TESFAZGI;Leonhard Sprandl;Armin Lederer;Sandra Hirche","doi":"10.1109/OJCSYS.2024.3447464","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3447464","url":null,"abstract":"Learning from expert demonstrations to flexibly program an autonomous system with complex behaviors or to predict an agent's behavior is a powerful tool, especially in collaborative control settings. A common method to solve this problem is inverse reinforcement learning (IRL), where the observed agent, e.g., a human demonstrator, is assumed to behave according to the optimization of an intrinsic cost function that reflects its intent and informs its control actions. While the framework is expressive, the inferred control policies generally lack convergence guarantees, which are critical for safe deployment in real-world settings. We therefore propose a novel, stability-certified IRL approach by reformulating the cost function inference problem to learning control Lyapunov functions (CLF) from demonstrations data. By additionally exploiting closed-form expressions for associated control policies, we are able to efficiently search the space of CLFs by observing the attractor landscape of the induced dynamics. For the construction of the inverse optimal CLFs, we use a Sum of Squares and formulate a convex optimization problem. We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world, human-generated data.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"358-374"},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10643266","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142316493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning to Boost the Performance of Stable Nonlinear Systems
Pub Date : 2024-08-12 DOI: 10.1109/OJCSYS.2024.3441768
Luca Furieri;Clara Lucía Galimberti;Giancarlo Ferrari-Trecate
The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures aiming for the unparalleled performance achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over specific classes of deep neural network performance-boosting controllers for stable nonlinear systems; crucially, we guarantee $\mathcal{L}_{p}$ closed-loop stability even if optimization is halted prematurely. When the ground-truth dynamics are uncertain, we learn over robustly stabilizing control policies. Our robustness result is tight, in the sense that all stabilizing policies are recovered as the $\mathcal{L}_{p}$-gain of the model mismatch operator is reduced to zero. We discuss the implementation details of the proposed control schemes, including distributed ones, along with the corresponding optimization procedures, demonstrating the potential of freely shaping the cost functions through several numerical experiments.
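A rough way to see how stability can survive a prematurely halted optimization is to parameterize the learned system so that it is stable by construction, for every parameter value the optimizer might return. The toy sketch below uses a spectral-norm projection and a tanh nonlinearity; these choices, the dimension, and the contraction factor gamma are assumptions of the example, not the paper's neural parameterization.

```python
# Illustrative sketch (not the paper's parameterization): a nonlinear system
# that is a contraction by construction, so any parameter value, including one
# reached by stopping the optimizer early, yields a stable system.
import numpy as np

def make_contractive(W, gamma=0.95):
    """Rescale W so that its spectral norm is at most gamma < 1."""
    s = np.linalg.norm(W, 2)
    return W if s <= gamma else W * (gamma / s)

rng = np.random.default_rng(1)
n = 4
W_raw = rng.standard_normal((n, n))   # "trainable" parameters, arbitrary values
W = make_contractive(W_raw)           # projection onto the stable set

# x+ = tanh(W x) is a contraction (tanh is 1-Lipschitz and ||W|| <= gamma),
# so the state decays to the origin from any initial condition.
x = rng.standard_normal(n)
for _ in range(50):
    x = np.tanh(W @ x)
print("state norm after 50 steps:", np.linalg.norm(x))
```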
{"title":"Learning to Boost the Performance of Stable Nonlinear Systems","authors":"Luca Furieri;Clara Lucía Galimberti;Giancarlo Ferrari-Trecate","doi":"10.1109/OJCSYS.2024.3441768","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3441768","url":null,"abstract":"The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures aiming for the unparalleled performances achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over specific classes of deep neural network performance-boosting controllers for stable nonlinear systems; crucially, we guarantee \u0000<inline-formula><tex-math>$mathcal {L}_{p}$</tex-math></inline-formula>\u0000 closed-loop stability even if optimization is halted prematurely. When the ground-truth dynamics are uncertain, we learn over robustly stabilizing control policies. Our robustness result is tight, in the sense that all stabilizing policies are recovered as the \u0000<inline-formula><tex-math>$mathcal {L}_{p}$</tex-math></inline-formula>\u0000 -gain of the model mismatch operator is reduced to zero. We discuss the implementation details of the proposed control schemes, including distributed ones, along with the corresponding optimization procedures, demonstrating the potential of freely shaping the cost functions through several numerical experiments.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"342-357"},"PeriodicalIF":0.0,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10633771","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142316492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Distributionally Robust Policy and Lyapunov-Certificate Learning
Pub Date : 2024-08-07 DOI: 10.1109/OJCSYS.2024.3440051
Kehan Long;Jorge Cortés;Nikolay Atanasov
This article presents novel methods for synthesizing distributionally robust stabilizing neural controllers and certificates for control systems under model uncertainty. A key challenge in designing controllers with stability guarantees for uncertain systems is the accurate determination of and adaptation to shifts in model parametric uncertainty during online deployment. We tackle this with a novel distributionally robust formulation of the Lyapunov derivative chance constraint ensuring a monotonic decrease of the Lyapunov certificate. To avoid the computational complexity involved in dealing with the space of probability measures, we identify a sufficient condition in the form of deterministic convex constraints that ensures the Lyapunov derivative constraint is satisfied. We integrate this condition into a loss function for training a neural network-based controller and show that, for the resulting closed-loop system, the global asymptotic stability of its equilibrium can be certified with high confidence, even with Out-of-Distribution (OoD) model uncertainties. To demonstrate the efficacy and efficiency of the proposed methodology, we compare it with an uncertainty-agnostic baseline approach and several reinforcement learning approaches in two control problems in simulation. Open-source implementations of the examples are available at https://github.com/KehanLong/DR_Stabilizing_Policy.
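The idea of replacing a probabilistic requirement by a deterministic convex surrogate can be illustrated with a classical moment-based chance-constraint reformulation (not the Lyapunov-derivative constraint used in the article): when only the mean and covariance of the uncertain vector are known, enforcing mu^T x + sqrt((1-eps)/eps) * ||Sigma^{1/2} x|| <= b guarantees the chance constraint for every distribution with those moments. The numbers below (mu, Sigma, b, the box bound) are placeholders.

```python
# Hedged illustration: a distributionally robust chance constraint replaced by
# a deterministic second-order-cone constraint via known mean/covariance
# (classical moment-based reformulation, not the article's Lyapunov condition).
import numpy as np
import cvxpy as cp

n, eps = 3, 0.05
mu = np.array([1.0, 0.5, -0.2])        # assumed mean of the uncertain vector a
Sigma = np.diag([0.1, 0.2, 0.05])      # assumed covariance of a
b = 2.0
kappa = np.sqrt((1 - eps) / eps)       # back-off factor for risk level eps
L = np.linalg.cholesky(Sigma)          # Sigma = L L^T, so ||L^T x|| = sqrt(x^T Sigma x)

x = cp.Variable(n)
constraints = [mu @ x + kappa * cp.norm(L.T @ x, 2) <= b,   # robust surrogate
               cp.norm(x, "inf") <= 1]
cp.Problem(cp.Maximize(np.ones(n) @ x), constraints).solve()
print("robustly feasible decision:", x.value)
```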
{"title":"Distributionally Robust Policy and Lyapunov-Certificate Learning","authors":"Kehan Long;Jorge Cortés;Nikolay Atanasov","doi":"10.1109/OJCSYS.2024.3440051","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3440051","url":null,"abstract":"This article presents novel methods for synthesizing distributionally robust stabilizing neural controllers and certificates for control systems under model uncertainty. A key challenge in designing controllers with stability guarantees for uncertain systems is the accurate determination of and adaptation to shifts in model parametric uncertainty during online deployment. We tackle this with a novel distributionally robust formulation of the Lyapunov derivative chance constraint ensuring a monotonic decrease of the Lyapunov certificate. To avoid the computational complexity involved in dealing with the space of probability measures, we identify a sufficient condition in the form of deterministic convex constraints that ensures the Lyapunov derivative constraint is satisfied. We integrate this condition into a loss function for training a neural network-based controller and show that, for the resulting closed-loop system, the global asymptotic stability of its equilibrium can be certified with high confidence, even with Out-of-Distribution (OoD) model uncertainties. To demonstrate the efficacy and efficiency of the proposed methodology, we compare it with an uncertainty-agnostic baseline approach and several reinforcement learning approaches in two control problems in simulation. Open-source implementations of the examples are available at \u0000<uri>https://github.com/KehanLong/DR_Stabilizing_Policy</uri>\u0000.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"375-388"},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10629071","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Global Multi-Phase Path Planning Through High-Level Reinforcement Learning
Pub Date : 2024-07-29 DOI: 10.1109/OJCSYS.2024.3435080
Babak Salamat;Sebastian-Sven Olzem;Gerhard Elsbacher;Andrea M. Tonello
In this paper, we introduce the Global Multi-Phase Path Planning ($GMP^{3}$) algorithm for planning problems, which computes fast and feasible trajectories in environments with obstacles while respecting physical and kinematic constraints. Our approach utilizes a Markov Decision Process (MDP) framework and high-level reinforcement learning techniques to ensure trajectory smoothness, continuity, and compliance with constraints. Through extensive simulations, we demonstrate the algorithm's effectiveness and efficiency across various scenarios. We highlight existing path planning challenges, particularly in integrating dynamic adaptability and computational efficiency. The results validate our method's convergence guarantees using Lyapunov's stability theorem and underscore its computational advantages.
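To make the MDP framing concrete, here is a minimal value-iteration planner on a small grid with obstacles. It illustrates only the high-level decision layer, not the $GMP^{3}$ algorithm, its multi-phase structure, or its kinematic constraints; the grid, rewards, and discount factor are invented for the example.

```python
# Minimal MDP value-iteration planner on a small grid with obstacles; it only
# illustrates the MDP framing of path planning, not GMP^3 or its constraints.
import numpy as np

H, W = 6, 8
obstacles = {(1, 3), (2, 3), (3, 3), (4, 5)}
goal = (5, 7)
gamma = 0.95
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(state, action):
    """Deterministic transition with grid clipping; blocked moves stay put."""
    r, c = state
    nr = min(max(r + action[0], 0), H - 1)
    nc = min(max(c + action[1], 0), W - 1)
    if (nr, nc) in obstacles:
        nr, nc = r, c
    reward = 10.0 if (nr, nc) == goal else -1.0
    return (nr, nc), reward

V = np.zeros((H, W))
for _ in range(200):                                  # value iteration
    V_new = V.copy()
    for r in range(H):
        for c in range(W):
            if (r, c) == goal or (r, c) in obstacles:
                continue
            V_new[r, c] = max(rew + gamma * V[nxt]
                              for nxt, rew in (step((r, c), a) for a in actions))
    V = V_new

pos, path = (0, 0), [(0, 0)]                          # greedy rollout
for _ in range(40):
    if pos == goal:
        break
    candidates = []
    for a in actions:
        nxt, rew = step(pos, a)
        candidates.append((rew + gamma * V[nxt], nxt))
    pos = max(candidates)[1]
    path.append(pos)
print("planned path:", path)
```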
{"title":"Global Multi-Phase Path Planning Through High-Level Reinforcement Learning","authors":"Babak Salamat;Sebastian-Sven Olzem;Gerhard Elsbacher;Andrea M. Tonello","doi":"10.1109/OJCSYS.2024.3435080","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3435080","url":null,"abstract":"In this paper, we introduce the \u0000<italic>Global Multi-Phase Path Planning</i>\u0000 (\u0000<monospace><inline-formula><tex-math>$GMP^{3}$</tex-math></inline-formula></monospace>\u0000) algorithm in planner problems, which computes fast and feasible trajectories in environments with obstacles, considering physical and kinematic constraints. Our approach utilizes a Markov Decision Process (MDP) framework and high-level reinforcement learning techniques to ensure trajectory smoothness, continuity, and compliance with constraints. Through extensive simulations, we demonstrate the algorithm's effectiveness and efficiency across various scenarios. We highlight existing path planning challenges, particularly in integrating dynamic adaptability and computational efficiency. The results validate our method's convergence guarantees using Lyapunov’s stability theorem and underscore its computational advantages.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"405-415"},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10613437","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142430772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Risk-Aware Stochastic MPC for Chance-Constrained Linear Systems
Pub Date : 2024-07-01 DOI: 10.1109/OJCSYS.2024.3421372
Pouria Tooranjipour;Bahare Kiumarsi;Hamidreza Modares
This paper presents a fully risk-aware model predictive control (MPC) framework for chance-constrained discrete-time linear control systems with process noise. Conditional value-at-risk (CVaR), a popular coherent risk measure, is incorporated in both the constraints and the cost function of the MPC framework. This allows the system to navigate the entire spectrum of risk assessments, from worst-case to risk-neutral scenarios, ensuring both constraint satisfaction and performance optimization in stochastic environments. The recursive feasibility and risk-aware exponential stability of the resulting risk-aware MPC are demonstrated through rigorous theoretical analysis by considering the disturbance feedback policy parameterization. Finally, two numerical examples are given to illustrate the efficacy of the proposed method.
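For readers less familiar with the risk measure, the snippet below evaluates CVaR on samples through the Rockafellar-Uryasev representation CVaR_alpha(X) = min_t { t + E[(X - t)_+] / (1 - alpha) }. The loss distribution is a stand-in; the code says nothing about the MPC design itself.

```python
# Sample-based CVaR via the Rockafellar-Uryasev representation; illustrates the
# risk measure only, not the MPC scheme in the paper.
import numpy as np

def cvar(losses, alpha=0.9):
    """Empirical CVaR_alpha: expected loss in the worst (1 - alpha) tail."""
    t = np.quantile(losses, alpha)                    # VaR is the minimizing t
    return t + np.mean(np.maximum(losses - t, 0.0)) / (1.0 - alpha)

rng = np.random.default_rng(2)
losses = rng.normal(loc=1.0, scale=0.5, size=100_000)  # hypothetical stage costs
print("mean loss:", losses.mean())
print("VaR_0.9  :", np.quantile(losses, 0.9))
print("CVaR_0.9 :", cvar(losses, 0.9))
```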
{"title":"Risk-Aware Stochastic MPC for Chance-Constrained Linear Systems","authors":"Pouria Tooranjipour;Bahare Kiumarsi;Hamidreza Modares","doi":"10.1109/OJCSYS.2024.3421372","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3421372","url":null,"abstract":"This paper presents a fully risk-aware model predictive control (MPC) framework for chance-constrained discrete-time linear control systems with process noise. Conditional value-at-risk (CVaR) as a popular coherent risk measure is incorporated in both the constraints and the cost function of the MPC framework. This allows the system to navigate the entire spectrum of risk assessments, from worst-case to risk-neutral scenarios, ensuring both constraint satisfaction and performance optimization in stochastic environments. The recursive feasibility and risk-aware exponential stability of the resulting risk-aware MPC are demonstrated through rigorous theoretical analysis by considering the disturbance feedback policy parameterization. In the end, two numerical examples are given to elucidate the efficacy of the proposed method.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"282-294"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10578318","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141631005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Leveraging the Turnpike Effect for Mean Field Games Numerics
Pub Date : 2024-06-26 DOI: 10.1109/OJCSYS.2024.3419642
René A. Carmona;Claire Zeng
Recently, a deep-learning algorithm referred to as the Deep Galerkin Method (DGM) has gained a lot of attention among those trying to solve finite-horizon Mean Field Games numerically, even though its performance seems to degrade significantly as the horizon increases. On the other hand, it has been proven that some specific classes of Mean Field Games enjoy some form of the turnpike property identified over seven decades ago by economists. The gist of this phenomenon is a proof that the solution of an optimal control problem over a long time interval spends most of its time near the stationary solution of the ergodic version of the corresponding infinite-horizon optimization problem. After reviewing the implementation of DGM for finite-horizon Mean Field Games, we introduce a “turnpike-accelerated” version that incorporates the turnpike estimates in the loss function to be optimized, and we perform a comparative numerical analysis to show the advantages of this accelerated version over the baseline DGM algorithm. We demonstrate this on Mean Field Game models with local couplings known to have the turnpike property, as well as on a new class of linear-quadratic models for which we derive explicit turnpike estimates.
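The turnpike phenomenon is easiest to visualize on a long-horizon linear-quadratic problem rather than a mean field game: away from the terminal time, the finite-horizon Riccati gain essentially coincides with the stationary infinite-horizon gain. The sketch below uses an invented double-integrator-like system purely to make that visible.

```python
# Turnpike illustration on a long-horizon LQR (not a mean field game): the
# finite-horizon Riccati gain K_k stays near the stationary gain K_inf except
# close to the terminal time.
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # invented double-integrator-like system
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])

P_inf = solve_discrete_are(A, B, Q, R)
K_inf = np.linalg.solve(R + B.T @ P_inf @ B, B.T @ P_inf @ A)

T = 200
P = Q.copy()                              # terminal cost P_T = Q
gaps = {}
for k in range(T - 1, -1, -1):            # backward Riccati recursion
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
    gaps[k] = np.linalg.norm(K - K_inf)

for k in (T - 1, T // 2, 0):              # the gap is large only near the end
    print(f"k = {k:3d}   ||K_k - K_inf|| = {gaps[k]:.2e}")
```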
{"title":"Leveraging the Turnpike Effect for Mean Field Games Numerics","authors":"René A. Carmona;Claire Zeng","doi":"10.1109/OJCSYS.2024.3419642","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3419642","url":null,"abstract":"Recently, a deep-learning algorithm referred to as Deep Galerkin Method (DGM), has gained a lot of attention among those trying to solve numerically Mean Field Games with finite horizon, even if the performance seems to be decreasing significantly with increasing horizon. On the other hand, it has been proven that some specific classes of Mean Field Games enjoy some form of the turnpike property identified over seven decades ago by economists. The gist of this phenomenon is a proof that the solution of an optimal control problem over a long time interval spends most of its time near the stationary solution of the ergodic version of the corresponding infinite horizon optimization problem. After reviewing the implementation of DGM for finite horizon Mean Field Games, we introduce a “turnpike-accelerated” version that incorporates the turnpike estimates in the loss function to be optimized, and we perform a comparative numerical analysis to show the advantages of this accelerated version over the baseline DGM algorithm. We demonstrate on some of the Mean Field Game models with local-couplings known to have the turnpike property, as well as a new class of linear-quadratic models for which we derive explicit turnpike estimates.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"389-404"},"PeriodicalIF":0.0,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10572276","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Concurrent Learning of Control Policy and Unknown Safety Specifications in Reinforcement Learning
Pub Date : 2024-06-24 DOI: 10.1109/OJCSYS.2024.3418306
Lunet Yifru;Ali Baheri
Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades. Yet, deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety. Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process. However, this reliance on predefined safety constraints poses limitations in dynamic and unpredictable real-world settings where such constraints may not be available or sufficiently adaptable. Bridging this gap, we propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment. Initializing with a parametric signal temporal logic (pSTL) safety specification and a small initial labeled dataset, we frame the problem as a bilevel optimization task, intricately integrating constrained policy optimization, using a Lagrangian variant of the twin delayed deep deterministic policy gradient (TD3) algorithm, with Bayesian optimization of the parameters of the given pSTL safety specification. Through experimentation in comprehensive case studies, we validate the efficacy of this approach across varying forms of environmental constraints, consistently yielding safe RL policies with high returns. Furthermore, our findings indicate successful learning of STL safety constraint parameters, exhibiting a high degree of conformity with true environmental safety constraints. The performance of our model closely mirrors that of an ideal scenario that possesses complete prior knowledge of safety constraints, demonstrating its proficiency in accurately identifying environmental safety constraints and learning safe policies that adhere to those constraints. A Python implementation of the algorithm can be found at https://github.com/SAILRIT/Concurrent-Learning-of-Control-Policy-and-Unknown-Constraints-in-Reinforcement-Learning.git.
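Stripped of TD3, pSTL, and Bayesian optimization, the Lagrangian machinery behind constrained policy learning reduces to alternating a primal step on reward minus a lambda-weighted cost with a dual ascent step on the constraint violation. The scalar toy problem below is hypothetical and only illustrates that primal-dual pattern, not the paper's bilevel scheme.

```python
# Toy primal-dual (Lagrangian) update behind constrained policy optimization:
# ascend on reward - lambda * cost, then dual-ascend lambda on the violation.
# Hypothetical scalar problem, unrelated to the paper's TD3/pSTL pipeline.

def reward(a):
    return -(a - 2.0) ** 2        # unconstrained optimum at a = 2

def cost(a):
    return a                      # safety cost with budget d = 1

d = 1.0
theta, lam = 0.0, 0.0
lr_theta, lr_lam = 0.05, 0.1
for _ in range(2000):
    grad = -2.0 * (theta - 2.0) - lam                  # d/d_theta [reward - lam * cost]
    theta += lr_theta * grad
    lam = max(0.0, lam + lr_lam * (cost(theta) - d))   # dual ascent on violation

print(f"theta ~ {theta:.3f} (constraint boundary at 1.0), lambda ~ {lam:.3f}")
```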
{"title":"Concurrent Learning of Control Policy and Unknown Safety Specifications in Reinforcement Learning","authors":"Lunet Yifru;Ali Baheri","doi":"10.1109/OJCSYS.2024.3418306","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3418306","url":null,"abstract":"Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades. Yet, deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety. Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process. However, this reliance on predefined safety constraints poses limitations in dynamic and unpredictable real-world settings where such constraints may not be available or sufficiently adaptable. Bridging this gap, we propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment. Initializing with a parametric signal temporal logic (pSTL) safety specification and a small initial labeled dataset, we frame the problem as a bilevel optimization task, intricately integrating constrained policy optimization, using a Lagrangian-variant of the twin delayed deep deterministic policy gradient (TD3) algorithm, with Bayesian optimization for optimizing parameters for the given pSTL safety specification. Through experimentation in comprehensive case studies, we validate the efficacy of this approach across varying forms of environmental constraints, consistently yielding safe RL policies with high returns. Furthermore, our findings indicate successful learning of STL safety constraint parameters, exhibiting a high degree of conformity with true environmental safety constraints. The performance of our model closely mirrors that of an ideal scenario that possesses complete prior knowledge of safety constraints, demonstrating its proficiency in accurately identifying environmental safety constraints and learning safe policies that adhere to those constraints. A Python implementation of the algorithm can be found at \u0000<uri>https://github.com/SAILRIT/Concurrent-Learning-of-Control-Policy-and-Unknown-Constraints-in-Reinforcement-Learning.git</uri>\u0000.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"266-281"},"PeriodicalIF":0.0,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10569078","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Solving Decision-Dependent Games by Learning From Feedback
Pub Date : 2024-06-19 DOI: 10.1109/OJCSYS.2024.3416768
Killian Wood;Ahmed S. Zamzam;Emiliano Dall'Anese
This paper tackles the problem of solving stochastic optimization problems with a decision-dependent distribution in the setting of stochastic strongly-monotone games and when the distributional dependence is unknown. A two-stage approach is proposed, which initially involves estimating the distributional dependence on decision variables, and subsequently optimizing over the estimated distributional map. The paper presents guarantees for the approximation of the cost of each agent. Furthermore, a stochastic gradient-based algorithm is developed and analyzed for finding the Nash equilibrium in a distributed fashion. Numerical simulations are provided for a novel electric vehicle charging market formulation using real-world data.
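A single-decision-maker caricature of the two-stage approach reads as follows: probe a few decisions to estimate how the distribution shifts with the decision, then run gradient descent on the objective induced by the estimated map. The linear shift model, noise level, and probe points below are assumptions; the paper itself treats strongly-monotone multi-agent games and Nash equilibria.

```python
# Two-stage sketch for a decision-dependent distribution with one decision
# maker: (1) probe decisions and fit how the distribution's mean shifts with
# the decision, (2) optimize the objective induced by the estimated map.
# Illustrative only; the paper treats strongly-monotone multi-agent games.
import numpy as np

rng = np.random.default_rng(3)
a_true, b_true, sigma = 2.0, 0.4, 0.1            # hypothetical ground-truth shift

def sample_xi(x, n=200):
    """Samples whose mean a_true + b_true * x depends on the decision x."""
    return a_true + b_true * x + sigma * rng.standard_normal(n)

# Stage 1: estimate the distributional map xi ~ N(a + b*x, sigma^2).
probes = np.array([-1.0, 0.0, 1.0, 2.0])
means = np.array([sample_xi(x).mean() for x in probes])
b_hat, a_hat = np.polyfit(probes, means, 1)      # slope, intercept

# Stage 2: minimize E[(x - xi)^2] under the estimated map, which equals
# ((1 - b_hat) * x - a_hat)^2 + const, by plain gradient descent.
x = 0.0
for _ in range(500):
    grad = 2.0 * (1.0 - b_hat) * ((1.0 - b_hat) * x - a_hat)
    x -= 0.05 * grad
print(f"estimated shift a={a_hat:.2f}, b={b_hat:.2f}; decision x* ~ {x:.3f}")
```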
{"title":"Solving Decision-Dependent Games by Learning From Feedback","authors":"Killian Wood;Ahmed S. Zamzam;Emiliano Dall'Anese","doi":"10.1109/OJCSYS.2024.3416768","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3416768","url":null,"abstract":"This paper tackles the problem of solving stochastic optimization problems with a decision-dependent distribution in the setting of stochastic strongly-monotone games and when the distributional dependence is unknown. A two-stage approach is proposed, which initially involves estimating the distributional dependence on decision variables, and subsequently optimizing over the estimated distributional map. The paper presents guarantees for the approximation of the cost of each agent. Furthermore, a stochastic gradient-based algorithm is developed and analyzed for finding the Nash equilibrium in a distributed fashion. Numerical simulations are provided for a novel electric vehicle charging market formulation using real-world data.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"295-309"},"PeriodicalIF":0.0,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10564130","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141964790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Sorta Solving the OPF by Not Solving the OPF: DAE Control Theory and the Price of Realtime Regulation
Pub Date : 2024-06-13 DOI: 10.1109/OJCSYS.2024.3414221
Muhammad Nadeem;Ahmad F. Taha
This paper presents a new approach to approximate the AC optimal power flow (ACOPF). By eliminating the need to solve the ACOPF every few minutes, the paper showcases how a realtime feedback controller can be utilized in lieu of ACOPF and its variants. By i) forming the grid dynamics as a system of differential-algebraic equations (DAE) that naturally encode the non-convex OPF power flow constraints, ii) utilizing DAE-Lyapunov theory, and iii) designing a feedback controller that captures realtime uncertainty while being uncertainty-unaware, the presented approach shows promise in obtaining solutions that are close to the OPF ones without needing to solve the OPF. The proposed controller responds in realtime to deviations in renewables generation and loads, guaranteeing improvements in system transient stability, while always yielding approximate solutions of the ACOPF with no constraint violations. As the studied approach herein yields slightly more expensive realtime generator controls, the corresponding price of realtime control and regulation is examined. Cost comparisons with the traditional ACOPF are also showcased, all via case studies on standard power networks.
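In generic form, the model class the abstract alludes to is a semi-explicit DAE whose algebraic part carries the power-flow equations, with a feedback law taking the place of repeated OPF solves. The linear feedback structure written below is a placeholder for illustration, not the paper's controller.

```latex
% Generic semi-explicit DAE sketch: differential states x, algebraic grid
% variables z, controls u; the algebraic block carries the power-flow
% constraints, and a feedback law stands in for repeatedly solving the OPF.
% The linear feedback around an operating point (x*, u*) is a placeholder.
\begin{align}
  \dot{x}(t) &= f\bigl(x(t),\, z(t),\, u(t)\bigr), \\
          0  &= g\bigl(x(t),\, z(t),\, u(t)\bigr), \\
        u(t) &= u^{\star} + K\bigl(x(t) - x^{\star}\bigr).
\end{align}
```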
{"title":"Sorta Solving the OPF by Not Solving the OPF: DAE Control Theory and the Price of Realtime Regulation","authors":"Muhammad Nadeem;Ahmad F. Taha","doi":"10.1109/OJCSYS.2024.3414221","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3414221","url":null,"abstract":"This paper presents a new approach to approximate the AC optimal power flow (ACOPF). By eliminating the need to solve the ACOPF every few minutes, the paper showcases how a realtime feedback controller can be utilized in lieu of ACOPF and its variants. By \u0000<italic>i)</i>\u0000 forming the grid dynamics as a system of differential-algebraic equations (DAE) that naturally encode the non-convex OPF power flow constraints, \u0000<italic>ii)</i>\u0000 utilizing DAE-Lyapunov theory, and \u0000<italic>iii)</i>\u0000 designing a feedback controller that captures realtime uncertainty while being uncertainty-unaware, the presented approach demonstrates promises of obtaining solutions that are close to the OPF ones without needing to solve the OPF. The proposed controller responds in realtime to deviations in renewables generation and loads, guaranteeing improvements in system transient stability, while always yielding approximate solutions of the ACOPF with no constraint violations. As the studied approach herein yields slightly more expensive realtime generator controls, the corresponding price of realtime control and regulation is examined. Cost comparisons with the traditional ACOPF are also showcased—all via case studies on standard power networks.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"253-265"},"PeriodicalIF":0.0,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10556752","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141474874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Regional PID Control of Switched Positive Systems With Multiple Equilibrium Points
Pub Date : 2024-04-18 DOI: 10.1109/OJCSYS.2024.3391001
Pei Zhang;Junfeng Zhang;Xuan Jia
This paper investigates the regional control problem of switched positive systems with multiple equilibrium points. A proportional-integral-derivative controller is designed by combining the output, the error between the state and the equilibrium point, and the difference of the output. A cone is introduced to design the final stable region. Two classes of copositive Lyapunov functions are constructed to achieve the stability and regional stability of the subsystems and the whole system, respectively. Then, a novel class of observers with multiple equilibrium points is proposed using a matrix decomposition approach. The observer-based proportional-integral-derivative control problem is thus solved, and all states are driven to the designed cone region under the designed controller. All conditions are formulated in the form of linear programming. The novelties of this paper are threefold: (i) a proportional-integral-derivative control framework is introduced for the considered systems, (ii) a Luenberger-type observer is developed for systems with multiple equilibrium points, and (iii) copositive Lyapunov functions and linear programming are employed for the analysis and design of the controller and observer. Finally, the effectiveness of the proposed design is verified via two examples.
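The linear-programming flavor of such conditions is easiest to see on the simplest sub-problem: for a positive linear system with a Metzler matrix A, a linear copositive Lyapunov function V(x) = v^T x exists exactly when some v > 0 satisfies A^T v < 0, which is a plain feasibility LP. The sketch below checks this for an invented A and says nothing about the PID gains, the switching law, or the multiple equilibrium points.

```python
# LP-flavored check in the spirit of the paper's conditions: for a positive
# linear system with Metzler A, a linear copositive Lyapunov function
# V(x) = v^T x exists iff some v > 0 satisfies A^T v < 0.
# Only a plant-stability check; the PID/switching design is not reproduced.
import numpy as np
from scipy.optimize import linprog

A = np.array([[-2.0, 1.0],
              [0.5, -3.0]])              # invented Metzler and Hurwitz matrix

n = A.shape[0]
eps = 1e-3
# Feasibility LP: find v with v >= 1 (hence v > 0) and A^T v <= -eps.
res = linprog(c=np.zeros(n),
              A_ub=A.T,
              b_ub=-eps * np.ones(n),
              bounds=[(1.0, None)] * n,
              method="highs")
print("copositive Lyapunov vector found:", res.success, res.x)
```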
{"title":"Regional PID Control of Switched Positive Systems With Multiple Equilibrium Points","authors":"Pei Zhang;Junfeng Zhang;Xuan Jia","doi":"10.1109/OJCSYS.2024.3391001","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3391001","url":null,"abstract":"This paper investigates the regional control problem of switched positive systems with multiple equilibrium points. A proportional-integral-derivative controller is designed by combining the output, the error between the state and the equilibrium point, and the difference of output. A cone is introduced to design the final stable region. Two classes of copositive Lyapunov functions are constructed to achieve the stability and regional stability of subsystems and the whole systems, respectively. Then, a novel class of observers with multiple equilibrium points is proposed using a matrix decomposition approach. The observer-based proportional-integral-derivative control problem is thus solved and all states are driven to the designed cone region under the designed controller. All conditions are formulated in the form of linear programming. The novelties of this paper lie in that: (i) A proportional-integral-derivative control framework is introduced for the considered systems, (ii) Luenberger observer is developed for the observer with multiple equilibrium points, and (iii) Copositive Lyapunov functions and linear programming are employed for the analysis and design of controller and observer. Finally, the effectiveness of the proposed design is verified via two examples.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"190-201"},"PeriodicalIF":0.0,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10504945","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140818730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0