
Latest Publications in IEEE Open Journal of Control Systems

Resiliency Through Collaboration in Heterogeneous Multi-Robot Systems
Pub Date: 2024-09-25 DOI: 10.1109/OJCSYS.2024.3467991
Alexander A. Nguyen, Faryar Jabbari, Magnus Egerstedt
This paper examines pairwise collaborations in heterogeneous multi-robot systems. In particular, we focus on how individual robots, with different functionalities and dynamics, can enhance their resilience by forming collaborative arrangements that result in new capabilities. Control barrier functions are utilized as a mechanism to encode the safe operating regions of individual robots, with the idea being that a robot may be able to operate in new regions that it could not traverse alone by working with other robots. We explore answers to three questions: “Why should robots collaborate?”, “When should robots collaborate?”, and “How can robots collaborate?” To that end, we introduce the safely reachable set – capturing the regions that individual robots can reach safely, either with or without help, while considering their initial states and dynamics. We then describe the conditions under which a help-providing robot and a help-receiving robot can engage in collaboration. Next, we describe the pairwise collaboration framework, modeled through hybrid automata, to show how collaborations can be structured within a heterogeneous multi-robot team. Finally, we present case studies that are conducted on a team of mobile robots.
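The core mechanism the abstract describes, a control barrier function that encodes a safe operating region, can be sketched compactly. Below is a minimal, illustrative CBF safety filter for a single-integrator robot; the obstacle, the class-K gain `alpha`, and the nominal controller are assumptions for the example, not taken from the paper.

```python
def cbf_filter(x, u_nom, x_obs, r, alpha=1.0):
    """Minimally modify u_nom so the CBF condition dh/dt >= -alpha * h(x)
    holds, with safe set h(x) = ||x - x_obs||^2 - r^2 >= 0."""
    h = sum((xi - oi) ** 2 for xi, oi in zip(x, x_obs)) - r ** 2
    grad = [2.0 * (xi - oi) for xi, oi in zip(x, x_obs)]   # dh/dx
    lhs = sum(g * u for g, u in zip(grad, u_nom))          # dh/dt under u_nom
    slack = lhs + alpha * h
    if slack >= 0:                       # nominal input is already safe
        return list(u_nom)
    g2 = sum(g * g for g in grad)
    lam = -slack / g2                    # closed-form single-constraint QP
    return [u + lam * g for u, g in zip(u_nom, grad)]

# A robot at (1, 0) pushing straight at an obstacle of radius 0.9 at the
# origin; the filter attenuates the unsafe component of the input:
u_safe = cbf_filter([1.0, 0.0], [-1.0, 0.0], [0.0, 0.0], 0.9)
```

The single-constraint quadratic program has the closed form used above, which is why CBF filters are cheap enough to run in a real-time control loop.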
{"title":"Resiliency Through Collaboration in Heterogeneous Multi-Robot Systems","authors":"Alexander A. Nguyen;Faryar Jabbari;Magnus Egerstedt","doi":"10.1109/OJCSYS.2024.3467991","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3467991","url":null,"abstract":"This paper examines pairwise collaborations in heterogeneous multi-robot systems. In particular, we focus on how individual robots, with different functionalities and dynamics, can enhance their resilience by forming collaborative arrangements that result in new capabilities. Control barrier functions are utilized as a mechanism to encode the safe operating regions of individual robots, with the idea being that a robot may be able to operate in new regions that it could not traverse alone by working with other robots. We explore answers to three questions: “Why should robots collaborate?”, “When should robots collaborate?”, and “How can robots collaborate?” To that end, we introduce the safely reachable set – capturing the regions that individual robots can reach safely, either with or without help, while considering their initial states and dynamics. We then describe the conditions under which a help-providing robot and a help-receiving robot can engage in collaboration. Next, we describe the pairwise collaboration framework, modeled through hybrid automata, to show how collaborations can be structured within a heterogeneous multi-robot team. 
Finally, we present case studies that are conducted on a team of mobile robots.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"461-471"},"PeriodicalIF":0.0,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10693575","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Resilient Synchronization of Pulse-Coupled Oscillators Under Stealthy Attacks
Pub Date: 2024-09-10 DOI: 10.1109/OJCSYS.2024.3458593
Yugo Iori, Hideaki Ishii
This paper studies a clock synchronization problem for wireless sensor networks employing pulse-based communication when some of the nodes are faulty or even adversarial. The objective is to design resilient distributed algorithms for the nonfaulty nodes to keep the influence of the malicious nodes minimal and to arrive at synchronization in a safe manner. Compared with conventional approaches, our algorithms are more capable in the sense that they are applicable to networks taking noncomplete graph structures. Our approach is to extend the class of mean subsequence reduced (MSR) algorithms from the area of multi-agent consensus. First, we provide a simple detection method to find malicious nodes that transmit pulses irregularly. Then, we demonstrate that in the presence of adversaries avoiding to be detected, the normal nodes can reach synchronization by ignoring suspicious pulses. Two extensions of this algorithm are further presented, which can operate under more adversarial attacks and also with relaxed conditions on the initial phases. We illustrate the effectiveness of our results by numerical examples.
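The paper's algorithms act on pulse timings; as a hedged illustration of the underlying mean subsequence reduced (MSR) idea it extends, the sketch below applies the trimming rule to scalar consensus values. The parameter `f` (assumed bound on adversarial neighbors) and the values are invented for the example.

```python
def msr_update(own, neighbor_values, f):
    """Drop up to f neighbor values above own and up to f below,
    then average own value with the survivors (the MSR trimming rule)."""
    lo = sorted(v for v in neighbor_values if v < own)
    hi = sorted(v for v in neighbor_values if v > own)
    eq = [v for v in neighbor_values if v == own]
    keep = lo[min(f, len(lo)):] + eq + hi[:max(0, len(hi) - f)]
    group = keep + [own]
    return sum(group) / len(group)

# A node at 0.0; one neighbor broadcasts an adversarial 100.0. With f = 1
# the outlier is trimmed and the update stays near the normal values:
new_value = msr_update(0.0, [0.1, -0.1, 0.2, 100.0], f=1)
```

Because a normal node never averages in more than the surviving middle values, a single outlier (here 100.0) cannot drag the update outside the range of its honest neighbors.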
{"title":"Resilient Synchronization of Pulse-Coupled Oscillators Under Stealthy Attacks","authors":"Yugo Iori;Hideaki Ishii","doi":"10.1109/OJCSYS.2024.3458593","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3458593","url":null,"abstract":"This paper studies a clock synchronization problem for wireless sensor networks employing pulse-based communication when some of the nodes are faulty or even adversarial. The objective is to design resilient distributed algorithms for the nonfaulty nodes to keep the influence of the malicious nodes minimal and to arrive at synchronization in a safe manner. Compared with conventional approaches, our algorithms are more capable in the sense that they are applicable to networks taking noncomplete graph structures. Our approach is to extend the class of mean subsequence reduced (MSR) algorithms from the area of multi-agent consensus. First, we provide a simple detection method to find malicious nodes that transmit pulses irregularly. Then, we demonstrate that in the presence of adversaries avoiding to be detected, the normal nodes can reach synchronization by ignoring suspicious pulses. Two extensions of this algorithm are further presented, which can operate under more adversarial attacks and also with relaxed conditions on the initial phases. 
We illustrate the effectiveness of our results by numerical examples.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"429-444"},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10675443","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142430771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Pareto-Optimal Event-Based Scheme for Station and Inter-Station Control of Electric and Automated Buses
Pub Date: 2024-09-09 DOI: 10.1109/OJCSYS.2024.3456633
Cecilia Pasquale, Simona Sacone, Silvia Siri, Antonella Ferrara
This paper considers electric and automated buses required to follow a given line and respect a given timetable on an inter-city road. The main goal of this work is to design a control scheme that optimally decides, in real time, the speed profile of the bus along the line, as well as the dwell and charging times at stops. This must be done by accounting for the traffic conditions encountered on the road and by jointly minimizing the deviations from the timetable and the shortfall of energy in the bus battery relative to a desired level. For the resulting multi-objective optimal control problem, a Pareto front analysis is performed in the paper, also considering a real test case. Relying on the analysis outcomes, an event-based control scheme is proposed, which makes it possible, every time a bus reaches a stop, to find the most suitable Pareto-optimal solution depending on a set of state and scenario conditions related to the expected departure time at stops, the predicted traffic conditions on the road, and the state of charge of the bus battery. The performance of the proposed control scheme is tested on a real case study, thoroughly discussed in the paper.
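An illustrative sketch of the Pareto-front idea behind such a scheme: each candidate solution carries two costs (timetable deviation, battery-energy shortfall), both to be minimized, and only the non-dominated candidates are kept. The candidate cost pairs below are made up for the example.

```python
def pareto_front(points):
    """Keep the points not dominated in both (minimized) cost components."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]

# (timetable deviation, energy shortfall) for five candidate solutions;
# (3, 3) and (4, 4) are dominated by (2, 2) and are discarded:
candidates = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
front = pareto_front(candidates)
```

At run time, an event-based scheme of the kind the abstract describes would pick one point from such a front according to the current state and scenario conditions.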
{"title":"Pareto-Optimal Event-Based Scheme for Station and Inter-Station Control of Electric and Automated Buses","authors":"Cecilia Pasquale;Simona Sacone;Silvia Siri;Antonella Ferrara","doi":"10.1109/OJCSYS.2024.3456633","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3456633","url":null,"abstract":"This paper considers electric and automated buses required to follow a given line and respect a given timetable in an inter-city road. The main goal of this work is to design a control scheme in order to optimally decide, in real time, the speed profile of the bus along the line, as well as the dwell and charging times at stops. This must be done by accounting for the traffic conditions encountered in the road and by jointly minimizing the deviations from the timetable and the lack of energy in the bus battery compared with a desired level. For the resulting multi-objective optimal control problem a Pareto front analysis is performed in the paper, also considering a real test case. Relying on the analysis outcomes, an event-based control scheme is proposed, which allows, every time a bus reaches a stop, to find the most suitable Pareto-optimal solution depending on a set of state and scenario conditions referred to the expected departure time at stops, the predicted traffic conditions in the road and the state of charge of the bus battery. 
The performance of the proposed control scheme is tested on a real case study, thoroughly discussed in the paper.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"445-460"},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10669750","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142447138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Control-Theoretical Zero-Knowledge Proof Scheme for Networked Control Systems
Pub Date: 2024-09-06 DOI: 10.1109/OJCSYS.2024.3455899
Camilla Fioravanti, Christoforos N. Hadjicostis, Gabriele Oliva
Networked Control Systems (NCS) are pivotal for sectors like industrial automation, autonomous vehicles, and smart grids. However, merging communication networks with control loops brings complexities and security vulnerabilities, necessitating strong protection and authentication measures. This paper introduces an innovative Zero-Knowledge Proof (ZKP) scheme tailored for NCSs, enabling a networked controller to prove its knowledge of the dynamical model and its ability to control a discrete-time linear time-invariant (LTI) system to a sensor, without revealing the model. This verification is done through the controller's capacity to produce suitable control signals in response to the sensor's output demands. The completeness, soundness, and zero-knowledge properties of the proposed approach are demonstrated. The scheme is subsequently extended by considering the presence of delays and output noise. Additionally, a dual scenario where the sensor proves its model knowledge to the controller is explored, enhancing the method's versatility. Effectiveness is shown through numerical simulations and a case study on distributed agreement in multi-agent systems.
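The paper's protocol comes with formal completeness, soundness, and zero-knowledge guarantees; the snippet below is only a loose challenge-response cartoon of the verification idea, with an assumed discrete-time LTI model (A, B, C): the sensor issues random output demands, and only a prover that knows the model can compute the input sequence producing them.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.5, 1.0], [0.0, 0.5]])   # assumed LTI model, relative degree 1
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 0.0]])

def prover_inputs(y_ref, x0):
    """A prover that knows (A, B, C) solves C(Ax + Bu) = y_k at each step."""
    x, us = x0.copy(), []
    cb = (C @ B).item()
    for yk in y_ref:
        u = (yk - (C @ A @ x).item()) / cb
        us.append(u)
        x = A @ x + B[:, 0] * u
    return us

challenge = rng.normal(size=5)           # sensor's random output demands
us = prover_inputs(challenge, np.zeros(2))

# The verifier replays the inputs through the plant and checks the outputs:
x, ok = np.zeros(2), True
for u, yk in zip(us, challenge):
    x = A @ x + B[:, 0] * u
    ok = ok and abs((C @ x).item() - yk) < 1e-9
```

The sensor only ever sees inputs and outputs, which is the intuition behind not revealing the model itself; the actual zero-knowledge argument in the paper is, of course, far more involved.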
{"title":"A Control-Theoretical Zero-Knowledge Proof Scheme for Networked Control Systems","authors":"Camilla Fioravanti;Christoforos N. Hadjicostis;Gabriele Oliva","doi":"10.1109/OJCSYS.2024.3455899","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3455899","url":null,"abstract":"Networked Control Systems (NCS) are pivotal for sectors like industrial automation, autonomous vehicles, and smart grids. However, merging communication networks with control loops brings complexities and security vulnerabilities, necessitating strong protection and authentication measures. This paper introduces an innovative Zero-Knowledge Proof (ZKP) scheme tailored for NCSs, enabling a networked controller to prove its knowledge of the dynamical model and its ability to control a discrete-time linear time-invariant (LTI) system to a sensor, without revealing the model. This verification is done through the controller's capacity to produce suitable control signals in response to the sensor's output demands. The completeness, soundness, and zero-knowledge properties of the proposed approach are demonstrated. The scheme is subsequently extended by considering the presence of delays and output noise. Additionally, a dual scenario where the sensor proves its model knowledge to the controller is explored, enhancing the method's versatility. 
Effectiveness is shown through numerical simulations and a case study on distributed agreement in multi-agent systems.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"416-428"},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10669168","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142430752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Control of Linear-Threshold Brain Networks via Reservoir Computing
Pub Date: 2024-08-29 DOI: 10.1109/OJCSYS.2024.3451889
Michael McCreesh, Jorge Cortés
Learning is a key function of the brain, enabling it to achieve the activity patterns required to perform various tasks. While specific behaviors are determined by activity in localized regions, the interconnections throughout the entire brain play a key role in its ability to exhibit desired activity. To mimic this setup, this paper examines the use of reservoir computing to drive a linear-threshold network brain model along a desired trajectory. We first formally design open- and closed-loop controllers that achieve reference tracking under suitable conditions on the synaptic connectivity. Given the impracticality of evaluating closed-form control signals, particularly with growing network complexity, we provide a framework where a reservoir of a larger size than the network is trained to drive the activity to the desired pattern. We illustrate the versatility of this setup in two applications: selective recruitment and inhibition of neuronal populations for goal-driven selective attention, and network intervention for the prevention of epileptic seizures.
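The reservoir-computing principle the abstract relies on is that only a linear readout is trained while the recurrent weights stay fixed. A minimal echo-state sketch of that training step follows; the reservoir size, scaling, and target trajectory are assumptions for illustration, not the paper's linear-threshold brain-model setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res = 50                                        # reservoir size (assumption)
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1
w_in = rng.normal(size=n_res)

def run_reservoir(inputs):
    """Collect the reservoir state trajectory driven by a scalar input."""
    x, states = np.zeros(n_res), []
    for u in inputs:
        x = np.tanh(W @ x + w_in * u)
        states.append(x)
    return np.array(states)

t = np.linspace(0.0, 8.0 * np.pi, 400)
u, target = np.sin(t), np.sin(t + 0.3)    # drive with a sine, track a shifted sine
X = run_reservoir(u)
wash = 50                                 # discard the initial transient
W_out = np.linalg.solve(X[wash:].T @ X[wash:] + 1e-6 * np.eye(n_res),
                        X[wash:].T @ target[wash:])
mse = np.mean((X[wash:] @ W_out - target[wash:]) ** 2)
```

Training reduces to one ridge-regression solve, which is why a reservoir larger than the controlled network remains computationally cheap to fit.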
{"title":"Control of Linear-Threshold Brain Networks via Reservoir Computing","authors":"Michael McCreesh;Jorge Cortés","doi":"10.1109/OJCSYS.2024.3451889","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3451889","url":null,"abstract":"Learning is a key function in the brain to be able to achieve the activity patterns required to perform various activities. While specific behaviors are determined by activity in localized regions, the interconnections throughout the entire brain play a key role in enabling its ability to exhibit desired activity. To mimic this setup, this paper examines the use of reservoir computing to control a linear-threshold network brain model to a desired trajectory. We first formally design open- and closed-loop controllers that achieve reference tracking under suitable conditions on the synaptic connectivity. Given the impracticality of evaluating closed-form control signals, particularly with growing network complexity, we provide a framework where a reservoir of a larger size than the network is trained to drive the activity to the desired pattern. We illustrate the versatility of this setup in two applications: selective recruitment and inhibition of neuronal populations for goal-driven selective attention, and network intervention for the prevention of epileptic seizures.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"325-341"},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10659224","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142246461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hamilton-Jacobi Reachability in Reinforcement Learning: A Survey
Pub Date: 2024-08-23 DOI: 10.1109/OJCSYS.2024.3449138
Milan Ganai, Sicun Gao, Sylvia L. Herbert
Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was restricted to verifying low-dimensional dynamical systems primarily because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. In recent years, a litany of proposed methods addresses this limitation by computing the reachability value function simultaneously with learning control policies to scale HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even reward performance, of learned control policies and can solve challenging tasks such as those with dynamic obstacles and/or with lidar-based or vision-based observations. In this survey paper, we review the recent developments in the field of HJ reachability estimation in reinforcement learning that would provide a foundational basis for further research into reliability in high-dimensional systems.
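A toy illustration of the dynamic-programming viewpoint the survey covers: value iteration for an "avoid" problem, where V(x) > 0 certifies that the controller can keep x safe forever. The grid, dynamics (unstable drift x' = x plus a bounded input), and failure set |x| > 2 are invented for the example; real HJ toolboxes solve this PDE in higher dimensions.

```python
import numpy as np

xs = np.linspace(-3.0, 3.0, 121)
l = 2.0 - np.abs(xs)                  # l(x) >= 0 exactly on the safe set |x| <= 2
dt, controls = 0.1, (-1.0, 0.0, 1.0)

V = l.copy()
for _ in range(200):
    best = np.full_like(V, -np.inf)
    for u in controls:
        x_next = xs + (xs + u) * dt   # unstable drift x' = x plus input u
        best = np.maximum(best, np.interp(x_next, xs, V))
    V = np.minimum(l, best)           # avoid-problem Bellman backup
```

Near the origin the drift is weak and the input can hold the state, so V stays positive; beyond |x| = 1 the drift overpowers the input and V turns negative, marking states from which failure is inevitable. The exponential cost of this grid-based backup in the state dimension is precisely the limitation the surveyed learning-based methods address.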
{"title":"Hamilton-Jacobi Reachability in Reinforcement Learning: A Survey","authors":"Milan Ganai;Sicun Gao;Sylvia L. Herbert","doi":"10.1109/OJCSYS.2024.3449138","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3449138","url":null,"abstract":"Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was restricted to verifying low-dimensional dynamical systems primarily because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. In recent years, a litany of proposed methods addresses this limitation by computing the reachability value function simultaneously with learning control policies to scale HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even reward performance, of learned control policies and can solve challenging tasks such as those with dynamic obstacles and/or with lidar-based or vision-based observations. 
In this survey paper, we review the recent developments in the field of HJ reachability estimation in reinforcement learning that would provide a foundational basis for further research into reliability in high-dimensional systems.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"310-324"},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10645063","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142246525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Stable Inverse Reinforcement Learning: Policies From Control Lyapunov Landscapes
Pub Date: 2024-08-21 DOI: 10.1109/OJCSYS.2024.3447464
Samuel Tesfazgi, Leonhard Sprandl, Armin Lederer, Sandra Hirche
Learning from expert demonstrations to flexibly program an autonomous system with complex behaviors or to predict an agent's behavior is a powerful tool, especially in collaborative control settings. A common method to solve this problem is inverse reinforcement learning (IRL), where the observed agent, e.g., a human demonstrator, is assumed to behave according to the optimization of an intrinsic cost function that reflects its intent and informs its control actions. While the framework is expressive, the inferred control policies generally lack convergence guarantees, which are critical for safe deployment in real-world settings. We therefore propose a novel, stability-certified IRL approach by reformulating the cost function inference problem to learning control Lyapunov functions (CLF) from demonstrations data. By additionally exploiting closed-form expressions for associated control policies, we are able to efficiently search the space of CLFs by observing the attractor landscape of the induced dynamics. For the construction of the inverse optimal CLFs, we use a Sum of Squares and formulate a convex optimization problem. We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world, human-generated data.
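The paper learns CLFs via a sum-of-squares program; as a much simpler hedged stand-in, the snippet below only checks the Lyapunov decrease condition of a candidate quadratic certificate along demonstration transitions. The candidate P, the closed-loop dynamics, and the data are all assumptions for the example.

```python
import numpy as np

def decreases_along_demos(P, transitions, margin=0.0):
    """True iff V(x+) <= (1 - margin) * V(x) for every observed step,
    with the candidate certificate V(x) = x @ P @ x."""
    return all((xn @ P @ xn) <= (1.0 - margin) * (x @ P @ x)
               for x, xn in transitions)

# Demonstrations rolled out from a stable linear closed loop x+ = A x:
A = np.array([[0.9, 0.1], [0.0, 0.8]])
P = np.eye(2)                       # candidate quadratic certificate
demo, x = [], np.array([1.0, -1.0])
for _ in range(20):
    x_next = A @ x
    demo.append((x, x_next))
    x = x_next

certified = decreases_along_demos(P, demo, margin=0.05)
```

The decrease condition with a positive margin is the discrete-time analogue of the strict Lyapunov inequality that gives the convergence guarantees the abstract emphasizes.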
{"title":"Stable Inverse Reinforcement Learning: Policies From Control Lyapunov Landscapes","authors":"SAMUEL TESFAZGI;Leonhard Sprandl;Armin Lederer;Sandra Hirche","doi":"10.1109/OJCSYS.2024.3447464","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3447464","url":null,"abstract":"Learning from expert demonstrations to flexibly program an autonomous system with complex behaviors or to predict an agent's behavior is a powerful tool, especially in collaborative control settings. A common method to solve this problem is inverse reinforcement learning (IRL), where the observed agent, e.g., a human demonstrator, is assumed to behave according to the optimization of an intrinsic cost function that reflects its intent and informs its control actions. While the framework is expressive, the inferred control policies generally lack convergence guarantees, which are critical for safe deployment in real-world settings. We therefore propose a novel, stability-certified IRL approach by reformulating the cost function inference problem to learning control Lyapunov functions (CLF) from demonstrations data. By additionally exploiting closed-form expressions for associated control policies, we are able to efficiently search the space of CLFs by observing the attractor landscape of the induced dynamics. For the construction of the inverse optimal CLFs, we use a Sum of Squares and formulate a convex optimization problem. 
We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world, human-generated data.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"358-374"},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10643266","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142316493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning to Boost the Performance of Stable Nonlinear Systems
Pub Date: 2024-08-12 DOI: 10.1109/OJCSYS.2024.3441768
Luca Furieri, Clara Lucía Galimberti, Giancarlo Ferrari-Trecate
The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures aiming for the unparalleled performances achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over specific classes of deep neural network performance-boosting controllers for stable nonlinear systems; crucially, we guarantee $\mathcal{L}_{p}$ closed-loop stability even if optimization is halted prematurely. When the ground-truth dynamics are uncertain, we learn over robustly stabilizing control policies. Our robustness result is tight, in the sense that all stabilizing policies are recovered as the $\mathcal{L}_{p}$-gain of the model mismatch operator is reduced to zero.
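A toy scalar illustration of the IMC principle the paper builds on: the controller feeds back only the mismatch between the plant and an internal model copy, so with an exact model the feedback signal reduces to the reference and stability of the (stable) plant plus a stable performance map Q gives closed-loop stability. All numbers here are assumptions for the example.

```python
a, b = 0.7, 1.0            # stable scalar plant x+ = a*x + b*u, y = x
q = 0.3                    # a stable "performance" map Q (a static gain here)
r = 1.0                    # constant reference

x, xm = 0.0, 0.0           # plant state and internal-model state
for _ in range(200):
    e = r - (x - xm)       # feedback = reference minus model mismatch
    u = q * e              # IMC control law u = Q(e)
    x = a * x + b * u      # true plant
    xm = a * xm + b * u    # internal model (exact copy in this sketch)
y = x
```

With the exact model, x - xm stays zero, the loop is effectively open, and y converges to b*q/(1 - a) = 1.0; the paper's contribution is to make Q a deep network while keeping this kind of stability argument intact.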
{"title":"Learning to Boost the Performance of Stable Nonlinear Systems","authors":"Luca Furieri;Clara Lucía Galimberti;Giancarlo Ferrari-Trecate","doi":"10.1109/OJCSYS.2024.3441768","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3441768","url":null,"abstract":"The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures aiming for the unparalleled performances achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over specific classes of deep neural network performance-boosting controllers for stable nonlinear systems; crucially, we guarantee \u0000<inline-formula><tex-math>$mathcal {L}_{p}$</tex-math></inline-formula>\u0000 closed-loop stability even if optimization is halted prematurely. When the ground-truth dynamics are uncertain, we learn over robustly stabilizing control policies. Our robustness result is tight, in the sense that all stabilizing policies are recovered as the \u0000<inline-formula><tex-math>$mathcal {L}_{p}$</tex-math></inline-formula>\u0000 -gain of the model mismatch operator is reduced to zero. 
We discuss the implementation details of the proposed control schemes, including distributed ones, along with the corresponding optimization procedures, demonstrating the potential of freely shaping the cost functions through several numerical experiments.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"342-357"},"PeriodicalIF":0.0,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10633771","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142316492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Distributionally Robust Policy and Lyapunov-Certificate Learning
Pub Date : 2024-08-07 DOI: 10.1109/OJCSYS.2024.3440051
Kehan Long;Jorge Cortés;Nikolay Atanasov
This article presents novel methods for synthesizing distributionally robust stabilizing neural controllers and certificates for control systems under model uncertainty. A key challenge in designing controllers with stability guarantees for uncertain systems is the accurate determination of and adaptation to shifts in model parametric uncertainty during online deployment. We tackle this with a novel distributionally robust formulation of the Lyapunov derivative chance constraint ensuring a monotonic decrease of the Lyapunov certificate. To avoid the computational complexity involved in dealing with the space of probability measures, we identify a sufficient condition in the form of deterministic convex constraints that ensures the Lyapunov derivative constraint is satisfied. We integrate this condition into a loss function for training a neural network-based controller and show that, for the resulting closed-loop system, the global asymptotic stability of its equilibrium can be certified with high confidence, even with Out-of-Distribution (OoD) model uncertainties. To demonstrate the efficacy and efficiency of the proposed methodology, we compare it with an uncertainty-agnostic baseline approach and several reinforcement learning approaches in two control problems in simulation. Open-source implementations of the examples are available at https://github.com/KehanLong/DR_Stabilizing_Policy.
{"title":"Distributionally Robust Policy and Lyapunov-Certificate Learning","authors":"Kehan Long;Jorge Cortés;Nikolay Atanasov","doi":"10.1109/OJCSYS.2024.3440051","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3440051","url":null,"abstract":"This article presents novel methods for synthesizing distributionally robust stabilizing neural controllers and certificates for control systems under model uncertainty. A key challenge in designing controllers with stability guarantees for uncertain systems is the accurate determination of and adaptation to shifts in model parametric uncertainty during online deployment. We tackle this with a novel distributionally robust formulation of the Lyapunov derivative chance constraint ensuring a monotonic decrease of the Lyapunov certificate. To avoid the computational complexity involved in dealing with the space of probability measures, we identify a sufficient condition in the form of deterministic convex constraints that ensures the Lyapunov derivative constraint is satisfied. We integrate this condition into a loss function for training a neural network-based controller and show that, for the resulting closed-loop system, the global asymptotic stability of its equilibrium can be certified with high confidence, even with Out-of-Distribution (OoD) model uncertainties. To demonstrate the efficacy and efficiency of the proposed methodology, we compare it with an uncertainty-agnostic baseline approach and several reinforcement learning approaches in two control problems in simulation. 
Open-source implementations of the examples are available at \u0000<uri>https://github.com/KehanLong/DR_Stabilizing_Policy</uri>\u0000.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"375-388"},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10629071","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
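The Lyapunov decrease condition at the heart of the abstract above can be illustrated with a simple sampled penalty term. This is a stand-in for the paper's distributionally robust chance constraint: the quadratic certificate, the discrete-time linear dynamics samples, and all names here are assumptions for illustration.

```python
import numpy as np

def lyapunov_violation_loss(P, A_samples, alpha=0.1, n_states=100, seed=0):
    """Average penalty on violations of the decrease condition
    V(Ax) <= (1 - alpha) * V(x), with certificate V(x) = x' P x,
    evaluated over sampled uncertain dynamics matrices A."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_states, P.shape[0]))   # sampled states
    loss = 0.0
    for A in A_samples:
        Xn = X @ A.T                                  # next states A x
        V = np.einsum('ij,jk,ik->i', X, P, X)         # V(x) per sample
        Vn = np.einsum('ij,jk,ik->i', Xn, P, Xn)      # V(A x) per sample
        loss += np.mean(np.maximum(Vn - (1.0 - alpha) * V, 0.0))
    return loss / len(A_samples)
```

A term like this can be added to a training loss so that gradient descent drives the certificate toward satisfying the decrease condition across the sampled model uncertainty; the paper itself replaces naive sampling with a distributionally robust formulation expressed as deterministic convex constraints.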
Global Multi-Phase Path Planning Through High-Level Reinforcement Learning
Pub Date : 2024-07-29 DOI: 10.1109/OJCSYS.2024.3435080
Babak Salamat;Sebastian-Sven Olzem;Gerhard Elsbacher;Andrea M. Tonello
In this paper, we introduce the Global Multi-Phase Path Planning (GMP³) algorithm in planner problems, which computes fast and feasible trajectories in environments with obstacles, considering physical and kinematic constraints. Our approach utilizes a Markov Decision Process (MDP) framework and high-level reinforcement learning techniques to ensure trajectory smoothness, continuity, and compliance with constraints. Through extensive simulations, we demonstrate the algorithm's effectiveness and efficiency across various scenarios. We highlight existing path planning challenges, particularly in integrating dynamic adaptability and computational efficiency. The results validate our method's convergence guarantees using Lyapunov’s stability theorem and underscore its computational advantages.
{"title":"Global Multi-Phase Path Planning Through High-Level Reinforcement Learning","authors":"Babak Salamat;Sebastian-Sven Olzem;Gerhard Elsbacher;Andrea M. Tonello","doi":"10.1109/OJCSYS.2024.3435080","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3435080","url":null,"abstract":"In this paper, we introduce the \u0000<italic>Global Multi-Phase Path Planning</i>\u0000 (\u0000<monospace><inline-formula><tex-math>$GMP^{3}$</tex-math></inline-formula></monospace>\u0000) algorithm in planner problems, which computes fast and feasible trajectories in environments with obstacles, considering physical and kinematic constraints. Our approach utilizes a Markov Decision Process (MDP) framework and high-level reinforcement learning techniques to ensure trajectory smoothness, continuity, and compliance with constraints. Through extensive simulations, we demonstrate the algorithm's effectiveness and efficiency across various scenarios. We highlight existing path planning challenges, particularly in integrating dynamic adaptability and computational efficiency. The results validate our method's convergence guarantees using Lyapunov’s stability theorem and underscore its computational advantages.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"405-415"},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10613437","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142430772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
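A minimal MDP planning sketch in the spirit of the GMP³ abstract above: value iteration for shortest-path cost-to-go on an occupancy grid. The grid setting, unit step cost, and function names are assumptions for illustration; the actual algorithm's multi-phase structure, kinematic constraints, and reinforcement-learning components are not reproduced here.

```python
import numpy as np

def cost_to_go(grid, goal):
    """Value iteration for shortest-path cost-to-go on a 2D occupancy
    grid (0 = free, 1 = obstacle); unreachable cells keep a large cost.
    A greedy descent on the returned values yields a feasible path."""
    H, W = grid.shape
    INF = float(H * W + 1)          # upper bound on any path length
    V = np.full((H, W), INF)
    V[goal] = 0.0
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for _ in range(H * W):          # enough in-place sweeps to converge
        for i in range(H):
            for j in range(W):
                if grid[i, j] == 1 or (i, j) == goal:
                    continue
                best = INF
                for di, dj in moves:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W and grid[ni, nj] == 0:
                        best = min(best, 1.0 + V[ni, nj])
                V[i, j] = min(best, INF)
    return V
```

On a 3×3 grid with a single central obstacle, the corner opposite the goal gets cost 4, matching the length of the shortest detour around the obstacle.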