
Latest publications in IEEE open journal of control systems

Closed-Loop Kinematic and Indirect Force Control of a Cable-Driven Knee Exoskeleton: A Lyapunov-Based Switched Systems Approach
Pub Date : 2023-06-26 DOI: 10.1109/OJCSYS.2023.3289771
Chen-Hao Chang;Jonathan Casas;Victor H. Duenas
Lower-limb exoskeletons can aid in restoring mobility in people with movement disorders. Cable-driven exoskeletons can offload their actuators away from the human body to reduce the weight imposed on the user and enable precise control of joints. However, ensuring limb coordination through bidirectional motion control of joints using cables raises the technical challenge of preventing undesired cable slackness or counteracting forces between cables. Thus, motivation exists to develop a control design framework that integrates both a joint control loop to ensure suitable limb tracking and a cable control loop to maintain cable tension properly. In this article, a two-layer control structure consisting of high- and low-level controllers is developed to ensure a knee-joint exoskeleton system follows the desired joint trajectories and adjusts the cable tension, respectively. A repetitive learning controller is designed for the high-level knee joint tracking objective, motivated by the periodic nature of the desired leg swings (i.e., to achieve knee flexion and extension). Low-level robust controllers are developed for a pair of cables, each actuated by an electric motor, to track target motor trajectories composed of motor kinematics and offset angles to mitigate cable slackness. The offset angles are computed using admittance models that exploit measurements of the cable tensions as inputs. Each electric motor switches its role between tracking the knee joint trajectory (i.e., the motor acts as the leader motor to achieve flexion or extension) and implementing the low-level controller (i.e., the motor acts as the follower motor to reduce slackness). Hence, at any time, one motor is the leader and the other is the follower. A Lyapunov-based stability analysis is developed for the high-level joint controller to ensure global asymptotic tracking and for the low-level follower controller to guarantee global exponential tracking.
The designed controllers are implemented during leg swing experiments in six able-bodied individuals wearing the knee joint cable-driven exoskeleton. A comparison of the results obtained in two trials, with and without the admittance model (i.e., exploiting cable tension measurements), is presented. The experimental results indicate improved knee joint tracking performance, smaller control input magnitudes, and reduced cable slackness in the trial that leveraged cable tension feedback compared to the trial that did not.
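The leader/follower role switching and the admittance-based slack compensation described above can be sketched in a few lines. This is a minimal illustration, not the paper's controller: the role-assignment convention, the target tension `t_des`, and the admittance stiffness are all hypothetical.

```python
import math

def leader_follower_roles(qd_dot):
    """Assign motor roles from the sign of the desired knee velocity.

    Hypothetical convention: motor 0 pulls the flexion cable and motor 1
    the extension cable; the motor whose cable does positive work leads.
    """
    return (0, 1) if qd_dot >= 0 else (1, 0)

def admittance_offset(tension, t_des=5.0, stiffness=0.8):
    """First-order admittance: map the cable-tension error to an offset
    angle that reels in slack (a stand-in for the paper's admittance
    model, which uses measured tensions as inputs)."""
    return (t_des - tension) / stiffness

# One swing cycle with desired knee angle qd(t) = A*sin(w*t):
A, w, dt = 0.5, 2.0 * math.pi, 0.01
for k in range(3):
    t = k * dt
    qd_dot = A * w * math.cos(w * t)          # desired knee velocity
    leader, follower = leader_follower_roles(qd_dot)
    offset = admittance_offset(tension=3.0)   # slack cable: tension below target
    print(leader, follower, round(offset, 2))
```

The follower would then track its kinematic trajectory plus this offset, tightening the cable while the leader drives the joint.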
IEEE Open Journal of Control Systems, vol. 2, pp. 171-184, 2023.
Citations: 0
Dual Control of Coupled Oscillator Networks
Pub Date : 2023-06-02 DOI: 10.1109/OJCSYS.2023.3282438
Per Sebastian Skardal;Alex Arenas
Robust coordination and organization in large ensembles of nonlinear oscillatory units play a vital role in a wide range of natural and engineered systems. The control of self-organizing network-coupled systems has recently seen significant attention, but largely in the context of modifying or augmenting existing structures. This leaves a gap in our understanding of reactive control: where and how to design direct interventions, and what we may learn about structure and dynamics from such control strategies. Here we study reactive control of coupled oscillator networks and demonstrate dual control strategies, i.e., two different mechanisms for control that may each be implemented on their own and interchangeably to achieve synchronization. These diverse strategies exploit different network properties, with the first directly targeting oscillators that are challenging to entrain, and the second focusing on oscillators with a strong influence on others. Thus, in addition to presenting alternative strategies for network control, the distinct control sets illuminate the oscillators' dynamical and structural roles within the system. The applicability of dual control is demonstrated using both synthetic and real networks.
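The first of the two strategies can be illustrated on the standard controlled Kuramoto model, the usual testbed for coupled oscillator networks. The sketch below is an assumption-laden caricature, not the paper's design: the feedback law, the gain, and the choice of targeting the three oscillators with the largest natural-frequency magnitudes are illustrative.

```python
import math, random

def kuramoto_step(theta, omega, K, adj, u, dt=0.01):
    """One Euler step of the controlled Kuramoto model
    dtheta_i/dt = omega_i + (K/N) * sum_j a_ij * sin(theta_j - theta_i) + u_i."""
    n = len(theta)
    return [theta[i] + dt * (omega[i] + u[i]
            + K / n * sum(adj[i][j] * math.sin(theta[j] - theta[i])
                          for j in range(n)))
            for i in range(n)]

def order_parameter(theta):
    """Synchronization measure r in [0, 1]; r = 1 means identical phases."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)

random.seed(0)
n = 10
theta = [random.uniform(-math.pi, math.pi) for _ in range(n)]
omega = [random.gauss(0.0, 1.0) for _ in range(n)]
adj = [[1 if i != j else 0 for j in range(n)] for i in range(n)]  # all-to-all

# Strategy-A flavor: drive only the hardest-to-entrain oscillators
# (largest |omega_i|) toward the mean phase with proportional feedback.
target = sorted(range(n), key=lambda i: -abs(omega[i]))[:3]
for _ in range(2000):
    mean_phase = math.atan2(sum(math.sin(t) for t in theta),
                            sum(math.cos(t) for t in theta))
    u = [2.0 * math.sin(mean_phase - theta[i]) if i in target else 0.0
         for i in range(n)]
    theta = kuramoto_step(theta, omega, K=1.5, adj=adj, u=u)
print(round(order_parameter(theta), 2))
```

The second strategy would instead pick `target` by structural influence (e.g., node degree in a non-complete graph); the point of the paper is that either selection can work.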
IEEE Open Journal of Control Systems, vol. 2, pp. 146-154, 2023.
Citations: 0
Backward Reachability Analysis of Neural Feedback Loops: Techniques for Linear and Nonlinear Systems
Pub Date : 2023-04-10 DOI: 10.1109/OJCSYS.2023.3265901
Nicholas Rober;Sydney M. Katz;Chelsea Sidrane;Esen Yel;Michael Everett;Mykel J. Kochenderfer;Jonathan P. How
As neural networks (NNs) become more prevalent in safety-critical applications such as control of vehicles, there is a growing need to certify that systems with NN components are safe. This paper presents a set of backward reachability approaches for safety certification of neural feedback loops (NFLs), i.e., closed-loop systems with NN control policies. While backward reachability strategies have been developed for systems without NN components, the nonlinearities in NN activation functions and general noninvertibility of NN weight matrices make backward reachability for NFLs a challenging problem. To avoid the difficulties associated with propagating sets backward through NNs, we introduce a framework that leverages standard forward NN analysis tools to efficiently find over-approximations to backprojection (BP) sets, i.e., sets of states for which an NN policy will lead a system to a given target set. We present frameworks for calculating BP over-approximations for both linear and nonlinear systems with control policies represented by feedforward NNs and propose computationally efficient strategies. We use numerical results from a variety of models to showcase the proposed algorithms, including a demonstration of safety certification for a 6D system.
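The idea of a backprojection over-approximation can be made concrete for a scalar linear system. In the sketch below, `u_bounds` stands in for the output bounds that a forward NN analysis tool would return for the policy over a candidate state set; the function then inverts the dynamics with interval arithmetic. The scalar setting and the sign assumptions (a > 0, b > 0) are simplifications of the paper's general framework.

```python
def bp_overapprox_1d(a, b, u_bounds, target):
    """One-step backprojection over-approximation for x+ = a*x + b*u:
    the set of states x for which SOME admissible input u (within the
    forward-analysis bounds on the NN policy) lands in `target`.

    For a > 0, b > 0: a*x + b*u in [t_lo, t_hi] for some u in
    [u_lo, u_hi]  =>  x in [(t_lo - b*u_hi)/a, (t_hi - b*u_lo)/a].
    """
    u_lo, u_hi = u_bounds
    t_lo, t_hi = target
    return ((t_lo - b * u_hi) / a, (t_hi - b * u_lo) / a)

# Target set [-0.1, 0.1], policy output bounded in [-1, 1], x+ = x + 0.5*u:
print(bp_overapprox_1d(1.0, 0.5, (-1.0, 1.0), (-0.1, 0.1)))
```

Any state outside the returned interval provably cannot reach the target in one step, which is the safety-certification direction the paper exploits.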
IEEE Open Journal of Control Systems, vol. 2, pp. 108-124, 2023.
Citations: 8
Distributed Data-Driven Control of Network Systems
Pub Date : 2023-03-20 DOI: 10.1109/OJCSYS.2023.3259228
Federico Celi;Giacomo Baggio;Fabio Pasqualetti
Imperfect models lead to imperfect controllers, and deriving accurate models from first principles or system identification is especially challenging in networked systems. Instead, data can be used to compute controllers directly, without requiring any system identification or modeling. In this paper we propose a strategy to directly learn control actions when data from past system trajectories are distributed among multiple agents in a network. The approach we develop provably converges to a suboptimal solution in a finite number of steps, bounded by the diameter of the network, and with a sub-optimality gap that can be characterized as a function of the data and made arbitrarily small. We further characterize the robustness properties of our approach and give provable guarantees on its performance when data are affected by noise or by a class of attacks.
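The flavor of computing an input directly from trajectory data, with no identification step, can be shown for a scalar system. This is a centralized one-step toy in the spirit of Willems-style data-driven design, not the paper's distributed algorithm; the hidden pair (a, b) below is used only to generate the data and never enters the controller.

```python
def data_driven_onestep(X0, U0, X1, x, x_target=0.0):
    """Steer the UNKNOWN scalar system x+ = a*x + b*u from state x to
    x_target in one step, using only recorded data triples
    (X0[i], U0[i], X1[i]) and no model identification.

    Find g with  X0 . g = x  and  X1 . g = x_target, then apply
    u = U0 . g; since X1 = a*X0 + b*U0 columnwise, the data guarantee
    a*x + b*u = x_target. Two informative data points suffice here.
    """
    det = X0[0] * X1[1] - X0[1] * X1[0]      # 2x2 solve via Cramer's rule
    g0 = (x * X1[1] - x_target * X0[1]) / det
    g1 = (x_target * X0[0] - x * X1[0]) / det
    return U0[0] * g0 + U0[1] * g1

# Hidden system x+ = 0.9*x + 0.5*u, used only to produce the data set:
a, b = 0.9, 0.5
X0 = [1.0, 2.0]
U0 = [0.2, -1.0]
X1 = [a * X0[0] + b * U0[0], a * X0[1] + b * U0[1]]

u = data_driven_onestep(X0, U0, X1, x=3.0)
print(round(a * 3.0 + b * u, 6))   # next state lands on the target
```

The paper's contribution is, roughly, performing this kind of computation when the data columns are split across agents that can only exchange information locally.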
IEEE Open Journal of Control Systems, vol. 2, pp. 93-107, 2023.
Citations: 3
Policy Evaluation in Decentralized POMDPs With Belief Sharing
Pub Date : 2023-03-18 DOI: 10.1109/OJCSYS.2023.3277760
Mert Kayaalp;Fatima Ghadieh;Ali H. Sayed
Most works on multi-agent reinforcement learning focus on scenarios where the state of the environment is fully observable. In this work, we consider a cooperative policy evaluation task in which agents are not assumed to observe the environment state directly. Instead, agents can only have access to noisy observations and to belief vectors. It is well-known that finding global posterior distributions under multi-agent settings is generally NP-hard. As a remedy, we propose a fully decentralized belief forming strategy that relies on individual updates and on localized interactions over a communication network. In addition to the exchange of the beliefs, agents exploit the communication network by exchanging value function parameter estimates as well. We analytically show that the proposed strategy allows information to diffuse over the network, which in turn allows the agents' parameters to have a bounded difference with a centralized baseline. A multi-sensor target tracking application is considered in the simulations.
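A decentralized belief-forming step of this kind typically alternates a local Bayesian update with a combination of neighbors' beliefs. The sketch below uses geometric (log-linear) pooling as the combination rule over a discrete state space; that is one common choice and not necessarily the paper's exact update, and the sensor models are invented for illustration.

```python
import math

def local_bayes_update(belief, likelihood):
    """Bayesian update of a discrete belief with a local observation."""
    post = [b * l for b, l in zip(belief, likelihood)]
    z = sum(post)
    return [p / z for p in post]

def pool_beliefs(neighbor_beliefs, weights):
    """Geometric (log-linear) pooling of neighbors' beliefs, computed in
    log space for numerical stability."""
    n_states = len(neighbor_beliefs[0])
    logs = [sum(w * math.log(b[s]) for w, b in zip(weights, neighbor_beliefs))
            for s in range(n_states)]
    m = max(logs)
    unnorm = [math.exp(l - m) for l in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Two agents, two hidden states; agent 0's sensor favors state 1 while
# agent 1's sensor is uninformative.
beliefs = [[0.5, 0.5], [0.5, 0.5]]
likelihoods = [[0.2, 0.8], [0.5, 0.5]]
beliefs = [local_bayes_update(b, l) for b, l in zip(beliefs, likelihoods)]
W = [[0.5, 0.5], [0.5, 0.5]]   # doubly stochastic combination weights
beliefs = [pool_beliefs(beliefs, W[i]) for i in range(2)]
print([round(b, 3) for b in beliefs[0]])   # → [0.333, 0.667]
```

After pooling, the uninformative agent's belief has moved toward its informed neighbor's, which is the diffusion effect the paper analyzes (alongside the exchange of value-function parameters).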
IEEE Open Journal of Control Systems, vol. 2, pp. 125-145, 2023.
Citations: 0
Model-Based Reinforcement Learning via Stochastic Hybrid Models
Pub Date : 2023-03-17 DOI: 10.1109/OJCSYS.2023.3277308
Hany Abdulsamad;Jan Peters
Optimal control of general nonlinear systems is a central challenge in automation. Enabled by powerful function approximators, data-driven approaches to control have recently successfully tackled challenging applications. However, such methods often obscure the structure of dynamics and control behind black-box over-parameterized representations, thus limiting our ability to understand closed-loop behavior. This article adopts a hybrid-system view of nonlinear modeling and control that lends an explicit hierarchical structure to the problem and breaks down complex dynamics into simpler localized units. We consider a sequence modeling paradigm that captures the temporal structure of the data and derive an expectation-maximization (EM) algorithm that automatically decomposes nonlinear dynamics into stochastic piecewise affine models with nonlinear transition boundaries. Furthermore, we show that these time-series models naturally admit a closed-loop extension that we use to extract local polynomial feedback controllers from nonlinear experts via behavioral cloning. Finally, we introduce a novel hybrid relative entropy policy search (Hb-REPS) technique that incorporates the hierarchical nature of hybrid models and optimizes a set of time-invariant piecewise feedback controllers derived from a piecewise polynomial approximation of a global state-value function.
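A hard-assignment caricature of such an EM decomposition can be written for scalar data with two affine modes: the E-step assigns each sample to the better-fitting mode and the M-step refits each mode by least squares. The initialization and fixed mode count are hypothetical; the paper's algorithm uses soft (probabilistic) responsibilities and learns nonlinear transition boundaries.

```python
def fit_affine(pts):
    """Least-squares line y = c0 + c1*x through (x, y) pairs."""
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    c1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c0 = (sy - c1 * sx) / n
    return c0, c1

def hard_em_two_modes(pts, iters=10):
    """Alternate mode assignment (E-step, hard version) and per-mode
    affine refitting (M-step) -- a toy stand-in for the paper's EM."""
    xs = sorted(x for x, _ in pts)
    split = xs[len(xs) // 2]                      # crude initialization
    assign = [0 if x < split else 1 for x, _ in pts]
    for _ in range(iters):
        modes = [fit_affine([p for p, a_ in zip(pts, assign) if a_ == m])
                 for m in (0, 1)]
        assign = [min((0, 1),
                      key=lambda m: (y - modes[m][0] - modes[m][1] * x) ** 2)
                  for x, y in pts]
    return modes, assign

# Data from y = |x|, a two-piece affine function (-x for x<0, +x for x>=0):
pts = [(x / 10, abs(x / 10)) for x in range(-10, 11)]
modes, assign = hard_em_two_modes(pts)
print([(round(c0, 3), round(c1, 3)) for c0, c1 in modes])
```

The recovered slopes (-1 and +1 with zero intercepts) are the two local affine units into which the nonlinear map decomposes; the paper's controllers are then attached per mode.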
IEEE Open Journal of Control Systems, vol. 2, pp. 155-170, 2023.
Citations: 0
Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes
Pub Date : 2023-03-13 DOI: 10.1109/OJCSYS.2023.3256305
Niklas Kochdumper;Hanna Krasowski;Xiao Wang;Stanley Bak;Matthias Althoff
While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue with a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents a reinforcement learning agent from applying potentially unsafe actions by projecting the proposed action to the closest safe action. This approach is called action projection and is implemented via mixed-integer optimization. The safety constraints for action projection are obtained by applying parameterized reachability analysis using polynomial zonotopes, which makes it possible to accurately capture the nonlinear effects of the actions on the system.
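In one dimension, action projection reduces to clamping the proposed action to a safe interval. The sketch below derives that interval from a scalar linear one-step reachability constraint; it stands in for (and is far simpler than) the paper's mixed-integer optimization over polynomial-zonotope reachable sets, and the system and bounds are invented for illustration.

```python
def safe_interval(x, a, b, state_bounds, u_limits):
    """Inputs u that keep x+ = a*x + b*u inside state_bounds (assumes
    b > 0), intersected with the actuator limits."""
    s_lo, s_hi = state_bounds
    lo = max((s_lo - a * x) / b, u_limits[0])
    hi = min((s_hi - a * x) / b, u_limits[1])
    return lo, hi

def shield(u_rl, x, a, b, state_bounds, u_limits):
    """Action projection: replace the RL action by the closest safe one
    (in 1-D, closest-point projection is just a clamp)."""
    lo, hi = safe_interval(x, a, b, state_bounds, u_limits)
    return min(max(u_rl, lo), hi)

# x+ = x + u, state must stay in [-1, 1], actuator |u| <= 2, state x = 0.8.
# The RL agent proposes u = 1.5, which would overshoot the safe set:
print(round(shield(1.5, 0.8, 1.0, 1.0, (-1.0, 1.0), (-2.0, 2.0)), 2))  # → 0.2
```

In higher dimensions with polytopic constraints, the clamp becomes a quadratic or mixed-integer program, which is where the paper's machinery comes in.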
IEEE Open Journal of Control Systems, vol. 2, pp. 79-92, 2023.
Citations: 9
Model-Free Distributed Reinforcement Learning State Estimation of a Dynamical System Using Integral Value Functions
Pub Date : 2023-02-27 DOI: 10.1109/OJCSYS.2023.3250089
Babak Salamat;Gerhard Elsbacher;Andrea M. Tonello;Lenz Belzner
One of the challenging problems in sensor network systems is to estimate and track the state of a target point mass with unknown dynamics. Recent improvements in deep learning (DL) have renewed interest in applying DL techniques to state estimation problems. However, process noise is typically absent in these formulations, which seems to indicate that the point-mass target must be non-maneuvering, as process noise is typically as significant as the measurement noise when tracking maneuvering targets. In this paper, we propose a continuous-time (CT) model-free or model-building distributed reinforcement learning estimator (DRLE) using an integral value function in sensor networks. The DRLE algorithm is capable of learning an optimal policy from a neural value function that aims to provide the estimation of a target point mass. The proposed estimator consists of two high-pass consensus filters in terms of weighted measurements and inverse-covariance matrices and a critic reinforcement learning mechanism for each node in the network. The efficiency of the proposed DRLE is shown by a simulation experiment of a network of underactuated vertical takeoff and landing aircraft with strong input coupling.
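The consensus-filter backbone of such distributed estimators can be illustrated with a plain average-consensus iteration over a sensor network. The step size, graph, and measurements below are illustrative; the paper's filters additionally weight the exchanged quantities by inverse covariances.

```python
def consensus_step(values, adj, eps=0.2):
    """One step of discrete-time average consensus: each node moves
    toward its neighbors' values. Iterated, all nodes approach the
    network-wide mean of the initial values."""
    n = len(values)
    return [values[i] + eps * sum(adj[i][j] * (values[j] - values[i])
                                  for j in range(n))
            for i in range(n)]

# Ring of 4 sensors, each with a noisy local measurement of a scalar
# target state whose true value is 1.0:
adj = [[0, 1, 0, 1],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [1, 0, 1, 0]]
z = [1.2, 0.8, 1.1, 0.9]
for _ in range(50):
    z = consensus_step(z, adj)
print([round(v, 3) for v in z])   # → [1.0, 1.0, 1.0, 1.0]
```

Every node ends up with the fused estimate using only neighbor-to-neighbor exchanges, which is the role the two consensus filters play inside the DRLE.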
IEEE Open Journal of Control Systems, vol. 2, pp. 70-78, 2023.
Cited: 1
The Internal Model Principle for Biomolecular Control Theory
Pub Date: 2023-02-10 DOI: 10.1109/OJCSYS.2023.3244089
Ankit Gupta;Mustafa Khammash
The well-known Internal Model Principle (IMP) is a cornerstone of modern control theory. It stipulates the necessary conditions for asymptotic robustness of disturbance-prone dynamical systems by asserting that such a system must embed a subsystem in a feedback loop, and that this subsystem must be able to replicate the dynamic disturbance using only the regulated variable as its input. The insights provided by the IMP can help both in designing suitable controllers and in analysing the regulatory mechanisms of complex systems. So far, the application of the IMP in biology has been case-specific and ad hoc, primarily due to the lack of generic versions of the IMP for the biomolecular reaction networks that model biological processes. In this short article, we highlight the need for an IMP in biology and discuss a recently developed version of it for biomolecular networks that exhibit maximal Robust Perfect Adaptation (maxRPA) by being robust to the maximum number of disturbance sources.
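A textbook example makes the principle concrete: to reject a constant disturbance, the controller must embed an integrator, i.e., an internal model of that disturbance class. The toy simulation below (illustrative plant and gains, not from the article) contrasts proportional-only control, which leaves a steady-state offset, with proportional-integral control, which drives the regulated variable to zero.

```python
# Toy illustration of the Internal Model Principle (assumed scalar plant
# and gains, not from the article): a constant disturbance d enters the
# plant x' = -x + u + d.  Proportional feedback alone leaves the offset
# d / (1 + kp); adding an integrator, an internal model of the constant
# disturbance, regulates x to zero asymptotically.
dt, T = 0.001, 20.0
d = 1.5                      # unknown constant disturbance
kp, ki = 4.0, 3.0

def simulate(use_integrator: bool) -> float:
    x, z = 0.0, 0.0          # plant state, integrator state
    for _ in range(int(T / dt)):
        u = -kp * x - (ki * z if use_integrator else 0.0)
        x += dt * (-x + u + d)       # forward-Euler plant update
        z += dt * x                  # internal model: z' = x
    return x                         # regulated variable at time T

print(simulate(False))  # P only: offset d / (1 + kp) = 0.3
print(simulate(True))   # PI: offset driven to (numerically) zero
```

The integrator state settles at z = d / ki, exactly cancelling the disturbance; this is the subsystem "reduplicating" the disturbance from the regulated variable alone.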
Ankit Gupta and Mustafa Khammash, "The Internal Model Principle for Biomolecular Control Theory," IEEE Open Journal of Control Systems, vol. 2, pp. 63-69, 2023. DOI: 10.1109/OJCSYS.2023.3244089.
Cited: 1
Certifying Black-Box Policies With Stability for Nonlinear Control
Pub Date: 2023-02-01 DOI: 10.1109/OJCSYS.2023.3241486
Tongxin Li;Ruixiao Yang;Guannan Qu;Yiheng Lin;Adam Wierman;Steven H. Low
Machine-learned black-box policies are ubiquitous in nonlinear control problems. Meanwhile, crude model information is often available for these problems from, e.g., linear approximations of the nonlinear dynamics. We study the problem of certifying a black-box control policy with stability, using model-based advice, for nonlinear control on a single trajectory. We first show a general negative result: a naive convex combination of a black-box policy and a linear model-based policy can lead to instability, even if the two policies are both stabilizing. We then propose an adaptive $\lambda$-confident policy, with a coefficient $\lambda$ indicating the confidence in the black-box policy, and prove its stability. In addition, with bounded nonlinearity, we show that the adaptive $\lambda$-confident policy achieves a bounded competitive ratio when the black-box policy is near-optimal. Finally, we propose an online learning approach to implement the adaptive $\lambda$-confident policy and verify its efficacy in case studies on the Cart-Pole problem and a real-world electric vehicle (EV) charging problem with covariate shift due to COVID-19.
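The convex-combination idea can be sketched in a few lines. This is a hedged toy (a scalar system with illustrative gains; the names black_box_policy and model_based_policy are hypothetical, and the paper's adaptive update of $\lambda$ and its stability proof are not reproduced): a fixed confidence coefficient blends a "learned" controller with a model-based one.

```python
# Hedged sketch of blending a black-box policy with model-based advice
# via a confidence coefficient lam in [0, 1].  System, gains, and policy
# names are illustrative, not the paper's adaptive lambda-confident policy.
a = 1.2                                   # open-loop unstable scalar dynamics

def black_box_policy(x: float) -> float:
    # Stand-in for a learned controller: a slightly mistuned gain.
    return -1.05 * x

def model_based_policy(x: float) -> float:
    # Deadbeat gain from the (crude) linear model x_{k+1} = a x_k + u_k.
    return -a * x

def blended_policy(x: float, lam: float) -> float:
    # lam = 1 trusts the black box fully; lam = 0 falls back on the model.
    return lam * black_box_policy(x) + (1.0 - lam) * model_based_policy(x)

x, lam = 5.0, 0.7
traj = [x]
for _ in range(30):
    x = a * x + blended_policy(x, lam)
    traj.append(x)
print(traj[-1])  # closed-loop gain a - lam*1.05 - (1-lam)*a = 0.105, so |x| decays
```

For this particular pair of policies the blend is stabilizing for any fixed lam; the paper's negative result shows this is not true in general, which is what motivates adapting $\lambda$ online.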
Tongxin Li, Ruixiao Yang, Guannan Qu, Yiheng Lin, Adam Wierman, and Steven H. Low, "Certifying Black-Box Policies With Stability for Nonlinear Control," IEEE Open Journal of Control Systems, vol. 2, pp. 49-62, 2023. DOI: 10.1109/OJCSYS.2023.3241486.
Cited: 4