IEEE Open Journal of Control Systems: Latest Articles

Policy Evaluation in Decentralized POMDPs With Belief Sharing
Pub Date : 2023-03-18 DOI: 10.1109/OJCSYS.2023.3277760
Mert Kayaalp;Fatima Ghadieh;Ali H. Sayed
Most work on multi-agent reinforcement learning focuses on scenarios where the state of the environment is fully observable. In this work, we consider a cooperative policy evaluation task in which agents are not assumed to observe the environment state directly. Instead, agents only have access to noisy observations and to belief vectors. It is well known that finding global posterior distributions in multi-agent settings is generally NP-hard. As a remedy, we propose a fully decentralized belief forming strategy that relies on individual updates and on localized interactions over a communication network. In addition to exchanging beliefs, agents exploit the communication network by exchanging value function parameter estimates as well. We analytically show that the proposed strategy allows information to diffuse over the network, which in turn allows the agents' parameters to have a bounded difference with a centralized baseline. A multi-sensor target tracking application is considered in the simulations.
IEEE Open Journal of Control Systems, vol. 2, pp. 125-145. Journal article. PDF: https://ieeexplore.ieee.org/iel7/9552933/9973428/10129007.pdf
Citations: 0
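The belief-sharing idea in the abstract above (individual updates followed by localized interactions over a network) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the geometric-averaging combination rule, the three-agent line network in `WEIGHTS`, and the fixed likelihoods in `LIKELIHOODS` are not taken from the paper, whose exact update may differ. Only agent 0 receives informative observations; the other two rely on what their neighbors pass along.

```python
import math


def local_update(belief, likelihood):
    """Bayes step: weight the prior belief by the observation likelihood."""
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def combine(neighbor_beliefs, weights):
    """Geometric averaging of beliefs (assumed combination rule).

    Beliefs must be strictly positive so that the logarithm is defined.
    """
    n_states = len(neighbor_beliefs[0])
    log_mix = [
        sum(w * math.log(b[s]) for w, b in zip(weights, neighbor_beliefs))
        for s in range(n_states)
    ]
    shift = max(log_mix)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(v - shift) for v in log_mix]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical line network 0 - 1 - 2 with row-stochastic combination weights.
WEIGHTS = [
    [0.5, 0.5, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
    [0.0, 0.5, 0.5],
]

# Hypothetical likelihoods over two hidden states: only agent 0 is informative.
LIKELIHOODS = [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]


def run_diffusion(steps):
    """Alternate local Bayes updates and neighbor combination for all agents."""
    beliefs = [[0.5, 0.5] for _ in range(3)]
    for _ in range(steps):
        updated = [local_update(b, l) for b, l in zip(beliefs, LIKELIHOODS)]
        beliefs = [combine(updated, row) for row in WEIGHTS]
    return beliefs
```

Calling `run_diffusion(20)` shows agent 2, which never sees an informative observation and is not directly connected to agent 0, still concentrating its belief on the first state. This is the sense in which information diffuses over the network in this toy setting.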
Model-Based Reinforcement Learning via Stochastic Hybrid Models
Pub Date : 2023-03-17 DOI: 10.1109/OJCSYS.2023.3277308
Hany Abdulsamad;Jan Peters
Optimal control of general nonlinear systems is a central challenge in automation. Enabled by powerful function approximators, data-driven approaches to control have recently successfully tackled challenging applications. However, such methods often obscure the structure of dynamics and control behind black-box over-parameterized representations, thus limiting our ability to understand closed-loop behavior. This article adopts a hybrid-system view of nonlinear modeling and control that lends an explicit hierarchical structure to the problem and breaks down complex dynamics into simpler localized units. We consider a sequence modeling paradigm that captures the temporal structure of the data and derive an expectation-maximization (EM) algorithm that automatically decomposes nonlinear dynamics into stochastic piecewise affine models with nonlinear transition boundaries. Furthermore, we show that these time-series models naturally admit a closed-loop extension that we use to extract local polynomial feedback controllers from nonlinear experts via behavioral cloning. Finally, we introduce a novel hybrid relative entropy policy search (Hb-REPS) technique that incorporates the hierarchical nature of hybrid models and optimizes a set of time-invariant piecewise feedback controllers derived from a piecewise polynomial approximation of a global state-value function.
IEEE Open Journal of Control Systems, vol. 2, pp. 155-170. Journal article. PDF: https://ieeexplore.ieee.org/iel7/9552933/9973428/10128705.pdf
Citations: 0
Provably Safe Reinforcement Learning via Action Projection Using Reachability Analysis and Polynomial Zonotopes
Pub Date : 2023-03-13 DOI: 10.1109/OJCSYS.2023.3256305
Niklas Kochdumper;Hanna Krasowski;Xiao Wang;Stanley Bak;Matthias Althoff
While reinforcement learning produces very promising results for many applications, its main disadvantage is the lack of safety guarantees, which prevents its use in safety-critical systems. In this work, we address this issue with a safety shield for nonlinear continuous systems that solve reach-avoid tasks. Our safety shield prevents a reinforcement learning agent from applying potentially unsafe actions by projecting the proposed action to the closest safe action. This approach is called action projection and is implemented via mixed-integer optimization. The safety constraints for action projection are obtained by applying parameterized reachability analysis using polynomial zonotopes, which makes it possible to accurately capture the nonlinear effects of the actions on the system. In contrast to other state-of-the-art approaches for action projection, our safety shield can efficiently handle input constraints and dynamic obstacles, eases incorporation of the spatial robot dimensions into the safety constraints, guarantees robust safety despite process noise and measurement errors, and is well suited for high-dimensional systems, as we demonstrate on several challenging benchmark systems.
IEEE Open Journal of Control Systems, vol. 2, pp. 79-92. Journal article. PDF: https://ieeexplore.ieee.org/iel7/9552933/9973428/10068193.pdf
Citations: 9
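The core idea of action projection in the entry above, replacing a proposed action with the closest safe one, can be shown in a deliberately simplified form. This sketch assumes a one-dimensional action and a safe set that is already given as a list of closed intervals; the function name `project_action` and the intervals are hypothetical. The paper itself obtains its safety constraints from reachability analysis and solves a mixed-integer optimization problem, neither of which is reproduced here.

```python
def project_action(action, safe_intervals):
    """Return the point of the safe set closest to the proposed action.

    safe_intervals is a list of (low, high) pairs describing a union of
    closed intervals; this is a stand-in for constraints that would come
    from a reachability analysis.
    """
    best = None
    for low, high in safe_intervals:
        # Closest point of this interval to the proposed action.
        candidate = min(max(action, low), high)
        if best is None or abs(candidate - action) < abs(best - action):
            best = candidate
    if best is None:
        raise ValueError("no safe action available")
    return best
```

For example, with safe intervals `[(-1.0, 0.2), (0.9, 1.0)]` a proposed action of `0.5` is moved to `0.2`, while a proposed action of `0.0` is already safe and is returned unchanged.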
Model-Free Distributed Reinforcement Learning State Estimation of a Dynamical System Using Integral Value Functions
Pub Date : 2023-02-27 DOI: 10.1109/OJCSYS.2023.3250089
Babak Salamat;Gerhard Elsbacher;Andrea M. Tonello;Lenz Belzner
One of the challenging problems in sensor network systems is to estimate and track the state of a target point mass with unknown dynamics. Recent improvements in deep learning (DL) have renewed interest in applying DL techniques to state estimation problems. However, process noise is absent, which seems to indicate that the point-mass target must be non-maneuvering, since process noise is typically as significant as measurement noise when tracking maneuvering targets. In this paper, we propose a continuous-time (CT) model-free or model-building distributed reinforcement learning estimator (DRLE) using an integral value function in sensor networks. The DRLE algorithm is capable of learning an optimal policy from a neural value function that aims to provide the estimation of a target point mass. The proposed estimator consists of two high-pass consensus filters in terms of weighted measurements and inverse-covariance matrices and a critic reinforcement learning mechanism for each node in the network. The efficiency of the proposed DRLE is shown by a simulation experiment of a network of underactuated vertical takeoff and landing aircraft with strong input coupling. The experiment highlights two advantages of DRLE: i) it does not require the dynamic model to be known, and ii) it is an order of magnitude faster than the state-dependent Riccati equation (SDRE) baseline.
IEEE Open Journal of Control Systems, vol. 2, pp. 70-78. Journal article. PDF: https://ieeexplore.ieee.org/iel7/9552933/9973428/10054475.pdf
Citations: 1
The Internal Model Principle for Biomolecular Control Theory
Pub Date : 2023-02-10 DOI: 10.1109/OJCSYS.2023.3244089
Ankit Gupta;Mustafa Khammash
The well-known Internal Model Principle (IMP) is a cornerstone of modern control theory. It stipulates the necessary conditions for asymptotic robustness of disturbance-prone dynamical systems by asserting that such a system must embed a subsystem in a feedback loop, and this subsystem must be able to reduplicate the dynamic disturbance using only the regulated variable as the input. The insights provided by IMP can help both in designing suitable controllers and in analysing the regulatory mechanisms in complex systems. So far the application of IMP in biology has been case-specific and ad hoc, primarily due to the lack of generic versions of the IMP for biomolecular reaction networks that model biological processes. In this short article we highlight the need for an IMP in biology and discuss a recently developed version of it for biomolecular networks that exhibit maximal Robust Perfect Adaptation (maxRPA) by being robust to the maximum number of disturbance sources.
IEEE Open Journal of Control Systems, vol. 2, pp. 63-69. Journal article. PDF: https://ieeexplore.ieee.org/iel7/9552933/9973428/10041993.pdf
Citations: 1
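A classical textbook instance of the Internal Model Principle described above is integral feedback: an integrator driven only by the regulation error is an internal model of constant disturbances. The sketch below illustrates this on a hypothetical first-order plant `dx/dt = -x + u + d`; the plant, gains, and Euler discretization are all assumptions chosen for illustration and have no connection to the biomolecular networks treated in the article.

```python
def simulate(disturbance, use_integrator=True, setpoint=1.0, gain=1.0,
             dt=0.01, steps=4000):
    """Forward-Euler simulation of dx/dt = -x + u + disturbance.

    With use_integrator=True the control is u = gain * z, where z integrates
    the error (setpoint - x). Otherwise u = gain * (setpoint - x), a purely
    proportional controller with no internal model of the disturbance.
    Returns the regulated variable x at the final time.
    """
    x = 0.0
    z = 0.0
    for _ in range(steps):
        error = setpoint - x
        u = gain * z if use_integrator else gain * error
        x += dt * (-x + u + disturbance)
        z += dt * error
    return x
```

With the integrator, the regulated variable settles at the setpoint whether the constant disturbance is `0.0` or `2.0`. With proportional control only, the steady state is `(gain * setpoint + disturbance) / (1 + gain)`, so it shifts whenever the disturbance changes.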
Certifying Black-Box Policies With Stability for Nonlinear Control
Pub Date : 2023-02-01 DOI: 10.1109/OJCSYS.2023.3241486
Tongxin Li;Ruixiao Yang;Guannan Qu;Yiheng Lin;Adam Wierman;Steven H. Low
Machine-learned black-box policies are ubiquitous for nonlinear control problems. Meanwhile, crude model information is often available for these problems from, e.g., linear approximations of nonlinear dynamics. We study the problem of certifying a black-box control policy with stability using model-based advice for nonlinear control on a single trajectory. We first show a general negative result that a naive convex combination of a black-box policy and a linear model-based policy can lead to instability, even if the two policies are both stabilizing. We then propose an adaptive $\lambda$-confident policy, with a coefficient $\lambda$ indicating the confidence in a black-box policy, and prove its stability. With bounded nonlinearity, in addition, we show that the adaptive $\lambda$-confident policy achieves a bounded competitive ratio when a black-box policy is near-optimal. Finally, we propose an online learning approach to implement the adaptive $\lambda$-confident policy and verify its efficacy in case studies about the Cart-Pole problem and a real-world electric vehicle (EV) charging problem with covariate shift due to COVID-19.
IEEE Open Journal of Control Systems, vol. 2, pp. 49-62. Journal article. PDF: https://ieeexplore.ieee.org/iel7/9552933/9973428/10034859.pdf
Citations: 4
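The convex combination that the abstract above refers to can be written down in a few lines. This is a sketch only, assuming the combined action has the form `lam * black_box(x) + (1 - lam) * (-K * x)` for a scalar state; the names `lambda_confident_action`, `rollout`, `black_box`, and `feedback_gain` and the scalar linear system are hypothetical. The coefficient is held fixed here, whereas the paper adapts it online by a rule the abstract does not spell out, and it shows that a fixed naive combination can be unstable in general.

```python
def lambda_confident_action(state, lam, black_box, feedback_gain):
    """Blend a black-box action with a linear model-based action.

    lam = 1 trusts the black-box policy fully; lam = 0 falls back to the
    linear feedback u = -feedback_gain * state.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    model_based = -feedback_gain * state
    return lam * black_box(state) + (1.0 - lam) * model_based


def rollout(initial_state, lam, black_box, feedback_gain, a=1.2, b=1.0,
            steps=50):
    """Simulate the hypothetical scalar system x_next = a * x + b * u."""
    x = initial_state
    for _ in range(steps):
        u = lambda_confident_action(x, lam, black_box, feedback_gain)
        x = a * x + b * u
    return x
```

As a usage example, `rollout(1.0, 0.25, lambda x: -x, 0.7)` drives the state toward zero, because in this scalar toy case every blend of the two stabilizing gains is itself stabilizing. That convenient property does not carry over to the general setting studied in the article.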
Cross Apprenticeship Learning Framework: Properties and Solution Approaches
Pub Date : 2023-01-09 DOI: 10.1109/OJCSYS.2023.3235248
Ashwin Aravind;Debasish Chatterjee;Ashish Cherukuri
Apprenticeship learning is a framework in which an agent learns a policy to perform a given task in an environment using example trajectories provided by an expert. In the real world, one might have access to expert trajectories in different environments where the system dynamics differ while the learning task is the same. For such scenarios, two types of learning objectives can be defined: one in which the learned policy performs very well in one specific environment, and another in which it performs well across all environments. To balance these two objectives in a principled way, our work presents the cross apprenticeship learning (CAL) framework. This consists of an optimization problem where an optimal policy for each environment is sought while ensuring that all policies remain close to each other. This nearness is facilitated by one tuning parameter in the optimization problem. We derive properties of the optimizers of the problem as the tuning parameter varies. We identify conditions under which an agent prefers using the policy obtained from CAL over the traditional apprenticeship learning. Since the CAL problem is nonconvex, we provide a convex outer approximation. Finally, we demonstrate the attributes of our framework in the context of a navigation task in a windy gridworld environment.
IEEE Open Journal of Control Systems, vol. 2, pp. 36-48. Journal article. PDF: https://ieeexplore.ieee.org/iel7/9552933/9973428/10011555.pdf
Citations: 0
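The trade-off controlled by the tuning parameter in the entry above can be seen in a toy analogue. The sketch below is not the CAL problem (which is nonconvex and posed over policies); it assumes two scalar "policies" with quadratic per-environment costs and a quadratic closeness penalty, namely minimizing `(t1 - c1)**2 + (t2 - c2)**2 + lam * (t1 - t2)**2`. The function name `coupled_optimum` and this penalty form are assumptions made purely for illustration.

```python
def coupled_optimum(c1, c2, lam):
    """Closed-form minimizer of the toy coupled problem.

    Setting both partial derivatives to zero gives
        t1 = c1 + alpha * (c2 - c1),  t2 = c2 - alpha * (c2 - c1),
    with alpha = lam / (1 + 2 * lam).
    """
    if lam < 0.0:
        raise ValueError("lam must be non-negative")
    alpha = lam / (1.0 + 2.0 * lam)
    gap = c2 - c1
    return c1 + alpha * gap, c2 - alpha * gap
```

With `lam = 0` each environment keeps its own optimum, and as `lam` grows both solutions move toward the common midpoint. In this toy setting that is the same qualitative behavior the abstract describes for the optimizers as the tuning parameter varies.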
Modeling and Characterization of Pre-Charged Collapse-Mode CMUTs
Pub Date : 2023-01-01 DOI: 10.1109/OJUFFC.2023.3240699
M. Saccher;Shinnosuke Kawasaki;J. Klootwijk;R. van Schaijk;Ronald Dekker
Recently, the applications of ultrasound transducers have expanded from high-end diagnostic tools to point-of-care diagnostic devices and wireless power receivers for implantable devices. These new applications additionally require that the transducer technology comply with biocompatibility and manufacturing scalability requirements. In this respect, Capacitive Micromachined Ultrasound Transducers (CMUTs) have a strong advantage compared to conventional PZT-based transducers. However, current CMUTs require a large DC bias voltage for their operation, which limits the miniaturizability of these devices. In this study, we propose a pre-charged collapse-mode CMUT for immersive applications that can operate without an external bias by means of a charge-trapping Al2O3 layer embedded in the dielectrics between the top and bottom electrodes. The built-in charge layer was analytically modeled, and four layer-stack combinations were investigated and characterized. The measurement results of the CMUTs were then used to fit the model and to quantify the amount and type of trapped charge. It was found that these devices polarize due to the ferroelectric-like behavior of the Al2O3, and the amount of charge stored in the charge-trapping layer was estimated to be approximately 0.02 C/m². Their acoustic performance shows a transmit sensitivity of 8.8 kPa/V and a receive sensitivity of 13.1 V/MPa. In addition, we show that increasing the charging temperature, the charging duration, and the charging voltage results in a higher amount of stored charge. Finally, results of ALT tests showed that these devices have a lifetime of more than 2.5 years at body temperature.
近年来,超声换能器的应用范围从高端诊断工具扩展到护理点诊断设备和植入式设备的无线电源接收器。这些新的应用还要求换能器技术必须符合生物相容性和制造可扩展性。在这方面,电容式微机械超声换能器(CMUTs)与传统的PZT换能器相比具有很强的优势。然而,当前的cmut需要较大的直流偏置电压才能运行,这限制了这些器件的小型化。在这项研究中,我们提出了一种用于沉浸式应用的预充电坍缩模式CMUT,通过在上下电极之间的电介质中嵌入电荷捕获Al2O3层,该CMUT可以在没有外部偏置的情况下运行。对内置电荷层进行了解析建模,并对四层叠加组合进行了研究和表征。然后使用cmut的测量结果来拟合模型并量化捕获电荷的数量和类型。研究发现,这些器件由于Al2O3的类铁电行为而极化,并且电荷捕获层中存储的电荷量估计约为0.02 C/m2。其发射灵敏度为8.8 kPa/V,接收灵敏度为13.1 V/MPa。此外,我们还表明,增加充电温度、充电持续时间和充电电压会导致更高的存储电量。最后,ALT测试结果表明,这些装置在体温下的使用寿命超过2.5年。
{"title":"Modeling and Characterization of Pre-Charged Collapse-Mode CMUTs","authors":"M. Saccher, Shinnosuke Kawasaki, J. Klootwijk, R. van Schaijk, Ronald Dekker","doi":"10.1109/OJUFFC.2023.3240699","DOIUrl":"https://doi.org/10.1109/OJUFFC.2023.3240699","url":null,"abstract":"Recently, the applications of ultrasound transducers expanded from high-end diagnostic tools to point of care diagnostic devices and wireless power receivers for implantable devices. These new applications additionally require that the transducer technology must comply to biocompatibility and manufacturing scalability. In this respect, Capacitive Micromachined Ultrasound Transducers (CMUTs) have a strong advantage compared to the conventional PZT based transducers. However, current CMUTs require a large DC bias voltage for their operation, which limits the miniaturizability of these devices. In this study, we propose a pre-charged collapse-mode CMUT for immersive applications that can operate without an external bias by means of a charge trapping Al2O3 layer embedded in the dielectrics between the top and bottom electrodes. The built-in charge layer was analytically modeled and four layer stack combinations were investigated and characterized. The measurement results of the CMUTs were then used to fit the model and to quantify the amount and type of trapped charge. It was found that these devices polarize due to the ferroelectric-like behavior of the Al2O3, and the amount of charge stored in the charge-trapping layer was estimated to be approximately 0.02 C/m2. Their acoustic performance shows a transmit and receive sensitivity of 8.8 kPa/V and 13.1 V/MPa respectively. In addition, we show that increasing the charging temperature, the charging duration, and the charging voltage results in a higher amount of stored charge. 
Finally, results of ALT tests showed that these devices have a lifetime of more than 2.5 years at body temperature.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 1","pages":"14-28"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62907489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Exact Decomposition of Optimal Control Problems via Simultaneous Block Diagonalization of Matrices
Pub Date : 2022-12-22 DOI: 10.1109/OJCSYS.2022.3231553
Amirhossein Nazerian;Kshitij Bhatta;Francesco Sorrentino
In this paper, we consider optimal control problems (OCPs) for large-scale linear dynamical systems with many states and inputs. We attempt to reduce such problems to a set of independent OCPs of lower dimension. Our decomposition is ‘exact’ in the sense that it preserves all the information about the original system and the objective function. Previous work in this area has focused on strategies that exploit symmetries of the underlying system and of the objective function. Here, instead, we implement the algebraic method of simultaneous block diagonalization of matrices (SBD), which we show provides advantages in terms of both the dimension of the resulting subproblems and the computation time. We provide practical examples with networked systems that demonstrate the benefits of applying the SBD decomposition over the decomposition method based on group symmetries.
Citations: 1
The Robotarium: A Remotely-Accessible, Multi-Robot Testbed for Control Research and Education
Pub Date : 2022-12-22 DOI: 10.1109/OJCSYS.2022.3231523
Sean Wilson;Magnus Egerstedt
In robotic research and education, the cost in terms of money, expertise, and time required to instantiate and maintain robotic testbeds can prevent researchers and educators from including hardware-based experimentation in their laboratories and classrooms. This results in robotic algorithms often being validated by low-fidelity simulation due to the complexity and computational demand required by high-fidelity simulators. Unfortunately, these simulation environments often neglect real-world complexities, such as wheel slip, actuator dynamics, computation time, communication delays, and sensor noise. The Robotarium provides a solution to these problems by providing a state-of-the-art, multi-robot research facility to everyone around the world free of charge for academic and educational purposes. This paper discusses the remote usage of the testbed since its opening in 2017, details the testbed's design, and provides a brief tutorial on how to use it.
Citations: 0