
Latest publications from the Conference on Learning for Dynamics & Control

ISAACS: Iterative Soft Adversarial Actor-Critic for Safety
Pub Date : 2022-12-06 DOI: 10.48550/arXiv.2212.03228
Kai Hsu, D. Nguyen, J. Fisac
The deployment of robots in uncontrolled environments requires them to operate robustly under previously unseen scenarios, like irregular terrain and wind conditions. Unfortunately, while rigorous safety frameworks from robust optimal control theory scale poorly to high-dimensional nonlinear dynamics, control policies computed by more tractable "deep" methods lack guarantees and tend to exhibit little robustness to uncertain operating conditions. This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems with general nonlinear dynamics subject to bounded modeling error by combining game-theoretic safety analysis with adversarial reinforcement learning in simulation. Following a soft actor-critic scheme, a safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error and training-to-deployment discrepancy allowed by the designer's uncertainty. While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter (or shield) with robust safety guarantees based on forward reachability rollouts. This shield can be used in conjunction with a safety-agnostic control policy, precluding any task-driven actions that could result in loss of safety. We evaluate our learning-based safety approach in a 5D race car simulator, compare the learned safety policy to the numerically obtained optimal solution, and empirically validate the robust safety guarantee of our proposed safety shield against worst-case model discrepancy.
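To make the shield mechanism concrete, the following is a minimal rollout-based safety filter sketch in Python. The dynamics `step`, the `fallback` policy, the `unsafe` predicate, and the sampled `disturbance_seqs` are all hypothetical stand-ins; in the paper the disturbances come from the learned adversarial agent and the fallback is the co-trained safety policy.

```python
import numpy as np

def shield(x, u_task, fallback, disturbance_seqs, step, horizon=50,
           unsafe=lambda s: abs(s[0]) > 1.0):
    """Rollout-based safety filter: keep the task action only if, after applying it
    once, the fallback policy stays safe along every candidate disturbance sequence;
    otherwise override with the fallback action."""
    for d_seq in disturbance_seqs:                 # candidate worst-case disturbance sequences
        z = step(x, u_task, d_seq[0])              # one step under the proposed task action
        for k in range(1, horizon):
            if unsafe(z):
                return fallback(x)                 # task action is not certified; fall back
            z = step(z, fallback(z), d_seq[k])     # continue under the fallback policy
        if unsafe(z):
            return fallback(x)
    return u_task                                  # every checked rollout stayed safe
```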
Citations: 3
Online Saddle Point Tracking with Decision-Dependent Data
Pub Date : 2022-12-06 DOI: 10.48550/arXiv.2212.02693
Killian Wood, E. Dall’Anese
In this work, we consider a time-varying stochastic saddle point problem in which the objective is revealed sequentially, and the data distribution depends on the decision variables. Problems of this type express the distributional dependence via a distributional map, and are known to have two distinct types of solutions: saddle points and equilibrium points. We demonstrate that, under suitable conditions, online primal-dual type algorithms are capable of tracking equilibrium points. In contrast, since computing the closed-form gradient of the objective requires knowledge of the distributional map, we offer an online stochastic primal-dual algorithm for tracking equilibrium trajectories. We provide bounds in expectation and in high probability, with the latter leveraging a sub-Weibull model for the gradient error. We illustrate our results on an electric vehicle charging problem where responsiveness to prices follows a location-scale family based distributional map.
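A minimal numpy sketch of the equilibrium-tracking idea: the stochastic primal-dual iteration samples data from the decision-dependent distribution but does not differentiate through the distributional map. The quadratic objective, linear constraint, and location-scale style map below are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.05                                   # step size
x, lam = np.zeros(2), 0.0                    # primal decision, dual variable

for t in range(2000):
    # decision-dependent data: a location-scale style map (illustrative choice)
    d = 1.0 + 0.5 * x.sum() + 0.1 * rng.normal()
    grad_x = (x - d) + lam * np.ones(2)      # stochastic Lagrangian gradient at D(x)
    g = x.sum() - 1.0                        # constraint g(x) <= 0
    x = x - eta * grad_x                     # primal descent step
    lam = max(0.0, lam + eta * g)            # projected dual ascent step
```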
Citations: 5
DiffTune+: Hyperparameter-Free Auto-Tuning using Auto-Differentiation
Pub Date : 2022-12-06 DOI: 10.48550/arXiv.2212.03194
Sheng Cheng, Lin Song, Minkyung Kim, Shenlong Wang, N. Hovakimyan
Controller tuning is a vital step to ensure the controller delivers its designed performance. DiffTune has been proposed as an automatic tuning method that unrolls the dynamical system and controller into a computational graph and uses auto-differentiation to obtain the gradient for the controller's parameter update. However, DiffTune uses the vanilla gradient descent to iteratively update the parameter, in which the performance largely depends on the choice of the learning rate (as a hyperparameter). In this paper, we propose to use hyperparameter-free methods to update the controller parameters. We find the optimal parameter update by maximizing the loss reduction, where a predicted loss based on the approximated state and control is used for the maximization. Two methods are proposed to optimally update the parameters and are compared with related variants in simulations on a Dubin's car and a quadrotor. Simulation experiments show that the proposed first-order method outperforms the hyperparameter-based methods and is more robust than the second-order hyperparameter-free methods.
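To illustrate what a hyperparameter-free update based on maximizing a predicted loss reduction can look like, here is one standard rule (a Cauchy-point step against a local quadratic model of the loss). This is only an assumed stand-in; the paper's first- and second-order updates are built on a predicted loss using the approximated state and control and may differ.

```python
import numpy as np

def loss_reduction_step(g, H):
    """Hyperparameter-free update: choose the step length along -g that maximizes
    the predicted quadratic loss reduction (the Cauchy-point rule).
    g: loss gradient w.r.t. controller parameters, H: an (approximate) Hessian."""
    gHg = g @ H @ g
    alpha = (g @ g) / gHg if gHg > 0 else 1.0   # optimal step length along -g
    return -alpha * g

# usage with a hypothetical quadratic loss model
g = np.array([2.0, -1.0])
H = np.diag([4.0, 1.0])
theta_update = loss_reduction_step(g, H)
```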
Citations: 3
Physics-Informed Model-Based Reinforcement Learning
Pub Date : 2022-12-05 DOI: 10.48550/arXiv.2212.02179
Adithya Ramesh, Balaraman Ravindran
We apply reinforcement learning (RL) to robotics tasks. One of the drawbacks of traditional RL algorithms has been their poor sample efficiency. One approach to improve the sample efficiency is model-based RL. In our model-based RL algorithm, we learn a model of the environment, essentially its transition dynamics and reward function, use it to generate imaginary trajectories and backpropagate through them to update the policy, exploiting the differentiability of the model. Intuitively, learning more accurate models should lead to better model-based RL performance. Recently, there has been growing interest in developing better deep neural network based dynamics models for physical systems, by utilizing the structure of the underlying physics. We focus on robotic systems undergoing rigid body motion without contacts. We compare two versions of our model-based RL algorithm, one which uses a standard deep neural network based dynamics model and the other which uses a much more accurate, physics-informed neural network based dynamics model. We show that, in model-based RL, model accuracy mainly matters in environments that are sensitive to initial conditions, where numerical errors accumulate fast. In these environments, the physics-informed version of our algorithm achieves significantly better average-return and sample efficiency. In environments that are not sensitive to initial conditions, both versions of our algorithm achieve similar average-return, while the physics-informed version achieves better sample efficiency. We also show that, in challenging environments, physics-informed model-based RL achieves better average-return than state-of-the-art model-free RL algorithms such as Soft Actor-Critic, as it computes the policy-gradient analytically, while the latter estimates it through sampling.
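A minimal PyTorch sketch of the core loop: roll out imagined trajectories through a learned dynamics model and backpropagate the negative imagined return into the policy. The network sizes, the quadratic `reward_fn`, and the plain MLP dynamics are illustrative assumptions; the physics-informed variant in the paper would replace `dyn` with a structured, physics-based model.

```python
import torch
import torch.nn as nn

state_dim, act_dim, horizon = 4, 1, 20

# learned dynamics model (residual next-state prediction) and policy
dyn = nn.Sequential(nn.Linear(state_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, state_dim))
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, act_dim), nn.Tanh())
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def reward_fn(s, a):                     # illustrative quadratic reward
    return -(s ** 2).sum(-1) - 0.1 * (a ** 2).sum(-1)

def imagined_return(s0):
    s, ret = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        s = s + dyn(torch.cat([s, a], dim=-1))   # imagined transition from the model
        ret = ret + reward_fn(s, a)
    return ret

# one policy update: backpropagate through the imagined rollout
s0 = torch.randn(32, state_dim)
loss = -imagined_return(s0).mean()
opt.zero_grad(); loss.backward(); opt.step()
```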
Citations: 4
Predictive safety filter using system level synthesis
Pub Date : 2022-12-05 DOI: 10.3929/ethz-b-000615512
Antoine P. Leeman, Johannes Köhler, S. Bennani, M. Zeilinger
Safety filters provide modular techniques to augment potentially unsafe control inputs (e.g. from learning-based controllers or humans) with safety guarantees in the form of constraint satisfaction. In this paper, we present an improved model predictive safety filter (MPSF) formulation, which incorporates system level synthesis techniques in the design. The resulting SL-MPSF scheme ensures safety for linear systems subject to bounded disturbances in an enlarged safe set. It requires less severe and frequent modifications of potentially unsafe control inputs compared to existing MPSF formulations to certify safety. In addition, we propose an explicit variant of the SL-MPSF formulation, which maintains scalability, and reduces the required online computational effort - the main drawback of the MPSF. The benefits of the proposed system level safety filter formulations compared to state-of-the-art MPSF formulations are demonstrated using a numerical example.
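As a reference point, here is a generic nominal model predictive safety filter in Python/cvxpy: it returns the admissible input closest to the proposed one for which a constraint-satisfying plan still exists. The disturbance set, terminal ingredients, and the system level synthesis parameterization from the paper are deliberately omitted, and the double-integrator example is hypothetical.

```python
import cvxpy as cp
import numpy as np

def safety_filter(x0, u_prop, A, B, N=20, x_max=1.0, u_max=1.0):
    """Nominal model predictive safety filter: minimally modify the proposed input
    so that a feasible, constraint-satisfying plan over N steps still exists."""
    n, m = B.shape
    x = cp.Variable((n, N + 1))
    u = cp.Variable((m, N))
    cons = [x[:, 0] == x0]
    for t in range(N):
        cons += [x[:, t + 1] == A @ x[:, t] + B @ u[:, t],
                 cp.norm(x[:, t + 1], "inf") <= x_max,
                 cp.norm(u[:, t], "inf") <= u_max]
    cp.Problem(cp.Minimize(cp.sum_squares(u[:, 0] - u_prop)), cons).solve()
    return u[:, 0].value

# usage on a double integrator with a hypothetical (too aggressive) proposed input
A = np.array([[1.0, 0.1], [0.0, 1.0]]); B = np.array([[0.0], [0.1]])
u_safe = safety_filter(np.array([0.5, 0.2]), np.array([5.0]), A, B)
```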
Citations: 4
A Reinforcement Learning Look at Risk-Sensitive Linear Quadratic Gaussian Control
Pub Date : 2022-12-05 DOI: 10.48550/arXiv.2212.02072
Leilei Cui, Zhong-Ping Jiang
This paper proposes a novel robust reinforcement learning framework for discrete-time systems with model mismatch that may arise from the sim2real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of the classical risk-sensitive linear quadratic Gaussian control, a dual-loop policy iteration algorithm is proposed to generate a robust optimal controller. The dual-loop policy iteration algorithm is shown to be globally exponentially and uniformly convergent, and robust against disturbance during the learning process. This robustness property is called small-disturbance input-to-state stability and guarantees that the proposed policy iteration algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is small. In addition, when the system dynamics is unknown, a novel model-free off-policy policy iteration algorithm is proposed for the same class of dynamical system with additive Gaussian noise. Finally, numerical examples are provided for the demonstration of the proposed algorithm.
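For orientation, a compact sketch of plain discrete-time LQR policy iteration (policy evaluation via a Lyapunov equation followed by policy improvement); the risk-sensitive cost and the dual-loop structure of the proposed algorithm are not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_policy_iteration(A, B, Q, R, K0, iters=30):
    """Plain discrete-time LQR policy iteration: given a stabilizing initial gain K0
    (control law u = -K x), alternate policy evaluation and policy improvement."""
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)   # policy evaluation
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)     # policy improvement
    return K, P
```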
Citations: 4
Distributionally Robust Lyapunov Function Search Under Uncertainty
Pub Date : 2022-12-03 DOI: 10.48550/arXiv.2212.01554
Kehan Long, Yinzhuang Yi, J. Cortés, Nikolay A. Atanasov
This paper develops methods for proving Lyapunov stability of dynamical systems subject to disturbances with an unknown distribution. We assume only a finite set of disturbance samples is available and that the true online disturbance realization may be drawn from a different distribution than the given samples. We formulate an optimization problem to search for a sum-of-squares (SOS) Lyapunov function and introduce a distributionally robust version of the Lyapunov function derivative constraint. We show that this constraint may be reformulated as several SOS constraints, ensuring that the search for a Lyapunov function remains in the class of SOS polynomial optimization problems. For general systems, we provide a distributionally robust chance-constrained formulation for neural network Lyapunov function search. Simulations demonstrate the validity and efficiency of either formulation on non-linear uncertain dynamical systems.
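A simplified, sample-based surrogate of the idea for the linear case: find a common quadratic Lyapunov function certified on a finite set of sampled closed-loop matrices via a small SDP in cvxpy. The SOS parameterization and the distributionally robust chance-constrained reformulation from the paper are omitted, and `A_samples` is a hypothetical input.

```python
import cvxpy as cp
import numpy as np

def sampled_lyapunov(A_samples, eps=1e-3):
    """Find P >> 0 such that A_i^T P A_i - P << -eps*I for every sampled
    closed-loop matrix A_i (a sample-based, linear-case surrogate)."""
    n = A_samples[0].shape[0]
    P = cp.Variable((n, n), symmetric=True)
    cons = [P >> eps * np.eye(n)]
    for Ai in A_samples:
        M = Ai.T @ P @ Ai - P
        cons.append((M + M.T) / 2 << -eps * np.eye(n))   # symmetrized decrease condition
    cp.Problem(cp.Minimize(cp.trace(P)), cons).solve()
    return P.value
```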
Citations: 2
Policy Learning for Active Target Tracking over Continuous SE(3) Trajectories
Pub Date : 2022-12-03 DOI: 10.48550/arXiv.2212.01498
Pengzhi Yang, Shumon Koga, Arash Asgharivaskasi, Nikolay A. Atanasov
This paper proposes a novel model-based policy gradient algorithm for tracking dynamic targets using a mobile robot equipped with an onboard sensor with a limited field of view. The task is to obtain a continuous control policy for the mobile robot to collect sensor measurements that reduce uncertainty in the target states, measured by the target distribution entropy. We design a neural network control policy with the robot $SE(3)$ pose and the mean vector and information matrix of the joint target distribution as inputs and attention layers to handle variable numbers of targets. We also derive the gradient of the target entropy with respect to the network parameters explicitly, allowing efficient model-based policy gradient optimization.
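A small helper showing the uncertainty measure the policy is trained to reduce: the differential entropy of a Gaussian target belief expressed through its information matrix (the same quantity the abstract lists as a policy input). The Gaussian belief assumption is ours for the purpose of this snippet.

```python
import numpy as np

def gaussian_entropy_from_info(Omega):
    """Differential entropy of a Gaussian belief with information matrix Omega
    (inverse covariance): H = 0.5 * logdet(2*pi*e * Omega^{-1}).
    Lower entropy means less uncertainty about the target state."""
    n = Omega.shape[0]
    _, logdet = np.linalg.slogdet(Omega)
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) - logdet)
```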
Citations: 1
Online Estimation of the Koopman Operator Using Fourier Features
Pub Date : 2022-12-03 DOI: 10.48550/arXiv.2212.01503
Tahiya Salam, Alice K. Li, M. Hsieh
Transfer operators offer linear representations and global, physically meaningful features of nonlinear dynamical systems. Discovering transfer operators, such as the Koopman operator, requires carefully crafted dictionaries of observables acting on the states of the dynamical system. This is ad hoc and requires the full dataset for evaluation. In this paper, we offer an optimization scheme to allow joint learning of the observables and the Koopman operator with online data. Our results show we are able to reconstruct the evolution and represent the global features of complex dynamical systems.
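A minimal numpy sketch of one way to estimate a Koopman operator online over a Fourier feature dictionary: fix random Fourier features and update the operator from streaming transitions with recursive least squares. The paper additionally learns the observables jointly with the operator, which this sketch does not do; all dimensions and hyperparameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, d_feat = 2, 100
W = rng.normal(size=(d_feat, d_state))            # random Fourier frequencies
b = rng.uniform(0.0, 2.0 * np.pi, size=d_feat)    # random phases

def phi(x):
    """Random Fourier feature dictionary acting on a state x."""
    return np.sqrt(2.0 / d_feat) * np.cos(W @ x + b)

K = np.zeros((d_feat, d_feat))                    # Koopman operator estimate in feature space
P = 1e3 * np.eye(d_feat)                          # recursive least-squares covariance

def online_update(x_t, x_next, lam=0.99):
    """One streaming update of K from a single transition (x_t, x_next)."""
    global K, P
    z, y = phi(x_t), phi(x_next)
    Pz = P @ z
    g = Pz / (lam + z @ Pz)                       # RLS gain
    K += np.outer(y - K @ z, g)                   # rank-one correction toward y ~ K z
    P = (P - np.outer(g, Pz)) / lam
```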
Citations: 1
CT-DQN: Control-Tutored Deep Reinforcement Learning
Pub Date : 2022-12-02 DOI: 10.48550/arXiv.2212.01343
F. D. Lellis, M. Coraggio, G. Russo, Mirco Musolesi, M. Bernardo
One of the major challenges in Deep Reinforcement Learning for control is the need for extensive training to learn the policy. Motivated by this, we present the design of the Control-Tutored Deep Q-Networks (CT-DQN) algorithm, a Deep Reinforcement Learning algorithm that leverages a control tutor, i.e., an exogenous control law, to reduce learning time. The tutor can be designed using an approximate model of the system, without any assumption about the knowledge of the system's dynamics. There is no expectation that it will be able to achieve the control objective if used stand-alone. During learning, the tutor occasionally suggests an action, thus partially guiding exploration. We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing. We demonstrate that CT-DQN is able to achieve better or equivalent data efficiency with respect to the classic function approximation solutions.
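A small sketch of how a control tutor can be blended into exploration: part of the epsilon-greedy exploration budget defers to the exogenous control law instead of a uniform random action. The mixing rule and the probabilities are assumptions for illustration, not the paper's exact mechanism.

```python
import numpy as np

def select_action(q_values, state, tutor, eps=0.1, p_tutor=0.2,
                  rng=np.random.default_rng()):
    """Epsilon-greedy action selection where a fraction of the exploration steps
    defers to a control tutor (an exogenous control law)."""
    if rng.random() < eps:
        if rng.random() < p_tutor:
            return tutor(state)                   # action suggested by the tutor
        return int(rng.integers(len(q_values)))   # ordinary random exploration
    return int(np.argmax(q_values))               # greedy action from the Q-network
```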
Citations: 1