
Latest publications from the Conference on Learning for Dynamics & Control

Agile Catching with Whole-Body MPC and Blackbox Policy Learning
Pub Date : 2023-06-14 DOI: 10.48550/arXiv.2306.08205
Saminda Abeyruwan, A. Bewley, Nicholas M. Boffi, K. Choromanski, David B. D'Ambrosio, Deepali Jain, P. Sanketi, A. Shankar, Vikas Sindhwani, Sumeet Singh, J. Slotine, Stephen Tu
We address a benchmark task in agile robotics: catching objects thrown at high speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization. We provide insights into various performance trade-offs, including sample efficiency, sim-to-real transfer, robustness to distribution shifts, and whole-body multimodality, via extensive on-hardware experiments. We conclude with proposals on fusing "classical" and "learning-based" techniques for agile robot control. Videos of our experiments may be found at https://sites.google.com/view/agile-catching
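To make strategy (ii) concrete, below is a minimal sketch of zeroth-order policy optimization: episode returns are treated as a black box, and a Gaussian-smoothing estimator with antithetic samples approximates the policy gradient. The quadratic `episode_return` is a hypothetical stand-in for an on-robot rollout; all names and hyperparameters are illustrative, not the authors' implementation.

```python
import numpy as np

def es_gradient(objective, theta, sigma=0.1, num_samples=32, rng=None):
    """Gaussian-smoothing (zeroth-order) gradient estimate with antithetic samples."""
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(theta)
    for _ in range(num_samples):
        eps = rng.standard_normal(theta.shape)
        grad += (objective(theta + sigma * eps) - objective(theta - sigma * eps)) * eps
    return grad / (2.0 * sigma * num_samples)

def episode_return(theta):
    # Hypothetical stand-in for the return of one catching episode; in
    # practice this would be a rollout on the simulator or the real robot.
    return -float(np.sum((theta - 1.0) ** 2))

theta = np.zeros(8)
for _ in range(200):
    theta += 0.05 * es_gradient(episode_return, theta)  # gradient ascent on return
print("final return:", episode_return(theta))
```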
Citations: 1
Time Dependent Inverse Optimal Control using Trigonometric Basis Functions
Pub Date : 2023-06-05 DOI: 10.48550/arXiv.2306.02820
Rahel Rickenbach, Elena Arcari, M. Zeilinger
The choice of objective is critical for the performance of an optimal controller. When control requirements vary during operation, e.g. due to changes in the environment with which the system is interacting, these variations should be reflected in the cost function. In this paper we consider the problem of identifying a time-dependent cost function from given trajectories. We propose a strategy for explicitly representing time dependency in the cost function, i.e. decomposing it into the product of an unknown time-dependent parameter vector and a known state- and input-dependent vector, modelling the former via a linear combination of trigonometric basis functions. These are incorporated within an inverse optimal control framework that uses the Karush-Kuhn-Tucker (KKT) conditions to ensure optimality, and allows for formulating an optimization problem over a finite set of basis-function hyperparameters. Results are shown for two systems in simulation and evaluated against state-of-the-art approaches.
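The decomposition described above can be sketched as follows: the cost is theta(t)^T phi(x, u), where theta(t) is a linear combination of trigonometric basis functions with coefficient matrix `C`, the hyperparameters the inverse problem would recover. The features, period, and coefficients below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def trig_basis(t, num_harmonics=3, period=10.0):
    """Trigonometric basis [1, sin(k*w*t), cos(k*w*t), ...] evaluated at time t."""
    omega = 2.0 * np.pi / period
    feats = [1.0]
    for k in range(1, num_harmonics + 1):
        feats += [np.sin(k * omega * t), np.cos(k * omega * t)]
    return np.array(feats)

def time_varying_cost(x, u, t, C):
    """c(x, u, t) = theta(t)^T phi(x, u), with theta(t) = C @ trig_basis(t).

    C holds the basis-function hyperparameters that the inverse optimal
    control problem would optimize (chosen arbitrarily here)."""
    phi = np.concatenate([x ** 2, u ** 2])   # known state/input-dependent features
    theta_t = C @ trig_basis(t)              # unknown time-dependent weights
    return float(theta_t @ phi)

x, u = np.array([0.5, -1.0]), np.array([0.2])
C = 0.1 * np.ones((3, 7))                    # (num features) x (basis size)
print(time_varying_cost(x, u, t=2.5, C=C))
```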
Citations: 0
Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning
Pub Date : 2023-05-31 DOI: 10.48550/arXiv.2306.00212
Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
We examine online safe multi-agent reinforcement learning using constrained Markov games, in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities. Our focus is confined to an episodic two-player zero-sum constrained Markov game with independent transition functions that are unknown to the agents, adversarial reward functions, and stochastic utility functions. For such a Markov game, we employ an approach based on the occupancy measure to formulate it as an online constrained saddle-point problem with an explicit constraint. We extend the Lagrange multiplier method in constrained optimization to handle the constraint by creating a generalized Lagrangian with minimax decision primal variables and a dual variable. Next, we develop an upper-confidence reinforcement learning algorithm to solve this Lagrangian problem while balancing exploration and exploitation. Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via a projected gradient step, and we prove that it enjoys a sublinear rate $O((|X|+|Y|) L \sqrt{T(|A|+|B|)})$ for both regret and constraint violation after playing $T$ episodes of the game. Here, $L$ is the horizon of each episode, and $(|X|,|A|)$ and $(|Y|,|B|)$ are the state/action space sizes of the min-player and the max-player, respectively. To the best of our knowledge, we provide the first provably efficient online safe reinforcement learning algorithm in constrained Markov games.
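The two update rules at the heart of the algorithm can be illustrated in a deliberately simplified setting: a one-shot, single-agent constrained problem over a discrete action simplex, with exponentiated-gradient (online mirror descent) steps on the primal policy and projected gradient steps on the dual variable. This toy omits the episodic game, the unknown transitions, and the confidence bonuses; all numbers are illustrative.

```python
import numpy as np

def mirror_descent_step(policy, payoff, lr=0.5):
    """Exponentiated-gradient (online mirror descent) update on the simplex."""
    logits = np.log(policy) + lr * payoff
    w = np.exp(logits - logits.max())
    return w / w.sum()

# Toy one-shot problem: rewards r(a) to maximize, utilities u(a) that must
# satisfy E_pi[u] >= c in expectation under the policy pi.
r = np.array([1.0, 0.2, 0.5])
u = np.array([0.0, 1.0, 0.6])
c = 0.5
lam, lam_max = 0.0, 10.0          # dual variable and its projection bound
pi = np.ones(3) / 3

for _ in range(500):
    # Primal: mirror-descent ascent on the Lagrangian payoff r(a) + lam * u(a).
    pi = mirror_descent_step(pi, r + lam * u)
    # Dual: projected gradient step driven by the constraint violation.
    lam = np.clip(lam - 0.05 * (pi @ u - c), 0.0, lam_max)

print("policy:", pi.round(3), "E[u]:", round(float(pi @ u), 3), "lambda:", round(lam, 3))
```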
Citations: 1
Black-Box vs. Gray-Box: A Case Study on Learning Table Tennis Ball Trajectory Prediction with Spin and Impacts
Pub Date : 2023-05-24 DOI: 10.48550/arXiv.2305.15189
Jan Achterhold, Philip Tobuschat, Hao Ma, Dieter Buechler, Michael Muehlebach, Joerg Stueckler
In this paper, we present a method for table tennis ball trajectory filtering and prediction. Our gray-box approach builds on a physical model. At the same time, we use data to learn parameters of the dynamics model, of an extended Kalman filter, and of a neural model that infers the ball's initial condition. We demonstrate superior prediction performance of our approach over two black-box approaches, which are not supplied with physical prior knowledge. We demonstrate that initializing the spin from the parameters of the ball launcher using a neural network drastically improves long-time prediction performance over estimating the spin purely from measured ball positions. An accurate prediction of the ball trajectory is crucial for successful returns. We therefore evaluate the return performance with a pneumatic artificial muscular robot and achieve a return rate of 29/30 (96.7%).
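A minimal sketch of the filtering component: an extended Kalman filter whose process model is a point mass with quadratic drag, linearized numerically. The paper's gray-box model additionally learns its parameters from data and handles spin (Magnus force) and impacts, all omitted here; every constant below is an illustrative assumption.

```python
import numpy as np

DT = 0.01                                    # integration step [s]
G = np.array([0.0, 0.0, -9.81])              # gravity [m/s^2]
KD = 0.1                                     # illustrative drag coefficient

def f(s):
    """Point-mass dynamics with quadratic drag; state s = [pos(3), vel(3)]."""
    p, v = s[:3], s[3:]
    a = G - KD * np.linalg.norm(v) * v
    return np.concatenate([p + DT * v, v + DT * a])

def jacobian(fun, s, eps=1e-6):
    """Numerical linearization, standing in for an analytic Jacobian."""
    J = np.zeros((s.size, s.size))
    for i in range(s.size):
        d = np.zeros(s.size); d[i] = eps
        J[:, i] = (fun(s + d) - fun(s - d)) / (2 * eps)
    return J

H = np.hstack([np.eye(3), np.zeros((3, 3))])  # vision measures position only

def ekf_step(s, P, z, q=1e-4, r=1e-3):
    """One EKF predict/update cycle given a position measurement z."""
    F = jacobian(f, s)
    s_pred, P_pred = f(s), F @ P @ F.T + q * np.eye(6)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + r * np.eye(3))
    return s_pred + K @ (z - H @ s_pred), (np.eye(6) - K @ H) @ P_pred

s, P = np.array([0.0, 0.0, 1.0, -3.0, 0.0, 4.0]), 0.1 * np.eye(6)
s, P = ekf_step(s, P, z=np.array([-0.03, 0.0, 1.04]))
```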
Citations: 0
Model-based Validation as Probabilistic Inference
Pub Date : 2023-05-17 DOI: 10.48550/arXiv.2305.09930
Harrison Delecki, Anthony Corso, Mykel J. Kochenderfer
Estimating the distribution over failures is a key step in validating autonomous systems. Existing approaches focus on finding failures for a small range of initial conditions or make restrictive assumptions about the properties of the system under test. We frame estimating the distribution over failure trajectories for sequential systems as Bayesian inference. Our model-based approach represents the distribution over failure trajectories using rollouts of system dynamics and computes trajectory gradients using automatic differentiation. Our approach is demonstrated on an inverted pendulum control system, an autonomous vehicle driving scenario, and a partially observable lunar lander. Sampling is performed using an off-the-shelf implementation of Hamiltonian Monte Carlo with multiple chains to capture multimodality, together with gradient smoothing for safe trajectories. In all experiments, we observed improvements in sample efficiency and parameter space coverage compared to black-box baseline approaches. This work is open-sourced.
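The inference step can be sketched as follows, assuming a hypothetical scalar `robustness` function (positive on safe rollouts) returned by a simulator: the unnormalized log-posterior over disturbance sequences combines a Gaussian prior with a smoothed failure indicator, and a hand-rolled leapfrog HMC transition samples from it. The paper uses an off-the-shelf multi-chain sampler and automatic differentiation rather than the finite-difference gradients below.

```python
import numpy as np

def log_posterior(w, robustness, smoothing=0.1):
    """Unnormalized log-density over disturbance sequences w: a Gaussian prior
    plus a smooth penalty on the rollout's safety margin, so that safe
    trajectories (robustness > 0) are exponentially down-weighted."""
    return -0.5 * float(w @ w) - max(robustness(w), 0.0) / smoothing

def grad(logp, w, eps=1e-5):
    """Finite-difference gradient; autodiff replaces this in practice."""
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w); d[i] = eps
        g[i] = (logp(w + d) - logp(w - d)) / (2 * eps)
    return g

def hmc_step(logp, w, rng, step=0.05, n_leapfrog=10):
    """One Hamiltonian Monte Carlo transition using leapfrog integration."""
    p = rng.standard_normal(w.size)
    w_new = w.copy()
    p_new = p + 0.5 * step * grad(logp, w_new)       # initial half step
    for _ in range(n_leapfrog):
        w_new = w_new + step * p_new                 # full position step
        p_new = p_new + step * grad(logp, w_new)     # full momentum step
    p_new = p_new - 0.5 * step * grad(logp, w_new)   # correct to a half step
    log_accept = logp(w_new) - logp(w) + 0.5 * (p @ p - p_new @ p_new)
    return w_new if np.log(rng.uniform()) < log_accept else w

# Hypothetical rollout margin: a driftless random walk must stay in |x| <= 1;
# negative robustness means the disturbance sequence w caused a failure.
robustness = lambda w: 1.0 - np.max(np.abs(np.cumsum(w)))
logp = lambda w: log_posterior(w, robustness)
rng, w, failures = np.random.default_rng(0), np.zeros(20), []
for _ in range(500):
    w = hmc_step(logp, w, rng)
    if robustness(w) < 0:
        failures.append(w.copy())
print("failure samples collected:", len(failures))
```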
Citations: 0
Toward Multi-Agent Reinforcement Learning for Distributed Event-Triggered Control
Pub Date : 2023-05-15 DOI: 10.48550/arXiv.2305.08723
Lukas Kesper, Sebastian Trimpe, Dominik Baumann
Event-triggered communication and control provide high control performance in networked control systems without overloading the communication network. However, most approaches require precise mathematical models of the system dynamics, which may not always be available. Model-free learning of communication and control policies provides an alternative. Nevertheless, existing methods typically consider single-agent settings. This paper proposes a model-free reinforcement learning algorithm that jointly learns resource-aware communication and control policies for distributed multi-agent systems from data. We evaluate the algorithm in a high-dimensional and nonlinear simulation example and discuss promising avenues for further research.
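The resource-aware structure being learned can be sketched schematically: each agent holds a communication rule (here a simple deviation threshold) alongside its control law, and the reward couples control cost with a communication penalty, so learning trades performance against network load. The threshold trigger, gain, and weights below are illustrative placeholders for the policies the algorithm would learn.

```python
import numpy as np

class EventTriggeredAgent:
    """Schematic resource-aware agent: a deviation threshold decides when to
    transmit the local state; between transmissions, the controller acts on
    the last communicated value."""

    def __init__(self, threshold=0.5, gain=-0.8):
        self.threshold, self.gain = threshold, gain
        self.last_sent = 0.0

    def act(self, x):
        communicate = abs(x - self.last_sent) > self.threshold  # event trigger
        if communicate:
            self.last_sent = x
        return self.gain * self.last_sent, communicate

def reward(x, communicated, comm_penalty=0.1):
    # Couples control performance with network load: this trade-off is what
    # the joint communication/control learning problem optimizes.
    return -x ** 2 - comm_penalty * float(communicated)

rng = np.random.default_rng(1)
agent, x, ret = EventTriggeredAgent(), 1.5, 0.0
for _ in range(50):
    u, sent = agent.act(x)
    ret += reward(x, sent)
    x = 0.9 * x + u + 0.01 * rng.standard_normal()  # noisy scalar plant
print("episode return:", round(ret, 2))
```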
Citations: 0
Equilibria of Fully Decentralized Learning in Networked Systems
Pub Date : 2023-05-15 DOI: 10.48550/arXiv.2305.09002
Yan Jiang, Wenqi Cui, Baosen Zhang, Jorge Cortés
Existing settings of decentralized learning either require players to have full information or require the system to have certain special structure that may be hard to check, hindering their applicability to practical systems. To overcome this, we identify a structure that is simple to check for linear dynamical systems, where each player learns in a fully decentralized fashion to minimize its cost. We first establish the existence of pure strategy Nash equilibria in the resulting noncooperative game. We then conjecture that the Nash equilibrium is unique provided that the system satisfies an additional requirement on its structure. We also introduce a decentralized mechanism based on projected gradient descent to have agents learn the Nash equilibrium. Simulations on a $5$-player game validate our results.
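A minimal sketch of the decentralized mechanism, transplanted to a static two-player quadratic game: each player takes projected gradient steps using only the gradient of its own cost, and the iterates converge to the game's unique Nash equilibrium. The networked linear-dynamical setting of the paper is abstracted away; costs, coupling weights, and step sizes are illustrative.

```python
import numpy as np

# Two-player quadratic game: player i minimizes
#     J_i(a) = 0.5 * a_i^2 + c_i * a_1 * a_2
# over a_i in [-1, 1], using only the gradient of its *own* cost
# (fully decentralized: no knowledge of the other player's objective).
c = np.array([0.3, 0.4])
a = np.array([0.9, -0.7])                  # arbitrary initial actions
project = lambda v: np.clip(v, -1.0, 1.0)  # projection onto feasible sets

for _ in range(300):
    grads = np.array([
        a[0] + c[0] * a[1],                # dJ_1/da_1, local information only
        a[1] + c[1] * a[0],                # dJ_2/da_2
    ])
    a = project(a - 0.1 * grads)           # simultaneous projected GD steps

print("approximate Nash equilibrium:", a)  # unique NE here is (0, 0)
```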
Citations: 1
Template-Based Piecewise Affine Regression
Pub Date : 2023-05-15 DOI: 10.48550/arXiv.2305.08686
Guillaume O. Berger, S. Sankaranarayanan
We investigate the problem of fitting piecewise affine functions (PWA) to data. Our algorithm divides the input domain into finitely many polyhedral regions whose shapes are specified using a user-defined template such that the data points in each region are fit by an affine function within a desired error bound. We first prove that this problem is NP-hard. Next, we present a top-down algorithm that considers subsets of the overall data set in a systematic manner, trying to fit an affine function for each subset using linear regression. If regression fails on a subset, we extract a minimal set of points that led to a failure in order to split the original index set into smaller subsets. Using a combination of this top-down scheme and a set covering algorithm, we derive an overall approach that is optimal in terms of the number of pieces of the resulting PWA model. We demonstrate our approach on two numerical examples that include PWA approximations of a widely used nonlinear insulin-glucose regulation model and a double inverted pendulum with soft contacts.
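The top-down scheme can be sketched as follows, with simplifications clearly flagged: this toy splits index sets at the median of the most-spread input coordinate, whereas the paper splits along user-defined polyhedral templates, extracts minimal infeasibility certificates, and runs a set-cover step to minimize the number of pieces.

```python
import numpy as np

def fit_affine(X, y):
    """Least-squares affine fit y ~ A x + b; returns params and residuals."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    theta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return theta, np.abs(Xa @ theta - y)

def piecewise_fit(X, y, tol=0.1):
    """Greedy top-down splitting: fit one affine piece; if any point exceeds
    the error bound, split the index set and recurse on both halves."""
    theta, res = fit_affine(X, y)
    if res.max() <= tol or len(y) <= X.shape[1] + 1:
        return [(theta, X)]
    j = int(np.argmax(np.var(X, axis=0)))    # split along most-spread input
    cut = np.median(X[:, j])
    lo, hi = X[:, j] <= cut, X[:, j] > cut
    if lo.all() or hi.all():                 # degenerate split: stop here
        return [(theta, X)]
    return piecewise_fit(X[lo], y[lo], tol) + piecewise_fit(X[hi], y[hi], tol)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 1))
y = np.abs(X[:, 0])                          # |x| is PWA with two pieces
pieces = piecewise_fit(X, y, tol=0.05)
print("number of affine pieces:", len(pieces))
```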
Citations: 0
A Generalizable Physics-informed Learning Framework for Risk Probability Estimation
Pub Date : 2023-05-10 DOI: 10.48550/arXiv.2305.06432
Zhuoyuan Wang, Yorie Nakahira
Accurate estimates of long-term risk probabilities and their gradients are critical for many stochastic safe control methods. However, computing such risk probabilities in real-time and in unseen or changing environments is challenging. Monte Carlo (MC) methods cannot accurately evaluate the probabilities and their gradients, as an infinitesimal divisor can amplify the sampling noise. In this paper, we develop an efficient method to evaluate the probabilities of long-term risk and their gradients. The proposed method exploits the fact that long-term risk probability satisfies certain partial differential equations (PDEs), which characterize the neighboring relations between the probabilities, to integrate MC methods and physics-informed neural networks. We provide theoretical guarantees of the estimation error given certain choices of training configurations. Numerical results show the proposed method has better sample efficiency, generalizes well to unseen regions, and can adapt to systems with changing parameters. The proposed method can also accurately estimate the gradients of risk probabilities, which enables first- and second-order techniques on risk probabilities to be used for learning and control.
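The coupling of MC data with PDE structure can be sketched for a scalar SDE dx = a(x) dt + sigma dW, whose long-horizon probability F(x, t) satisfies a backward-Kolmogorov-type equation when t is the remaining horizon: sparse Monte Carlo labels anchor the fit, while PDE residuals at unlabeled collocation points propagate that information to unseen regions. In practice F would be a neural network trained with automatic differentiation; the finite differences, test function, and loss weights here are illustrative.

```python
import numpy as np

def pde_residual(F, x, t, drift, sigma, h=1e-3):
    # Residual of dF/dt = a(x) dF/dx + 0.5 sigma^2 d2F/dx2 for the scalar
    # SDE dx = a(x) dt + sigma dW, with t measured as remaining horizon.
    dF_dt = (F(x, t + h) - F(x, t - h)) / (2 * h)
    dF_dx = (F(x + h, t) - F(x - h, t)) / (2 * h)
    d2F_dx2 = (F(x + h, t) - 2 * F(x, t) + F(x - h, t)) / h ** 2
    return dF_dt - drift(x) * dF_dx - 0.5 * sigma ** 2 * d2F_dx2

def physics_informed_loss(F, mc_data, collocation, drift, sigma, pde_weight=1.0):
    """Sparse MC labels anchor the fit; PDE residuals at unlabeled collocation
    points regularize it toward a physically consistent surface."""
    data_loss = np.mean([(F(x, t) - y) ** 2 for x, t, y in mc_data])
    pde_loss = np.mean([pde_residual(F, x, t, drift, sigma) ** 2
                        for x, t in collocation])
    return data_loss + pde_weight * pde_loss

# Illustrative check with a parameter-free F; a trained network replaces this.
F = lambda x, t: np.exp(-x ** 2 / (1.0 + t))
drift = lambda x: -x
print(physics_informed_loss(F, [(0.0, 1.0, 0.6)], [(0.5, 0.5)], drift, sigma=0.3))
```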
Citations: 0
The Impact of the Geometric Properties of the Constraint Set in Safe Optimization with Bandit Feedback
Pub Date : 2023-05-01 DOI: 10.48550/arXiv.2305.00889
Spencer Hutchinson, Berkay Turan, M. Alizadeh
We consider a safe optimization problem with bandit feedback in which an agent sequentially chooses actions and observes responses from the environment, with the goal of maximizing an arbitrary function of the response while respecting stage-wise constraints. We propose an algorithm for this problem, and study how the geometric properties of the constraint set impact the regret of the algorithm. In order to do so, we introduce the notion of the sharpness of a particular constraint set, which characterizes the difficulty of performing learning within the constraint set in an uncertain setting. This concept of sharpness allows us to identify the class of constraint sets for which the proposed algorithm is guaranteed to enjoy sublinear regret. Simulation results for this algorithm support the sublinear regret bound and provide empirical evidence that the sharpness of the constraint set impacts the performance of the algorithm.
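The flavor of the constraint handling can be sketched with a generic optimism-pessimism pattern for stage-wise constrained bandits on a discretized action set: reward estimates get optimistic bonuses, while feasibility is judged pessimistically, so only actions whose constraint upper confidence bound respects the budget are played. This is a standard pattern, not the authors' algorithm, which further exploits the sharpness of the constraint set; all quantities below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
actions = np.linspace(-1.0, 1.0, 41)        # discretized decision set
true_reward = lambda a: 1.0 - (a - 0.6) ** 2
true_cost = lambda a: a                     # stage-wise constraint: cost <= 0.5
budget = 0.5

est_r, est_c = np.zeros_like(actions), np.zeros_like(actions)
counts = np.ones_like(actions)              # one pseudo-observation per arm

for t in range(1, 2001):
    bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
    # Pessimism for safety: an arm is deemed feasible only if even the upper
    # confidence bound on its constraint value stays within the budget.
    feasible = est_c + bonus <= budget
    if feasible.any():
        i = int(np.argmax(np.where(feasible, est_r + bonus, -np.inf)))
    else:
        i = int(np.argmin(est_c + bonus))   # fall back to safest-looking arm
    r_obs = true_reward(actions[i]) + 0.05 * rng.standard_normal()
    c_obs = true_cost(actions[i]) + 0.05 * rng.standard_normal()
    counts[i] += 1
    est_r[i] += (r_obs - est_r[i]) / counts[i]  # running-mean updates
    est_c[i] += (c_obs - est_c[i]) / counts[i]

print("most-played action:", actions[int(np.argmax(counts))])
```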
Citations: 2