
Conference on Learning for Dynamics & Control: Latest Publications

Model Predictive Control via On-Policy Imitation Learning
Pub Date : 2022-10-17 DOI: 10.48550/arXiv.2210.09206
Kwangjun Ahn, Zakaria Mhammedi, Horia Mania, Zhang-Wei Hong, A. Jadbabaie
In this paper, we leverage the rapid advances in imitation learning, a topic of intense recent focus in the Reinforcement Learning (RL) literature, to develop new sample complexity results and performance guarantees for data-driven Model Predictive Control (MPC) for constrained linear systems. In its simplest form, imitation learning is an approach that tries to learn an expert policy by querying samples from an expert. Recent approaches to data-driven MPC have used the simplest form of imitation learning, known as behavior cloning, to learn controllers that mimic the performance of MPC by online sampling of the trajectories of the closed-loop MPC system. Behavior cloning, however, is known to be data-inefficient and to suffer from distribution shift. As an alternative, we develop a variant of the forward training algorithm, an on-policy imitation learning method proposed by Ross et al. (2010). Our algorithm uses the structure of constrained linear MPC, and our analysis uses the properties of the explicit MPC solution to theoretically bound the number of online MPC trajectories needed to achieve optimal performance. We validate our results through simulations and show that the forward training algorithm is indeed superior to behavior cloning when applied to MPC.
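The difference between behavior cloning and forward training comes down to which states the learner is trained on. The following is a minimal sketch, not the authors' algorithm. It assumes a known linear system x_{t+1} = A x_t + B u_t and uses a saturated linear feedback as a hypothetical stand-in for the constrained MPC expert. Policies are linear gains fit by least squares. Behavior cloning fits one gain on states visited by the expert. Forward training (Ross et al., 2010) fits a separate gain for each time step t, using states reached by running the already-learned gains for steps 0..t-1 and labeling those states by querying the expert. All function names are illustrative.

```python
import numpy as np


def expert(x, K, u_max):
    """Hypothetical stand-in for the constrained MPC expert: saturated linear
    feedback clip(-K x, -u_max, u_max). It is piecewise affine like explicit
    linear MPC, but it is NOT the MPC law used in the paper."""
    return np.clip(-K @ x, -u_max, u_max)


def fit_gain(X, U):
    """Least-squares fit of a linear policy u = G x from stacked samples."""
    G_T, *_ = np.linalg.lstsq(X, U, rcond=None)  # solves X @ G.T ~= U
    return G_T.T


def rollout_state(A, B, x0, policies):
    """State reached from x0 after applying the per-step gains in `policies`."""
    x = x0
    for G in policies:
        x = A @ x + B @ (G @ x)
    return x


def behavior_cloning(A, B, K, u_max, X0, T):
    """Fit one stationary gain on states visited by the *expert* closed loop."""
    X, U = [], []
    for x0 in X0:
        x = x0
        for _ in range(T):
            u = expert(x, K, u_max)
            X.append(x)
            U.append(u)
            x = A @ x + B @ u
    return fit_gain(np.array(X), np.array(U))


def forward_training(A, B, K, u_max, X0, T):
    """Forward training: the gain for each step is fit on states reached by
    running the *learner's* gains for all earlier steps, labeled by the expert."""
    policies = []
    for _ in range(T):
        X = np.array([rollout_state(A, B, x0, policies) for x0 in X0])
        U = np.array([expert(x, K, u_max) for x in X])  # query expert on-policy
        policies.append(fit_gain(X, U))
    return policies
```

With u_max = np.inf, the expert is linear, and both methods recover -K exactly from noise-free data. With a finite u_max, a linear learner cannot represent the saturation, and the two methods train on different state distributions. That distribution gap is what the paper's analysis addresses. This sketch does not establish which method performs better.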
Citations: 0
A Learning and Control Perspective for Microfinance
Pub Date : 2022-07-26 DOI: 10.48550/arXiv.2207.12631
Christian Kurniawan, Xiyu Deng, Adhiraj Chakraborty, A. Gueye, Niangjun Chen, Yorie Nakahira
Microfinance, despite its significant potential for poverty reduction, faces sustainability hardships due to high default rates. Although many methods in regular finance can estimate credit scores and default probabilities, these methods are not directly applicable to microfinance because of the following unique characteristics: a) under-explored (developing) areas such as rural Africa do not have sufficient prior loan data for microfinance institutions (MFIs) to establish a credit scoring system; b) microfinance applicants may have difficulty providing sufficient information for MFIs to accurately predict default probabilities; and c) many MFIs use group liability (instead of collateral) to secure repayment. Here, we present a novel control-theoretic model of microfinance that accounts for these characteristics. We construct an algorithm to learn microfinance decision policies that achieve financial inclusion, fairness, social welfare, and sustainability. We characterize the conditions for convergence to a Pareto optimum and the convergence speeds. We demonstrate, on numerous real and synthetic datasets, that the proposed method accounts for the complexities induced by group liability and produces robust decisions both before enough loans have been given to establish a credit scoring system and for applicants whose default probability cannot be accurately estimated due to missing information. To the best of our knowledge, this paper is the first to connect microfinance and control theory. We envision that this connection will enable safe learning and control techniques to help modernize microfinance and alleviate poverty.
Citations: 0
MyoSuite: A Contact-rich Simulation Suite for Musculoskeletal Motor Control
Pub Date : 2022-05-26 DOI: 10.48550/arXiv.2205.13600
V. Caggiano, Huawei Wang, G. Durandau, Massimo Sartori, Vikash Kumar
Embodied agents in continuous control domains have had limited exposure to tasks that allow them to explore the musculoskeletal properties enabling agile and nimble behaviors in biological beings. The sophistication behind neuro-musculoskeletal control can pose new challenges for the motor learning community. At the same time, agents that solve complex neural control problems can have an impact in fields such as neuro-rehabilitation and collaborative robotics. Human biomechanics underlies complex multi-joint, multi-actuator musculoskeletal systems. The sensory-motor system relies on a range of contact-rich sensory and proprioceptive inputs that define and condition the muscle actuation required to exhibit intelligent behaviors in the physical world. Current frameworks for musculoskeletal control do not combine the physiological sophistication of musculoskeletal systems with physical-world interaction capabilities. In addition, they are neither embedded in complex and skillful motor tasks nor computationally efficient and scalable enough to study large-scale learning paradigms. Here, we present MyoSuite, a suite of physiologically accurate biomechanical models of the elbow, wrist, and hand with physical contact capabilities, which allow learning of complex and skillful contact-rich real-world tasks. We provide diverse motor-control challenges, from simple postural control to skilled hand-object interactions such as turning a key, twirling a pen, and rotating two balls in one hand. By supporting physiological alterations in musculoskeletal geometry (tendon transfer), assistive devices (exoskeleton assistance), and muscle contraction dynamics (muscle fatigue, sarcopenia), we present real-life tasks with temporal changes, thereby exposing realistic non-stationary conditions in our tasks, which most continuous control benchmarks lack.
Citations: 27
Robust Data-Driven Output Feedback Control via Bootstrapped Multiplicative Noise
Pub Date : 2022-05-10 DOI: 10.48550/arXiv.2205.05119
Benjamin J. Gravell, I. Shames, T. Summers
We propose a robust data-driven output feedback control algorithm that explicitly incorporates inherent finite-sample model estimate uncertainties into the control design. The algorithm has three components: (1) a subspace identification nominal model estimator; (2) a bootstrap resampling method that quantifies non-asymptotic variance of the nominal model estimate; and (3) a non-conventional robust control design method comprising a coupled optimal dynamic output feedback filter and controller with multiplicative noise. A key advantage of the proposed approach is that the system identification and robust control design procedures both use stochastic uncertainty representations, so that the actual inherent statistical estimation uncertainty directly aligns with the uncertainty the robust controller is being designed against. Moreover, the control design method accommodates a highly structured uncertainty representation that can capture uncertainty shape more effectively than existing approaches. We show through numerical experiments that the proposed robust data-driven output feedback controller can significantly outperform a certainty equivalent controller on various measures of sample complexity and stability robustness.
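The bootstrap step (component 2) can be illustrated on a simpler identification problem. This sketch makes three assumptions that differ from the paper. First, it assumes full-state measurements. Second, it assumes the transitions (x, u, x_next) were collected independently. Third, it replaces the paper's subspace identification from input-output data with ordinary least squares. Only the resampling idea carries over. The sketch resamples transitions with replacement, re-estimates (A, B) each time, and uses the elementwise spread of these estimates as an estimate of the nominal model's finite-sample variance. The multiplicative-noise control design (component 3) is not shown, and the function names are illustrative.

```python
import numpy as np


def ls_estimate(X, U, Xn):
    """Least-squares estimate of (A, B) from transitions x_next = A x + B u + w."""
    Z = np.hstack([X, U])                        # regressors, shape (N, n + m)
    Theta_T, *_ = np.linalg.lstsq(Z, Xn, rcond=None)
    Theta = Theta_T.T                            # [A B], shape (n, n + m)
    n = X.shape[1]
    return Theta[:, :n], Theta[:, n:]


def bootstrap_model_variance(X, U, Xn, n_boot=200, seed=0):
    """Pairs bootstrap: resample transitions with replacement, re-estimate
    (A, B), and return the nominal estimate together with the elementwise
    sample variance of the bootstrap estimates."""
    rng = np.random.default_rng(seed)
    A_hat, B_hat = ls_estimate(X, U, Xn)
    N = X.shape[0]
    A_bs, B_bs = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, N, size=N)         # indices drawn with replacement
        A_b, B_b = ls_estimate(X[idx], U[idx], Xn[idx])
        A_bs.append(A_b)
        B_bs.append(B_b)
    return A_hat, B_hat, np.var(A_bs, axis=0), np.var(B_bs, axis=0)
```

As expected for a finite-sample variance estimate, the reported variances shrink as more transitions are collected.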
Citations: 0
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?
Pub Date : 2022-04-23 DOI: 10.48550/arXiv.2204.11134
Yuchen Cui, S. Niekum, Abhi Gupta, Vikash Kumar, A. Rajeswaran
Task specification is at the core of programming autonomous robots. A low-effort modality for task specification is critical for the engagement of non-expert end-users and the ultimate adoption of personalized robot agents. A widely studied approach to task specification is through goals, using either compact state vectors or goal images from the same robot scene. The former is hard for non-experts to interpret and necessitates detailed state estimation and scene understanding. The latter requires generating a desired goal image, which often requires a human to complete the task, defeating the purpose of having autonomous robots. In this work, we explore alternative and more general forms of goal specification that are expected to be easier for humans to specify and use, such as images obtained from the internet, hand sketches that provide a visual description of the desired task, or simple language descriptions. As a preliminary step towards this, we investigate the capabilities of large-scale pre-trained models (foundation models) for zero-shot goal specification, and find promising results on a collection of simulated robot manipulation tasks and real-world datasets.
Citations: 32
Neural Gaits: Learning Bipedal Locomotion via Control Barrier Functions and Zero Dynamics Policies
Pub Date : 2022-04-18 DOI: 10.48550/arXiv.2204.08120
I. D. Rodriguez, Noel Csomay-Shanklin, Yisong Yue, A. Ames
This work presents Neural Gaits, a method for learning dynamic walking gaits through the enforcement of set invariance that can be refined episodically using experimental data from the robot. We frame walking as a set invariance problem enforceable via control barrier functions (CBFs) defined on the reduced-order dynamics quantifying the underactuated component of the robot: the zero dynamics. Our approach contains two learning modules: one for learning a policy that satisfies the CBF condition, and another for learning a residual dynamics model to refine imperfections of the nominal model. Importantly, learning only over the zero dynamics significantly reduces the dimensionality of the learning problem, while using CBFs still allows us to make guarantees for the full-order system. The method is demonstrated experimentally on an underactuated bipedal robot, where we are able to show agile and dynamic locomotion, even with partially unknown dynamics.
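The CBF condition mentioned above can be made concrete with the standard min-norm safety filter for a single barrier h. This is a generic illustration, not the Neural Gaits method. Neural Gaits learns a policy over the zero dynamics, whereas this sketch only projects a reference input onto the set of inputs that satisfy the condition. It assumes control-affine dynamics x' = f(x) + g(x) u, a linear class-K function alpha * h, and a nonzero Lg_h (relative degree one). The function name and the example system are hypothetical.

```python
import numpy as np


def cbf_filter(u_ref, Lf_h, Lg_h, h, alpha=1.0):
    """Min-norm CBF safety filter for a single barrier h (closed-form QP):

        minimize ||u - u_ref||^2  subject to  Lf_h + Lg_h @ u + alpha * h >= 0

    Assumes control-affine dynamics x' = f(x) + g(x) u, a linear class-K
    function alpha * h, and Lg_h != 0 (relative degree one)."""
    u_ref = np.asarray(u_ref, dtype=float)
    Lg_h = np.asarray(Lg_h, dtype=float)
    slack = Lf_h + Lg_h @ u_ref + alpha * h
    if slack >= 0.0:
        return u_ref  # the reference input already satisfies the CBF condition
    norm_sq = Lg_h @ Lg_h
    if norm_sq == 0.0:
        raise ValueError("Lg_h is zero: the input cannot affect h at this state")
    # Euclidean projection of u_ref onto the half-space where the condition holds
    return u_ref - slack * Lg_h / norm_sq


# Example (assumed for illustration): single integrator x' = u with safe set
# h(x) = 1 - x >= 0, so Lf_h = 0 and Lg_h = [-1]. The reference input 3.0
# keeps pushing toward x > 1; the filter caps it at alpha * (1 - x).
x, dt = 0.0, 0.01
for _ in range(500):
    u = cbf_filter(np.array([3.0]), 0.0, np.array([-1.0]), 1.0 - x, alpha=2.0)
    x += dt * u[0]  # forward-Euler step; x approaches 1 from below
```

In the example, the step size is small enough (dt * alpha < 1) that the discretized closed loop stays inside the safe set. A continuous-time CBF condition does not guarantee this for arbitrary step sizes.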
Citations: 3
Bounding the difference between model predictive control and neural networks
Pub Date : 2022-04-13 DOI: 10.48550/arXiv.2204.06486
R. Drummond, S. Duncan, M. Turner, Patricia Pauli, F. Allgöwer
There is a growing debate on whether the future of feedback control systems will be dominated by data-driven or model-driven approaches. Each of these two approaches has its own complementary set of advantages and disadvantages; however, only limited attempts have so far been made to bridge the gap between them. To address this issue, this paper introduces a method to bound the worst-case error between feedback control policies based on model predictive control (MPC) and neural networks (NNs). This result is then leveraged to automatically synthesize MPC policies that minimise the worst-case error with respect to an NN. Numerical examples highlight the application of the bounds; the goal of the paper is to encourage a more quantitative understanding of the relationship between data-driven and model-driven control.
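A small example makes the notion of worst-case error between an MPC law and an NN policy concrete. It assumes a scalar system x+ = a x + b u with a one-step horizon, cost q x+^2 + r u^2, and input bound |u| <= u_max. For this problem, the explicit MPC law is the unconstrained minimizer -k x clipped to the bounds, which is piecewise affine. For u_max = 1, a one-hidden-layer ReLU network can represent this law exactly, because clip(v, -1, 1) = relu(v + 1) - relu(v - 1) - 1. Evaluating the error on a grid only gives a lower bound on the true worst-case error. The paper derives an actual bound, which this sketch does not reproduce. All function names are illustrative.

```python
import numpy as np


def mpc_one_step(x, a, b, q, r, u_max=1.0):
    """Explicit law of a one-step-horizon MPC for the scalar system
    x+ = a x + b u with cost q x+^2 + r u^2 and |u| <= u_max:
    the unconstrained minimizer -k x, clipped to the input bounds."""
    k = q * a * b / (q * b**2 + r)
    return np.clip(-k * x, -u_max, u_max)


def relu_policy(x, W1, b1, w2, b2):
    """One-hidden-layer ReLU network evaluated on a batch of scalar states."""
    hidden = np.maximum(0.0, np.outer(x, W1) + b1)  # shape (len(x), width)
    return hidden @ w2 + b2


def sampled_worst_case_error(pi_1, pi_2, x_grid):
    """max |pi_1(x) - pi_2(x)| over a finite grid. This only LOWER-bounds the
    true worst-case error on the interval; it is not a certified bound."""
    return np.max(np.abs(pi_1(x_grid) - pi_2(x_grid)))
```

With hidden weights (-k, -k), biases (1, -1), output weights (1, -1), and output bias -1, the network reproduces the MPC law, and the sampled error is zero up to rounding. Shifting the first bias by 0.1 produces a sampled worst-case error of 0.1.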
Citations: 3
Barrier Bayesian Linear Regression: Online Learning of Control Barrier Conditions for Safety-Critical Control of Uncertain Systems
Pub Date : 2022-04-08 DOI: 10.48550/arXiv.2204.03801
Lukas Brunke, Siqi Zhou, Angela P. Schoellig
In this work, we consider the problem of designing a safety filter for a nonlinear uncertain control system. Our goal is to augment an arbitrary controller with a safety filter such that the overall closed-loop system is guaranteed to stay within a given state constraint set, referred to as being safe. For systems with known dynamics, control barrier functions (CBFs) provide a scalar condition for determining if a system is safe. For uncertain systems, robust or adaptive CBF certification approaches have been proposed. However, these approaches can be conservative or require the system to have a particular parametric structure. For more generic uncertain systems, machine learning approaches have been used to approximate the CBF condition. These works typically assume that the learning module is sufficiently trained prior to deployment. Safety during learning is not guaranteed. We propose a barrier Bayesian linear regression (BBLR) approach that guarantees safe online learning of the CBF condition for the true, uncertain system. We assume that the error between the nominal system and the true system is bounded and exploit the structure of the CBF condition. We show that our approach can safely expand the set of certifiable control inputs despite system and learning uncertainties. The effectiveness of our approach is demonstrated in simulation using a two-dimensional pendulum stabilization task.
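The idea of safely learning the CBF condition online can be sketched with a standard Bayesian linear regression posterior. This is a minimal illustration, not the paper's BBLR. It assumes that the unknown part of the CBF condition (the mismatch between the nominal and the true time derivative of h) is linear in some feature vector phi(x, u), with a Gaussian prior and Gaussian noise. The paper instead assumes a bounded model error. A natural choice is features that are affine in u, which matches the control-affine structure of the CBF condition. The names `BayesLinReg` and `certified`, the multiplier `beta`, and the feature map are hypothetical. An input is accepted only if a pessimistic estimate (the posterior mean minus beta posterior standard deviations) satisfies the condition. As a result, the set of certified inputs grows as data shrink the posterior.

```python
import numpy as np


class BayesLinReg:
    """Bayesian linear regression y = phi @ theta + noise with prior
    theta ~ N(0, prior_var * I) and Gaussian noise of variance noise_var,
    updated online one sample at a time."""

    def __init__(self, dim, prior_var=1.0, noise_var=0.01):
        self.precision = np.eye(dim) / prior_var  # posterior precision matrix
        self.eta = np.zeros(dim)                  # precision @ posterior mean
        self.noise_var = noise_var

    def update(self, phi, y):
        phi = np.asarray(phi, dtype=float)
        self.precision += np.outer(phi, phi) / self.noise_var
        self.eta += phi * y / self.noise_var

    def predict(self, phi):
        """Posterior mean and standard deviation of phi @ theta."""
        phi = np.asarray(phi, dtype=float)
        cov = np.linalg.inv(self.precision)
        return phi @ cov @ self.eta, np.sqrt(phi @ cov @ phi)


def certified(model, phi, nominal_hdot, h, alpha=1.0, beta=2.0):
    """Accept an input only if a pessimistic estimate of the CBF condition,
    nominal_hdot + (mean - beta * std) of the learned correction + alpha * h,
    is non-negative. beta is a hypothetical confidence multiplier."""
    mean, std = model.predict(phi)
    return nominal_hdot + mean - beta * std + alpha * h >= 0.0
```

Under the prior alone, an input whose nominal margin is small is rejected. After enough samples, the same input can be certified because the posterior standard deviation has shrunk.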
Citations: 11
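The Barrier Bayesian Linear Regression entry above describes learning the control barrier function (CBF) condition online. It certifies a control input only when the condition holds despite model and learning uncertainty. The sketch below illustrates that idea under assumptions of our own. The mismatch between the true and nominal barrier derivative is taken to be affine in a scalar input u, with features [1, u]. It is fitted by conjugate Gaussian Bayesian linear regression. An input counts as certified when a lower confidence bound (mean minus `beta` standard deviations) on hdot + alpha * h is non-negative. The names `features`, `is_certified`, `alpha` and `beta` are hypothetical, and this is not the authors' BBLR algorithm.

```python
import numpy as np


def features(u):
    """Hypothetical feature map: the CBF-condition residual is assumed affine in a scalar input u."""
    return np.array([1.0, u])


class BayesianLinearRegression:
    """Conjugate Bayesian linear regression: prior w ~ N(0, prior_var * I), Gaussian noise of known variance."""

    def __init__(self, dim, prior_var=1.0, noise_var=0.01):
        self.precision = np.eye(dim) / prior_var  # posterior precision of the weights
        self.b = np.zeros(dim)                    # running sum of phi * y / noise_var
        self.noise_var = noise_var

    def update(self, phi, y):
        # Rank-one posterior update for one observation y = w^T phi + noise.
        self.precision += np.outer(phi, phi) / self.noise_var
        self.b += phi * y / self.noise_var

    def predict(self, phi):
        # Posterior mean and standard deviation of w^T phi (weight uncertainty only).
        cov = np.linalg.inv(self.precision)
        mean = float(phi @ cov @ self.b)
        std = float(np.sqrt(phi @ cov @ phi))
        return mean, std


def is_certified(model, h, hdot_nominal, u, alpha=1.0, beta=2.0):
    """Accept u only if a lower confidence bound on hdot + alpha * h is non-negative.

    h: barrier value h(x); hdot_nominal: nominal-model prediction of dh/dt under u;
    alpha: linear class-K gain; beta: confidence multiplier (both hypothetical choices).
    """
    mean, std = model.predict(features(u))
    return hdot_nominal + mean - beta * std + alpha * h >= 0.0
```

Before any residual data arrive, the prior uncertainty makes the certificate fail unless h is large. As samples accumulate, the predictive standard deviation shrinks and more inputs become certifiable. This only loosely parallels the abstract's claim that the set of certifiable inputs expands safely during learning.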
Adaptive Stochastic MPC under Unknown Noise Distribution
Pub Date : 2022-04-03 DOI: 10.48550/arXiv.2204.01107
Charis J. Stamouli, Anastasios Tsiamis, M. Morari, George J. Pappas
In this paper, we address the stochastic MPC (SMPC) problem for linear systems subject to chance state constraints and hard input constraints under an unknown noise distribution. First, we reformulate the chance state constraints as deterministic constraints that depend only on explicit noise statistics. Based on these reformulated constraints, we design a distributionally robust and robustly stable benchmark SMPC algorithm for the ideal setting of known noise statistics. Then, we employ this benchmark controller to derive a novel robustly stable adaptive SMPC scheme that learns the necessary noise statistics online while guaranteeing, with high probability, time-uniform satisfaction of the unknown reformulated state constraints. The latter is achieved through confidence intervals that rely on the empirical noise statistics and are valid uniformly over time. Moreover, because the estimated reformulated constraints are adapted online, control performance improves over time as more noise samples are gathered and the estimates of the noise statistics become more accurate. Additionally, in tracking problems with multiple successive targets, our approach enlarges the domain of attraction online compared with robust tube-based MPC. A numerical simulation of a DC-DC converter demonstrates the effectiveness of the developed methodology.
Citations: 2
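The Adaptive Stochastic MPC entry above reformulates chance state constraints as deterministic constraints that depend only on noise statistics. The paper's exact reformulation may differ. One classical distributionally robust version for a scalar half-space constraint uses Cantelli's one-sided inequality. Under it, P(a x + u + w <= x_max) >= 1 - eps holds for every noise distribution with mean mu and standard deviation sigma whenever a x + u + mu + sqrt((1 - eps) / eps) * sigma <= x_max. The sketch assumes a hypothetical scalar system x+ = a x + u + w. For the adaptive part it plugs in empirical statistics inflated by user-supplied margins. These margins stand in for the paper's time-uniform confidence intervals and are not derived from them.

```python
import math
import statistics


def cantelli_factor(eps):
    """Back-off factor kappa = sqrt((1 - eps) / eps) from Cantelli's one-sided inequality."""
    return math.sqrt((1.0 - eps) / eps)


def max_admissible_input(x, a, x_max, mu, sigma, eps):
    """Upper bound on u guaranteeing P(a*x + u + w <= x_max) >= 1 - eps for every noise
    distribution with mean mu and standard deviation sigma (scalar system x+ = a x + u + w)."""
    return x_max - a * x - mu - cantelli_factor(eps) * sigma


def inflated_statistics(samples, mu_margin, sigma_margin):
    """Empirical mean/std inflated by user-supplied margins.

    The margins are hypothetical placeholders for the paper's time-uniform confidence intervals.
    """
    mu_hat = statistics.fmean(samples)
    sigma_hat = statistics.stdev(samples)
    return mu_hat + abs(mu_margin), sigma_hat + abs(sigma_margin)
```

With eps = 0.1 the back-off is exactly 3 sigma. Larger margins give a smaller admissible input. Shrinking them as samples accumulate relaxes the bound, which is the informal sense in which performance can improve over time.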
Safe Control with Minimal Regret
Pub Date : 2022-03-01 DOI: 10.48550/arXiv.2203.00358
Andrea Martin, Luca Furieri, F. Dörfler, J. Lygeros, G. Ferrari-Trecate
As we move towards safety-critical cyber-physical systems that operate in non-stationary and uncertain environments, it becomes crucial to close the gap between classical optimal control algorithms and adaptive learning-based methods. In this paper, we present an efficient optimization-based approach for computing a finite-horizon, robustly safe control policy that minimizes dynamic regret, defined as the loss relative to the optimal sequence of control actions selected in hindsight by a clairvoyant controller. By leveraging the system level synthesis (SLS) framework, our method extends recent results on regret minimization for the linear quadratic regulator to optimal control subject to hard safety constraints and, with minor modifications, allows competing against a safety-aware clairvoyant policy. Numerical experiments confirm superior performance relative to finite-horizon constrained $\mathcal{H}_2$ and $\mathcal{H}_\infty$ control laws when the disturbance realizations fit classical assumptions poorly.
Citations: 22
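The Safe Control with Minimal Regret entry above measures dynamic regret as the loss relative to the optimal control sequence chosen in hindsight by a clairvoyant controller. The sketch below illustrates only that definition. It uses a hypothetical scalar system x_{t+1} = a x_t + b u_t + w_t with cost q * sum(x_t^2) + r * sum(u_t^2). It omits the hard safety constraints and the SLS-based synthesis the paper uses. It also compares the clairvoyant optimum against an arbitrary static feedback gain `k` rather than a regret-optimal policy. The clairvoyant optimum is an unconstrained least-squares solve over the stacked trajectory.

```python
import numpy as np


def stacked_dynamics(a, b, T):
    """Return F, G, H with x = F*x0 + G@u + H@w for the states x_1..x_T of x_{t+1} = a x_t + b u_t + w_t."""
    M = np.zeros((T, T))
    for i in range(T):
        for s in range(i + 1):
            M[i, s] = a ** (i - s)
    F = np.array([a ** (i + 1) for i in range(T)])
    return F, b * M, M


def cost(x, u, q, r):
    return q * float(x @ x) + r * float(u @ u)


def clairvoyant_cost(a, b, x0, w, q, r):
    """Optimal cost when the whole disturbance sequence w is known in advance (no constraints)."""
    T = len(w)
    F, G, H = stacked_dynamics(a, b, T)
    free = F * x0 + H @ w  # state trajectory with u = 0
    u = np.linalg.solve(q * G.T @ G + r * np.eye(T), -q * G.T @ free)
    return cost(free + G @ u, u, q, r)


def feedback_cost(a, b, x0, w, q, r, k):
    """Cost of the causal static feedback u_t = -k x_t on the same disturbance realization."""
    x, xs, us = x0, [], []
    for wt in w:
        u = -k * x
        x = a * x + b * u + wt
        us.append(u)
        xs.append(x)
    return cost(np.array(xs), np.array(us), q, r)


def dynamic_regret(a, b, x0, w, q, r, k):
    """Regret of the feedback policy relative to the clairvoyant optimum (always >= 0 here)."""
    return feedback_cost(a, b, x0, w, q, r, k) - clairvoyant_cost(a, b, x0, w, q, r)
```

Any causal policy produces one particular input sequence, and the clairvoyant solve optimizes over all sequences. This is why the regret in the sketch can never be negative.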