Pub Date: 2022-10-17
DOI: 10.48550/arXiv.2210.09206
Kwangjun Ahn, Zakaria Mhammedi, Horia Mania, Zhang-Wei Hong, A. Jadbabaie
In this paper, we leverage the rapid advances in imitation learning, a topic of intense recent focus in the Reinforcement Learning (RL) literature, to develop new sample complexity results and performance guarantees for data-driven Model Predictive Control (MPC) for constrained linear systems. In its simplest form, imitation learning is an approach that tries to learn an expert policy by querying samples from an expert. Recent approaches to data-driven MPC have used the simplest form of imitation learning, known as behavior cloning, to learn controllers that mimic the performance of MPC by online sampling of the trajectories of the closed-loop MPC system. Behavior cloning, however, is known to be data-inefficient and to suffer from distribution shift. As an alternative, we develop a variant of the forward training algorithm, an on-policy imitation learning method proposed by Ross et al. (2010). Our algorithm uses the structure of constrained linear MPC, and our analysis uses the properties of the explicit MPC solution to theoretically bound the number of online MPC trajectories needed to achieve optimal performance. We validate our results through simulations and show that the forward training algorithm is indeed superior to behavior cloning when applied to MPC.
Title: Model Predictive Control via On-Policy Imitation Learning
Published in: Conference on Learning for Dynamics & Control
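The on-policy idea behind forward training can be sketched on a toy problem. The code below is a minimal illustration under stated assumptions, not the authors' algorithm: the scalar dynamics, the saturated linear feedback standing in for the expert (a simple piecewise-affine law, in the spirit of explicit MPC), and all names (`expert`, `step`, `fit_gain`, `forward_training`) are hypothetical. One policy is fitted per time step, and the expert is queried only on states reached by rolling in with the policies learned so far, which is what distinguishes it from behavior cloning on expert trajectories.

```python
import random


def expert(x, k=0.8, u_max=1.0):
    """Stand-in expert: saturated linear feedback (a piecewise-affine law)."""
    return max(-u_max, min(u_max, -k * x))


def step(x, u, a=1.2, b=1.0):
    """Scalar linear dynamics x_next = a * x + b * u (illustrative values)."""
    return a * x + b * u


def fit_gain(states, actions):
    """Least-squares fit of a linear policy u = theta * x."""
    num = sum(x * u for x, u in zip(states, actions))
    den = sum(x * x for x in states)
    return num / den if den > 0 else 0.0


def forward_training(horizon=5, n_rollouts=50, seed=0):
    """Learn one gain per time step, rolling in with the learner's own policies."""
    rng = random.Random(seed)
    gains = []  # gains[t] is the learned policy for time step t
    for _ in range(horizon):
        states_t = []
        for _ in range(n_rollouts):
            x = rng.uniform(-2.0, 2.0)
            # Roll in with the policies already learned for earlier steps.
            for theta in gains:
                x = step(x, theta * x)
            states_t.append(x)
        # Query the expert only on states the learner itself reaches.
        actions_t = [expert(x) for x in states_t]
        gains.append(fit_gain(states_t, actions_t))
    return gains


if __name__ == "__main__":
    print(forward_training())
```

In this toy setting the later gains approach the expert's unsaturated gain, because the learner's own closed loop contracts the state into the region where the expert is linear.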
Pub Date: 2022-07-26
DOI: 10.48550/arXiv.2207.12631
Christian Kurniawan, Xiyu Deng, Adhiraj Chakraborty, A. Gueye, Niangjun Chen, Yorie Nakahira
Microfinance, despite its significant potential for poverty reduction, is facing sustainability hardships due to high default rates. Although many methods in regular finance can estimate credit scores and default probabilities, these methods are not directly applicable to microfinance due to the following unique characteristics: a) under-explored (developing) areas such as rural Africa do not have sufficient prior loan data for microfinance institutions (MFIs) to establish a credit scoring system; b) microfinance applicants may have difficulty providing sufficient information for MFIs to accurately predict default probabilities; and c) many MFIs use group liability (instead of collateral) to secure repayment. Here, we present a novel control-theoretic model of microfinance that accounts for these characteristics. We construct an algorithm to learn microfinance decision policies that achieve financial inclusion, fairness, social welfare, and sustainability. We characterize the conditions for convergence to the Pareto optimum and the convergence speeds. We demonstrate, on numerous real and synthetic datasets, that the proposed method accounts for the complexities induced by group liability and produces robust decisions both before sufficient loans have been given to establish credit scoring systems and for applicants whose default probability cannot be accurately estimated due to missing information. To the best of our knowledge, this paper is the first to connect microfinance and control theory. We envision that the connection will enable safe learning and control techniques to help modernize microfinance and alleviate poverty.
Title: A Learning and Control Perspective for Microfinance
Published in: Conference on Learning for Dynamics & Control
Pub Date: 2022-05-26
DOI: 10.48550/arXiv.2205.13600
V. Caggiano, Huawei Wang, G. Durandau, Massimo Sartori, Vikash Kumar
Embodied agents in continuous control domains have had limited exposure to tasks that allow them to explore the musculoskeletal properties that enable agile and nimble behaviors in biological beings. The sophistication behind neuro-musculoskeletal control can pose new challenges for the motor learning community. At the same time, agents that solve complex neural control problems can have impact in fields such as neuro-rehabilitation and collaborative robotics. Human biomechanics underlies complex multi-joint, multi-actuator musculoskeletal systems. The sensory-motor system relies on a range of contact-rich sensory and proprioceptive inputs that define and condition the muscle actuation required to exhibit intelligent behaviors in the physical world. Current frameworks for musculoskeletal control do not support the physiological sophistication of musculoskeletal systems along with physical-world interaction capabilities. In addition, they are neither embedded in complex and skillful motor tasks nor computationally effective and scalable enough to study large-scale learning paradigms. Here, we present MyoSuite -- a suite of physiologically accurate biomechanical models of elbow, wrist, and hand, with physical contact capabilities, which allow learning of complex and skillful contact-rich real-world tasks. We provide diverse motor-control challenges: from simple postural control to skilled hand-object interactions such as turning a key, twirling a pen, rotating two balls in one hand, etc. By supporting physiological alterations in musculoskeletal geometry (tendon transfer), assistive devices (exoskeleton assistance), and muscle contraction dynamics (muscle fatigue, sarcopenia), we present real-life tasks with temporal changes, thereby exposing realistic non-stationary conditions in our tasks, which most continuous control benchmarks lack.
Title: MyoSuite: A Contact-rich Simulation Suite for Musculoskeletal Motor Control
Published in: Conference on Learning for Dynamics & Control
Pub Date: 2022-05-10
DOI: 10.48550/arXiv.2205.05119
Benjamin J. Gravell, I. Shames, T. Summers
We propose a robust data-driven output feedback control algorithm that explicitly incorporates inherent finite-sample model estimate uncertainties into the control design. The algorithm has three components: (1) a subspace identification nominal model estimator; (2) a bootstrap resampling method that quantifies non-asymptotic variance of the nominal model estimate; and (3) a non-conventional robust control design method comprising a coupled optimal dynamic output feedback filter and controller with multiplicative noise. A key advantage of the proposed approach is that the system identification and robust control design procedures both use stochastic uncertainty representations, so that the actual inherent statistical estimation uncertainty directly aligns with the uncertainty the robust controller is being designed against. Moreover, the control design method accommodates a highly structured uncertainty representation that can capture uncertainty shape more effectively than existing approaches. We show through numerical experiments that the proposed robust data-driven output feedback controller can significantly outperform a certainty equivalent controller on various measures of sample complexity and stability robustness.
Title: Robust Data-Driven Output Feedback Control via Bootstrapped Multiplicative Noise
Published in: Conference on Learning for Dynamics & Control
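The role of bootstrap resampling in quantifying finite-sample model uncertainty can be illustrated on a scalar system. The sketch below is a simplification under stated assumptions, not the paper's procedure: it replaces subspace identification with a one-parameter least-squares fit, resamples (state, next state) pairs independently, and uses hypothetical names and parameter values throughout. The resulting variance is the kind of quantity that could parameterize a multiplicative-noise model such as x_next = (a_hat + delta) * x, with delta having that variance.

```python
import random


def simulate(a=0.9, steps=200, noise_std=0.1, seed=0):
    """Generate (x_t, x_{t+1}) pairs from x_{t+1} = a * x_t + w_t."""
    rng = random.Random(seed)
    xs = [1.0]
    for _ in range(steps):
        xs.append(a * xs[-1] + rng.gauss(0.0, noise_std))
    return list(zip(xs[:-1], xs[1:]))


def least_squares_a(pairs):
    """Nominal model estimate: least-squares fit of the scalar a."""
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den


def bootstrap_variance(pairs, n_boot=500, seed=1):
    """Sample variance of the estimate over resampled data sets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample the data with replacement and re-estimate the model.
        sample = [rng.choice(pairs) for _ in pairs]
        estimates.append(least_squares_a(sample))
    mean = sum(estimates) / n_boot
    return sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)


pairs = simulate()
a_hat = least_squares_a(pairs)
var_a = bootstrap_variance(pairs)

if __name__ == "__main__":
    print(f"a_hat = {a_hat:.3f}, bootstrap variance = {var_a:.5f}")
```

Resampling pairs ignores the temporal dependence in the data; it is used here only to keep the example short.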
Pub Date: 2022-04-23
DOI: 10.48550/arXiv.2204.11134
Yuchen Cui, S. Niekum, Abhi Gupta, Vikash Kumar, A. Rajeswaran
Task specification is at the core of programming autonomous robots. A low-effort modality for task specification is critical for engagement of non-expert end-users and ultimate adoption of personalized robot agents. A widely studied approach to task specification is through goals, using either compact state vectors or goal images from the same robot scene. The former is hard to interpret for non-experts and necessitates detailed state estimation and scene understanding. The latter requires the generation of a desired goal image, which often requires a human to complete the task, defeating the purpose of having autonomous robots. In this work, we explore alternate and more general forms of goal specification that are expected to be easier for humans to specify and use, such as images obtained from the internet, hand sketches that provide a visual description of the desired task, or simple language descriptions. As a preliminary step towards this, we investigate the capabilities of large-scale pre-trained models (foundation models) for zero-shot goal specification, and find promising results in a collection of simulated robot manipulation tasks and real-world datasets.
Title: Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?
Published in: Conference on Learning for Dynamics & Control
Pub Date: 2022-04-18
DOI: 10.48550/arXiv.2204.08120
I. D. Rodriguez, Noel Csomay-Shanklin, Yisong Yue, A. Ames
This work presents Neural Gaits, a method for learning dynamic walking gaits through the enforcement of set invariance that can be refined episodically using experimental data from the robot. We frame walking as a set invariance problem enforceable via control barrier functions (CBFs) defined on the reduced-order dynamics quantifying the underactuated component of the robot: the zero dynamics. Our approach contains two learning modules: one for learning a policy that satisfies the CBF condition, and another for learning a residual dynamics model to refine imperfections of the nominal model. Importantly, learning only over the zero dynamics significantly reduces the dimensionality of the learning problem, while using CBFs allows us to still make guarantees for the full-order system. The method is demonstrated experimentally on an underactuated bipedal robot, where we are able to show agile and dynamic locomotion, even with partially unknown dynamics.
Title: Neural Gaits: Learning Bipedal Locomotion via Control Barrier Functions and Zero Dynamics Policies
Published in: Conference on Learning for Dynamics & Control
Pub Date: 2022-04-13
DOI: 10.48550/arXiv.2204.06486
R. Drummond, S. Duncan, M. Turner, Patricia Pauli, F. Allgöwer
There is a growing debate on whether the future of feedback control systems will be dominated by data-driven or model-driven approaches. Each of these two approaches has its own complementary set of advantages and disadvantages; however, only limited attempts have so far been made to bridge the gap between them. To address this issue, this paper introduces a method to bound the worst-case error between feedback control policies based upon model predictive control (MPC) and neural networks (NNs). This result is leveraged into an approach to automatically synthesize MPC policies minimising the worst-case error with respect to a NN. Numerical examples highlight the application of the bounds, with the goal of the paper being to encourage a more quantitative understanding of the relationship between data-driven and model-driven control.
Title: Bounding the difference between model predictive control and neural networks
Published in: Conference on Learning for Dynamics & Control
Pub Date: 2022-04-08
DOI: 10.48550/arXiv.2204.03801
Lukas Brunke, Siqi Zhou, Angela P. Schoellig
In this work, we consider the problem of designing a safety filter for a nonlinear uncertain control system. Our goal is to augment an arbitrary controller with a safety filter such that the overall closed-loop system is guaranteed to stay within a given state constraint set, referred to as being safe. For systems with known dynamics, control barrier functions (CBFs) provide a scalar condition for determining if a system is safe. For uncertain systems, robust or adaptive CBF certification approaches have been proposed. However, these approaches can be conservative or require the system to have a particular parametric structure. For more generic uncertain systems, machine learning approaches have been used to approximate the CBF condition. These works typically assume that the learning module is sufficiently trained prior to deployment. Safety during learning is not guaranteed. We propose a barrier Bayesian linear regression (BBLR) approach that guarantees safe online learning of the CBF condition for the true, uncertain system. We assume that the error between the nominal system and the true system is bounded and exploit the structure of the CBF condition. We show that our approach can safely expand the set of certifiable control inputs despite system and learning uncertainties. The effectiveness of our approach is demonstrated in simulation using a two-dimensional pendulum stabilization task.
Title: Barrier Bayesian Linear Regression: Online Learning of Control Barrier Conditions for Safety-Critical Control of Uncertain Systems
Published in: Conference on Learning for Dynamics & Control
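The scalar CBF condition and the safety-filter idea can be made concrete for a system with known dynamics. The sketch below is an illustration under stated assumptions, not the BBLR method: it uses a hypothetical single integrator dx/dt = u, the safe set {x : h(x) = 1 - x^2 >= 0}, and a linear class-K term alpha * h(x); no learning or model uncertainty is involved. The filter returns the input closest to the nominal one that satisfies dh/dt + alpha * h(x) >= 0.

```python
def h(x):
    """Barrier function: the safe set is {x : h(x) >= 0}, i.e. |x| <= 1."""
    return 1.0 - x * x


def safe_input(x, u_nom, alpha=1.0):
    """Closest input to u_nom with -2 * x * u + alpha * h(x) >= 0."""
    if x == 0.0:
        return u_nom  # the condition does not constrain u at x = 0
    bound = alpha * h(x) / (2.0 * x)
    # For x > 0 the condition is an upper bound on u, for x < 0 a lower bound.
    return min(u_nom, bound) if x > 0 else max(u_nom, bound)


def simulate(x0=0.0, u_nom=2.0, dt=0.01, steps=500):
    """Euler simulation of dx/dt = u with the filter wrapped around u_nom."""
    xs = [x0]
    for _ in range(steps):
        u = safe_input(xs[-1], u_nom)
        xs.append(xs[-1] + dt * u)
    return xs


if __name__ == "__main__":
    trajectory = simulate()
    print(f"largest state reached: {max(trajectory):.4f}")
```

The nominal input would drive the state past x = 1; with the filter the trajectory approaches the boundary without crossing it for the step size used here.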
Pub Date: 2022-04-03
DOI: 10.48550/arXiv.2204.01107
Charis J. Stamouli, Anastasios Tsiamis, M. Morari, George J. Pappas
In this paper, we address the stochastic MPC (SMPC) problem for linear systems, subject to chance state constraints and hard input constraints, under unknown noise distribution. First, we reformulate the chance state constraints as deterministic constraints depending only on explicit noise statistics. Based on these reformulated constraints, we design a distributionally robust and robustly stable benchmark SMPC algorithm for the ideal setting of known noise statistics. Then, we employ this benchmark controller to derive a novel robustly stable adaptive SMPC scheme that learns the necessary noise statistics online, while guaranteeing time-uniform satisfaction of the unknown reformulated state constraints with high probability. The latter is achieved through the use of confidence intervals which rely on the empirical noise statistics and are valid uniformly over time. Moreover, control performance is improved over time as more noise samples are gathered and better estimates of the noise statistics are obtained, given the online adaptation of the estimated reformulated constraints. Additionally, in tracking problems with multiple successive targets our approach leads to an online-enlarged domain of attraction compared to robust tube-based MPC. A numerical simulation of a DC-DC converter is used to demonstrate the effectiveness of the developed methodology.
Title: Adaptive Stochastic MPC under Unknown Noise Distribution
Published in: Conference on Learning for Dynamics & Control
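One standard way a chance constraint becomes a deterministic constraint that depends only on noise statistics is through Cantelli's inequality. The derivation below is an illustrative sketch for a scalar constraint; the symbols $z$, $b$, $\varepsilon$, $\mu$, and $\sigma$ are assumptions introduced here, and the paper's own reformulation may differ. The noise $w$ is assumed to have mean $\mu$ and variance $\sigma^2$, with no further knowledge of its distribution.

```latex
\begin{aligned}
&\text{Chance constraint:} && \mathbb{P}\left(z + w \le b\right) \ge 1 - \varepsilon, \\
&\text{Cantelli's inequality:} && \mathbb{P}\left(w - \mu \ge \lambda\right) \le \frac{\sigma^2}{\sigma^2 + \lambda^2}, \quad \lambda > 0, \\
&\text{Choose } \lambda = \sigma\sqrt{\tfrac{1-\varepsilon}{\varepsilon}}: && \frac{\sigma^2}{\sigma^2 + \lambda^2} = \varepsilon, \\
&\text{Sufficient deterministic constraint:} && z + \mu + \sigma\sqrt{\tfrac{1-\varepsilon}{\varepsilon}} \le b.
\end{aligned}
```

Because the bound holds for every distribution with the given mean and variance, the tightened constraint is distributionally robust. When $\mu$ and $\sigma$ are only estimated from data, the tightening would additionally have to account for the estimation error.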
Pub Date: 2022-03-01
DOI: 10.48550/arXiv.2203.00358
Andrea Martin, Luca Furieri, F. Dörfler, J. Lygeros, G. Ferrari-Trecate
As we move towards safety-critical cyber-physical systems that operate in non-stationary and uncertain environments, it becomes crucial to close the gap between classical optimal control algorithms and adaptive learning-based methods. In this paper, we present an efficient optimization-based approach for computing a finite-horizon robustly safe control policy that minimizes dynamic regret, in the sense of the loss relative to the optimal sequence of control actions selected in hindsight by a clairvoyant controller. By leveraging the system level synthesis framework (SLS), our method extends recent results on regret minimization for the linear quadratic regulator to optimal control subject to hard safety constraints, and allows competing against a safety-aware clairvoyant policy with minor modifications. Numerical experiments confirm superior performance with respect to finite-horizon constrained $\mathcal{H}_2$ and $\mathcal{H}_\infty$ control laws when the disturbance realizations poorly fit classical assumptions.
Title: Safe Control with Minimal Regret
Published in: Conference on Learning for Dynamics & Control
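The notion of dynamic regret described in the abstract can be written compactly. The notation below is an assumption made for illustration rather than the paper's own: $J(\pi, \mathbf{w})$ denotes the finite-horizon cost of a causal policy $\pi$ under the disturbance sequence $\mathbf{w}$, $\pi^{c}$ is the clairvoyant policy that knows $\mathbf{w}$ in advance, and $\Pi_{\text{safe}}$ is the set of causal policies that satisfy the safety constraints for all admissible disturbances. Taking the worst case over a unit-norm disturbance set is likewise an assumed, commonly used formulation.

```latex
\begin{aligned}
\operatorname{Regret}(\pi, \mathbf{w}) &= J(\pi, \mathbf{w}) - J(\pi^{c}, \mathbf{w}), \\
\pi^{\star} &\in \arg\min_{\pi \in \Pi_{\text{safe}}} \; \max_{\|\mathbf{w}\|_2 \le 1} \; \operatorname{Regret}(\pi, \mathbf{w}).
\end{aligned}
```

Unlike $\mathcal{H}_2$ or $\mathcal{H}_\infty$ objectives, which penalize the cost itself, this objective penalizes only the gap to what could have been achieved with full knowledge of the disturbance.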