In this paper, we consider optimal control problems (OCPs) for large-scale linear dynamical systems with many states and inputs. We seek to reduce such problems to a set of independent OCPs of lower dimension. Our decomposition is ‘exact’ in the sense that it preserves all the information about the original system and the objective function. Previous work in this area has focused on strategies that exploit symmetries of the underlying system and of the objective function. Here, instead, we apply the algebraic method of simultaneous block diagonalization of matrices (SBD), which we show provides advantages both in the dimension of the resulting subproblems and in computation time. We provide practical examples with networked systems that demonstrate the benefits of the SBD decomposition over the decomposition method based on group symmetries.
{"title":"Exact Decomposition of Optimal Control Problems via Simultaneous Block Diagonalization of Matrices","authors":"Amirhossein Nazerian;Kshitij Bhatta;Francesco Sorrentino","doi":"10.1109/OJCSYS.2022.3231553","DOIUrl":"10.1109/OJCSYS.2022.3231553","url":null,"abstract":"In this paper, we consider optimal control problems (OCPs) applied to large-scale linear dynamical systems with a large number of states and inputs. We attempt to reduce such problems into a set of independent OCPs of lower dimensions. Our decomposition is ‘exact’ in the sense that it preserves all the information about the original system and the objective function. Previous work in this area has focused on strategies that exploit symmetries of the underlying system and of the objective function. Here, instead, we implement the algebraic method of simultaneous block diagonalization of matrices (SBD), which we show provides advantages both in terms of the dimension of the subproblems that are obtained and of the computation time. We provide practical examples with networked systems that demonstrate the benefits of applying the SBD decomposition over the decomposition method based on group symmetries.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"2 ","pages":"24-35"},"PeriodicalIF":0.0,"publicationDate":"2022-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9996568","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9111923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-12-22 | DOI: 10.1109/OJCSYS.2022.3231523
Sean Wilson; Magnus Egerstedt
In robotics research and education, the cost in terms of money, expertise, and time required to instantiate and maintain robotic testbeds can prevent researchers and educators from including hardware-based experimentation in their laboratories and classrooms. As a result, robotic algorithms are often validated only in low-fidelity simulation, because of the complexity and computational demand of high-fidelity simulators. Unfortunately, these simulation environments often neglect real-world complexities, such as wheel slip, actuator dynamics, computation time, communication delays, and sensor noise. The Robotarium addresses these problems by providing a state-of-the-art, multi-robot research facility to everyone around the world, free of charge for academic and educational purposes. This paper discusses the remote usage of the testbed since its opening in 2017, details the testbed's design, and provides a brief tutorial on how to use it.
{"title":"The Robotarium: A Remotely-Accessible, Multi-Robot Testbed for Control Research and Education","authors":"Sean Wilson;Magnus Egerstedt","doi":"10.1109/OJCSYS.2022.3231523","DOIUrl":"https://doi.org/10.1109/OJCSYS.2022.3231523","url":null,"abstract":"In robotic research and education, the cost in terms of money, expertise, and time required to instantiate and maintain robotic testbeds can prevent researchers and educators from including hardware based experimentation in their laboratories and classrooms. This results in robotic algorithms often being validated by low-fidelity simulation due to the complexity and computational demand required by high-fidelity simulators. Unfortunately, these simulation environments often neglect real world complexities, such as wheel slip, actuator dynamics, computation time, communication delays, and sensor noise. The Robotarium provides a solution to these problems by providing a state-of-the-art, multi-robot research facility to everyone around the world free of charge for academic and educational purposes. This paper discusses the remote usage of the testbed since its opening in 2017, details the testbeds design, and provides a brief tutorial on how to use it.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"2 ","pages":"12-23"},"PeriodicalIF":0.0,"publicationDate":"2022-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9973428/09996578.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50226356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-12-02 | DOI: 10.1109/OJCSYS.2022.3219740
Presents a listing of the editorial board, board of governors, current staff, committee members, and/or society editors for this issue of the publication.
{"title":"IEEE Open Journal of Control Systems Publication Information","authors":"","doi":"10.1109/OJCSYS.2022.3219740","DOIUrl":"https://doi.org/10.1109/OJCSYS.2022.3219740","url":null,"abstract":"Presents a listing of the editorial board, board of governors, current staff, committee members, and/or society editors for this issue of the publication.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"1 ","pages":"C2-C2"},"PeriodicalIF":0.0,"publicationDate":"2022-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9683993/09969409.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50237546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-12-02 | DOI: 10.1109/OJCSYS.2022.3219735
Presents a listing of the editorial board, board of governors, current staff, committee members, and/or society editors for this issue of the publication.
{"title":"IEEE Control Systems Society Information","authors":"","doi":"10.1109/OJCSYS.2022.3219735","DOIUrl":"https://doi.org/10.1109/OJCSYS.2022.3219735","url":null,"abstract":"Presents a listing of the editorial board, board of governors, current staff, committee members, and/or society editors for this issue of the publication.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"1 ","pages":"C3-C3"},"PeriodicalIF":0.0,"publicationDate":"2022-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9683993/09969411.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50237544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-16 | DOI: 10.1109/OJCSYS.2022.3222753
Stefan B. Liu; Andrea Giusti; Matthias Althoff
Accurate velocity information is often essential for the control of robot manipulators, especially for precise tracking of fast trajectories. However, joint velocities are rarely measured directly; instead, they are estimated to save costs. While many approaches have been proposed for the velocity estimation of robot joints, no comprehensive experimental evaluation exists, making it difficult to choose the appropriate method. This paper compares multiple estimation methods running on a six-degree-of-freedom manipulator. We evaluate: 1) the estimation error with respect to a ground-truth signal, 2) the closed-loop tracking error, 3) convergence behavior, 4) sensor fault tolerance, and 5) implementation and tuning effort. To ensure a fair comparison, we optimally tune the estimators using a genetic algorithm. All estimation methods show similar estimation error and similar closed-loop tracking performance, except for the nonlinear high-gain observer, which is not accurate enough. Sliding-mode observers can provide precise velocity estimates even in the presence of sensor faults.
{"title":"Velocity Estimation of Robot Manipulators: An Experimental Comparison","authors":"Stefan B. Liu;Andrea Giusti;Matthias Althoff","doi":"10.1109/OJCSYS.2022.3222753","DOIUrl":"https://doi.org/10.1109/OJCSYS.2022.3222753","url":null,"abstract":"Accurate velocity information is often essential to the control of robot manipulators, especially for precise tracking of fast trajectories. However, joint velocities are rarely directly measured and instead estimated to save costs. While many approaches have been proposed for the velocity estimation of robot joints, no comprehensive experimental evaluation exists, making it difficult to choose the appropriate method. This paper compares multiple estimation methods running on a six degrees-of-freedom manipulator. We evaluate: 1) the estimation error using a ground-truth signal, 2) the closed-loop tracking error, 3) convergence behavior, 4) sensor fault tolerance, 5) implementation and tuning effort. To ensure a fair comparison, we optimally tune the estimators using a genetic algorithm. All estimation methods have a similar estimation error and similar closed-loop tracking performance, except for the nonlinear high-gain observer, which is not accurate enough. Sliding-mode observers can provide a precise velocity estimation despite sensor faults.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"2 ","pages":"1-11"},"PeriodicalIF":0.0,"publicationDate":"2022-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9973428/09953534.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50376166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-10 | DOI: 10.1109/OJCSYS.2022.3221063
Katrine Seel; Arash Bahari Kordabad; Sébastien Gros; Jan Tommy Gravdahl
Developing model predictive control (MPC) schemes can be challenging for systems for which an accurate model is not available or is too costly to develop. With the increasing availability of data and of tools to exploit them, learning-based MPC has recently attracted wide attention. It has been shown that adapting not only the MPC model but also its cost function is conducive to achieving optimal closed-loop performance when an accurate model cannot be provided. In the learning context, this modification can be performed by parametrizing the MPC cost and adjusting the parameters via, e.g., reinforcement learning (RL). In this framework, simple cost parametrizations can be effective, but the underlying theory suggests that rich parametrizations can in principle be useful. In this paper, we propose such a cost parametrization using a class of neural networks (NNs) that preserves convexity. This choice avoids creating difficulties when solving the MPC problem via sensitivity-based solvers, and it ensures nominal stability of the resulting MPC scheme. Moreover, we detail how this choice can be applied to economic MPC problems, where the cost function is generic and therefore does not necessarily fulfill any specific property.
{"title":"Convex Neural Network-Based Cost Modifications for Learning Model Predictive Control","authors":"Katrine Seel;Arash Bahari Kordabad;Sébastien Gros;Jan Tommy Gravdahl","doi":"10.1109/OJCSYS.2022.3221063","DOIUrl":"https://doi.org/10.1109/OJCSYS.2022.3221063","url":null,"abstract":"Developing model predictive control (MPC) schemes can be challenging for systems where an accurate model is not available, or too costly to develop. With the increasing availability of data and tools to treat them, learning-based MPC has of late attracted wide attention. It has recently been shown that adapting not only the MPC model, but also its cost function is conducive to achieving optimal closed-loop performance when an accurate model cannot be provided. In the learning context, this modification can be performed via parametrizing the MPC cost and adjusting the parameters via, e.g., reinforcement learning (RL). In this framework, simple cost parametrizations can be effective, but the underlying theory suggests that rich parametrizations in principle can be useful. In this paper, we propose such a cost parametrization using a class of neural networks (NNs) that preserves convexity. This choice avoids creating difficulties when solving the MPC problem via sensitivity-based solvers. In addition, this choice of cost parametrization ensures nominal stability of the resulting MPC scheme. Moreover, we detail how this choice can be applied to economic MPC problems where the cost function is generic and therefore does not necessarily fulfill any specific property.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"1 ","pages":"366-379"},"PeriodicalIF":0.0,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9683993/09944720.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50237542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-10-21 | DOI: 10.1109/OJCSYS.2022.3216545
Iman Salehi; Tyler Taplin; Ashwin P. Dani
This paper presents a method for learning a discrete-time dynamical system model from demonstrations while providing probabilistic guarantees on the safety and stability of the learned model. The controlled dynamics of a discrete-time system with zero-mean Gaussian process noise are approximated using an Extreme Learning Machine (ELM) whose parameters are learned subject to chance constraints derived from a discrete-time control barrier function and a discrete-time control Lyapunov function in the presence of the ELM reconstruction error. To estimate the ELM parameters, a quadratically constrained quadratic program (QCQP) is formulated whose constraints need only be evaluated at sampled points. Simulations validate that the system model learned using the proposed method can reproduce the demonstrations inside a prescribed safe set while converging to the desired goal location from various initial conditions inside the safe set. Furthermore, it is shown that the learned model can adapt to changes in the goal location during reproduction without violating the stability and safety constraints.
{"title":"Learning Discrete-Time Uncertain Nonlinear Systems With Probabilistic Safety and Stability Constraints","authors":"Iman Salehi;Tyler Taplin;Ashwin P. Dani","doi":"10.1109/OJCSYS.2022.3216545","DOIUrl":"https://doi.org/10.1109/OJCSYS.2022.3216545","url":null,"abstract":"This paper presents a discrete-time dynamical system model learning method from demonstration while providing probabilistic guarantees on the safety and stability of the learned model. The controlled dynamic model of a discrete-time system with a zero-mean Gaussian process noise is approximated using an Extreme Learning Machine (ELM) whose parameters are learned subject to chance constraints derived using a discrete-time control barrier function and discrete-time control Lyapunov function in the presence of the ELM reconstruction error. To estimate the ELM parameters a quadratically constrained quadratic program (QCQP) is developed subject to the constraints that are only required to be evaluated at sampled points. Simulations validate that the system model learned using the proposed method can reproduce the demonstrations inside a prescribed safe set while converging to the desired goal location starting from various different initial conditions inside the safe set. Furthermore, it is shown that the learned model can adapt to changes in goal location during reproductions without violating the stability and safety constraints.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"1 ","pages":"354-365"},"PeriodicalIF":0.0,"publicationDate":"2022-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9683993/09926168.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50237541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-10-10 | DOI: 10.1109/OJCSYS.2022.3212613
Zhe Du; Laura Balzano; Necmiye Ozay
Switched systems can model processes whose underlying dynamics may change abruptly over time. To achieve accurate modeling in practice, one may need a large number of modes, but this may in turn increase the model complexity drastically. Existing work on reducing system complexity mainly considers state-space reduction, whereas reducing the number of modes is less studied. In this work, we consider Markov jump linear systems (MJSs), a special class of switched systems in which the active mode switches according to a Markov chain, and several issues associated with their mode complexity. Specifically, inspired by clustering techniques from unsupervised learning, we construct a reduced MJS with fewer modes that approximates the original MJS well under various metrics. Furthermore, both theoretically and empirically, we show how the reduced MJS can be used to analyze stability and design controllers with a significant reduction in computational cost while maintaining guaranteed accuracy.
{"title":"Mode Reduction for Markov Jump Systems","authors":"Zhe Du;Laura Balzano;Necmiye Ozay","doi":"10.1109/OJCSYS.2022.3212613","DOIUrl":"https://doi.org/10.1109/OJCSYS.2022.3212613","url":null,"abstract":"Switched systems are capable of modeling processes with underlying dynamics that may change abruptly over time. To achieve accurate modeling in practice, one may need a large number of modes, but this may in turn increase the model complexity drastically. Existing work on reducing system complexity mainly considers state space reduction, whereas reducing the number of modes is less studied. In this work, we consider Markov jump linear systems (MJSs), a special class of switched systems where the active mode switches according to a Markov chain, and several issues associated with its mode complexity. Specifically, inspired by clustering techniques from unsupervised learning, we are able to construct a reduced MJS with fewer modes that approximates the original MJS well under various metrics. Furthermore, both theoretically and empirically, we show how one can use the reduced MJS to analyze stability and design controllers with significant reduction in computational cost while achieving guaranteed accuracy.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"1 ","pages":"335-353"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9683993/09913637.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50237540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-09-28 | DOI: 10.1109/OJCSYS.2022.3210453
Mohammadreza Doostmohammadian; Alireza Aghasi; Apostolos I. Rikos; Andreas Grammenos; Evangelia Kalyvianaki; Christoforos N. Hadjicostis; Karl H. Johansson; Themistoklis Charalambous
This paper considers distributed allocation strategies, formulated as a distributed sum-preserving (fixed-sum) allocation of resources over a multi-agent network in the presence of heterogeneous, arbitrary time-varying delays. We propose a double time-scale scenario for unknown delays and a faster single time-scale scenario for known delays. Further, the links among the nodes are subject to certain nonlinearities (e.g., quantization and saturation/clipping). We discuss different models for these nonlinearities and how they may affect convergence, the sum-preserving feasibility constraint, and solution optimality over general weight-balanced uniformly strongly connected networks and, further, over time-delayed undirected networks. Our proposed scheme works in a variety of applications with general non-quadratic, strongly convex, smooth objective functions. The non-quadratic part can be due, for example, to additive convex penalty or barrier functions that address local box constraints. The network can change over time and is not necessarily connected at all times; it is only assumed to be uniformly connected. The novelty of this work is to address all-time-feasible Laplacian-gradient solutions in the presence of nonlinearities, switching digraph topology (not necessarily connected at all times), and heterogeneous time-varying delays.
{"title":"Distributed Anytime-Feasible Resource Allocation Subject to Heterogeneous Time-Varying Delays","authors":"Mohammadreza Doostmohammadian;Alireza Aghasi;Apostolos I. Rikos;Andreas Grammenos;Evangelia Kalyvianaki;Christoforos N. Hadjicostis;Karl H. Johansson;Themistoklis Charalambous","doi":"10.1109/OJCSYS.2022.3210453","DOIUrl":"https://doi.org/10.1109/OJCSYS.2022.3210453","url":null,"abstract":"This paper considers distributed allocation strategies, formulated as a distributed sum-preserving (fixed-sum) allocation of resources over a multi-agent network in the presence of heterogeneous arbitrary time-varying delays. We propose a double time-scale scenario for unknown delays and a faster single time-scale scenario for known delays. Further, the links among the nodes are considered subject to certain nonlinearities (e.g, quantization and saturation/clipping). We discuss different models for nonlinearities and how they may affect the convergence, sum-preserving feasibility constraint, and solution optimality over general weight-balanced uniformly strongly connected networks and, further, time-delayed undirected networks. Our proposed scheme works in a variety of applications with general non-quadratic strongly-convex smooth objective functions. The non-quadratic part, for example, can be due to additive convex penalty or barrier functions to address the local box constraints. The network can change over time, is not necessarily connected at all times, but is only assumed to be uniformly-connected. The novelty of this work is to address all-time feasible Laplacian gradient solutions in presence of nonlinearities, switching digraph topology (not necessarily all-time connected), and heterogeneous time-varying delays.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"1 ","pages":"255-267"},"PeriodicalIF":0.0,"publicationDate":"2022-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9683993/09904851.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50348955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-09-28 | DOI: 10.1109/OJCSYS.2022.3209945
Zahra Marvi; Bahare Kiumarsi
Satisfying the safety and stability properties of reinforcement learning (RL) algorithms has been a long-standing challenge. These properties must be satisfied even during learning, which requires exploration to collect rich data. However, ensuring the safety of actions when little is known about the system dynamics is daunting: predicting the consequence of an RL action requires knowing those dynamics. This paper presents a novel RL scheme that ensures the safety and stability of linear systems during both the exploration and exploitation phases. To do so, a fast and data-efficient model-learning scheme with a convergence guarantee is employed simultaneously with an off-policy RL scheme to find the optimal controller. An accurate bound on the model-learning error is derived, and its characteristics are used to form a novel adaptive robustified control barrier function (ARCBF), which guarantees that the states of the system remain in the safe set even when learning is incomplete. Therefore, once a mild rank condition is satisfied, the noisy input in the exploratory data-collection phase and the optimal controller in the exploitation phase are minimally altered so that the ARCBF criterion is satisfied and, therefore, safety is guaranteed in both phases. It is shown that under the proposed RL framework, the model-learning error is a vanishing perturbation to the original system, so a stability guarantee is also provided during exploration, when noisy random inputs are applied to the system.
{"title":"Reinforcement Learning With Safety and Stability Guarantees During Exploration For Linear Systems","authors":"Zahra Marvi;Bahare Kiumarsi","doi":"10.1109/OJCSYS.2022.3209945","DOIUrl":"https://doi.org/10.1109/OJCSYS.2022.3209945","url":null,"abstract":"The satisfaction of the safety and stability properties of reinforcement learning (RL) algorithms has been a long-standing challenge. These properties must be satisfied even during learning, for which exploration is required to collect rich data. However, satisfying the safety of actions when little is known about the system dynamics is a daunting challenge. After all, predicting the consequence of RL actions requires knowing the system dynamics. This paper presents a novel RL scheme that ensures the safety and stability of the linear systems during the exploration and exploitation phases. To do so, a fast and data-efficient model-learning with the convergence guarantee is employed along and simultaneously with an off-policy RL scheme to find the optimal controller. The accurate bound of the model-learning error is derived and its characteristic is employed in the formation of a novel adaptive robustified control barrier function (ARCBF) which guarantees that states of the system remain in the safe set even when the learning is incomplete. Therefore, after satisfaction of a mild rank condition, the noisy input in the exploratory data collection phase and the optimal controller in the exploitation phase are minimally altered such that the ARCBF criterion is satisfied and, therefore, safety is guaranteed in both phases. It is shown that under the proposed RL framework, the model learning error is a vanishing perturbation to the original system. Therefore, a stability guarantee is also provided even in the exploration when noisy random inputs are applied to the system.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"1 ","pages":"322-334"},"PeriodicalIF":0.0,"publicationDate":"2022-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9683993/09904857.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50237539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}