Pub Date: 2022-09-28 DOI: 10.1109/OJCSYS.2022.3210453
Mohammadreza Doostmohammadian;Alireza Aghasi;Apostolos I. Rikos;Andreas Grammenos;Evangelia Kalyvianaki;Christoforos N. Hadjicostis;Karl H. Johansson;Themistoklis Charalambous
This paper considers distributed allocation strategies, formulated as a distributed sum-preserving (fixed-sum) allocation of resources over a multi-agent network in the presence of heterogeneous, arbitrary time-varying delays. We propose a double time-scale scenario for unknown delays and a faster single time-scale scenario for known delays. Further, the links among the nodes are considered subject to certain nonlinearities (e.g., quantization and saturation/clipping). We discuss different models for nonlinearities and how they may affect convergence, the sum-preserving feasibility constraint, and solution optimality over general weight-balanced, uniformly strongly connected networks and, further, time-delayed undirected networks. Our proposed scheme works in a variety of applications with general non-quadratic, strongly convex, smooth objective functions. The non-quadratic part, for example, can be due to additive convex penalty or barrier functions that address the local box constraints. The network can change over time and is not necessarily connected at all times; it is only assumed to be uniformly connected. The novelty of this work is to address all-time feasible Laplacian gradient solutions in the presence of nonlinearities, switching digraph topology (not necessarily all-time connected), and heterogeneous time-varying delays.
Title: Distributed Anytime-Feasible Resource Allocation Subject to Heterogeneous Time-Varying Delays
Journal: IEEE Open Journal of Control Systems, vol. 1, pp. 255-267 (Journal Article, 2022-09-28)
PDF: https://ieeexplore.ieee.org/iel7/9552933/9683993/09904851.pdf
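The sum-preserving idea behind a Laplacian gradient update can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a fixed undirected ring network, quadratic local costs, no delays, and hypothetical names (`laplacian_gradient_step`, `allocate`, `link`). Each node moves its allocation according to the difference between its own gradient and its neighbours' gradients. Because the weight matrix is symmetric and the optional link nonlinearity is odd, every exchange cancels pairwise and the total allocation stays fixed at every iteration.

```python
import numpy as np

def laplacian_gradient_step(x, grad, W, eta, link=lambda z: z):
    """One update x_i <- x_i - eta * sum_j W_ij * link(g_i - g_j).

    W is assumed symmetric and `link` odd, so the pairwise terms cancel
    and sum(x) is unchanged by the step.
    """
    g = grad(x)
    diff = g[:, None] - g[None, :]          # diff[i, j] = g_i - g_j
    return x - eta * np.sum(W * link(diff), axis=1)

def allocate(x0, grad, W, eta=0.02, steps=3000, link=lambda z: z):
    """Iterate the update from a feasible starting allocation x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = laplacian_gradient_step(x, grad, W, eta, link)
    return x

# Illustrative data (assumed): costs f_i(x) = a_i * x^2 / 2, total resource 10.
a = np.array([1.0, 2.0, 4.0, 8.0])
grad = lambda x: a * x

# Symmetric adjacency matrix of a 4-node ring.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

x0 = np.array([10.0, 0.0, 0.0, 0.0])        # feasible: sums to 10
x_star = allocate(x0, grad, W)

# Same run with uniformly quantized links (step 0.5); np.round is odd.
x_quant = allocate(x0, grad, W, link=lambda z: 0.5 * np.round(z / 0.5))
```

In this toy setting the unquantized run equalizes the local gradients `a * x_star`, which is the optimality condition for the fixed-sum problem. The quantized run keeps the sum exactly feasible but is not guaranteed to reach the exact optimum.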
Pub Date: 2022-09-28 DOI: 10.1109/OJCSYS.2022.3209945
Zahra Marvi;Bahare Kiumarsi
Guaranteeing the safety and stability of reinforcement learning (RL) algorithms has been a long-standing challenge. These properties must hold even during learning, when exploration is required to collect rich data. However, ensuring that actions are safe when little is known about the system dynamics is difficult, because predicting the consequences of RL actions requires knowing those dynamics. This paper presents a novel RL scheme that ensures the safety and stability of linear systems during both the exploration and exploitation phases. To do so, a fast, data-efficient model-learning scheme with a convergence guarantee is employed simultaneously with an off-policy RL scheme to find the optimal controller. An accurate bound on the model-learning error is derived and used to construct a novel adaptive robustified control barrier function (ARCBF), which guarantees that the states of the system remain in the safe set even when learning is incomplete. Once a mild rank condition is satisfied, the noisy input in the exploratory data-collection phase and the optimal controller in the exploitation phase are minimally altered so that the ARCBF criterion holds, and safety is therefore guaranteed in both phases. It is shown that, under the proposed RL framework, the model-learning error is a vanishing perturbation to the original system. Therefore, a stability guarantee is also provided even during exploration, when noisy random inputs are applied to the system.
Title: Reinforcement Learning With Safety and Stability Guarantees During Exploration For Linear Systems
Journal: IEEE Open Journal of Control Systems, vol. 1, pp. 322-334 (Journal Article, 2022-09-28)
PDF: https://ieeexplore.ieee.org/iel7/9552933/9683993/09904857.pdf
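The idea of minimally altering an exploratory input so that a robustified barrier condition holds can be sketched on a scalar system. This is a generic control-barrier-function filter with a fixed model-error bound, not the paper's ARCBF. The system `x' = a*x + b*u`, the barrier `h(x) = x_max - x`, the assumption that `b` is known and positive, and all names and numeric values below are assumptions chosen for illustration.

```python
import numpy as np

def safety_filter(u_nom, x, a_hat, b, delta, x_max, alpha):
    """Return the input closest to u_nom that satisfies a worst-case CBF condition.

    Barrier: h(x) = x_max - x. Requirement: a*x + b*u <= alpha * h(x)
    for every a with |a - a_hat| <= delta (b > 0 assumed known).
    For a scalar input, the minimal alteration is a simple clip from above.
    """
    h = x_max - x
    u_bound = (alpha * h - a_hat * x - delta * abs(x)) / b
    return min(u_nom, u_bound)

def simulate(a_true=1.0, a_hat=0.8, delta=0.3, b=1.0, x_max=1.0,
             alpha=2.0, dt=0.01, steps=2000, seed=0):
    """Forward-Euler rollout with noisy exploratory inputs passed through the filter."""
    rng = np.random.default_rng(seed)
    x = 0.0
    trajectory = []
    for _ in range(steps):
        u_nom = 3.0 * rng.standard_normal()   # exploratory noise (assumed scale)
        u = safety_filter(u_nom, x, a_hat, b, delta, x_max, alpha)
        x = x + dt * (a_true * x + b * u)     # true dynamics, unknown to the filter
        trajectory.append(x)
    return np.array(trajectory)

xs = simulate()
```

Here the true parameter (`a_true = 1.0`) lies within `delta` of the estimate (`a_hat = 0.8`), so the filtered input keeps the state at or below `x_max` for the whole rollout, provided `alpha * dt <= 1`. In an adaptive scheme the bound `delta` would shrink as the model estimate improves, making the filter less conservative.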
Pub Date: 2022-09-15 DOI: 10.1109/OJCSYS.2022.3206710
Shayok Mukhopadhyay;Hafiz M. Usman;Habibur Rehman
This paper proposes an accurate and efficient Universal Adaptive Stabilizer (UAS) based online parameter estimation technique for a 400 V Li-ion battery bank. The battery open-circuit voltage, the parameters modeling the transient response, and the series resistance are all estimated in a single real-time test. In contrast to earlier UAS-based work on individual battery packs, this work does not require prior offline experimentation or any post-processing. Fast real-time convergence of the parameter estimates with minimal experimental effort enables the battery parameters to be updated during run-time. The proposed strategy is mathematically validated, and its performance is demonstrated on a 400 V, 6.6 Ah Li-ion battery bank powering an induction-motor-driven prototype electric vehicle (EV) traction system.
Title: Real Time Li-Ion Battery Bank Parameters Estimation via Universal Adaptive Stabilization
Journal: IEEE Open Journal of Control Systems, vol. 1, pp. 268-293 (Journal Article, 2022-09-15)
PDF: https://ieeexplore.ieee.org/iel7/9552933/9683993/09893763.pdf
Pub Date: 2022-09-15 DOI: 10.1109/OJCSYS.2022.3207108
Yu Wang;Hussein Sibai;Mark Yen;Sayan Mitra;Geir E. Dullerud
Statistical model checking is a class of sequential algorithms that can verify specifications of interest on an ensemble of cyber-physical systems (e.g., whether 99% of cars from a batch meet a requirement on their functionality). These algorithms infer the probability that given specifications are satisfied by the systems with provable statistical guarantees by drawing sufficient numbers of independent and identically distributed samples. During the process of statistical model checking, the values of the samples (e.g., a user's car trajectory) may be inferred by intruders, causing privacy concerns in consumer-level applications (e.g., automobiles and medical devices). This paper addresses the privacy of statistical model checking algorithms from the point of view of differential privacy. These algorithms are sequential, drawing samples until a condition on their values is met. We show that revealing the number of samples drawn can violate privacy. We also show that the standard exponential mechanism that randomizes the output of an algorithm to achieve differential privacy fails to do so in the context of sequential algorithms. Instead, we relax the conservative requirement in differential privacy that the sensitivity of the output of the algorithm should be bounded to any perturbation for any data set. We propose a new notion of differential privacy which we call expected differential privacy
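The point that the number of samples drawn can itself leak information can be illustrated with Wald's sequential probability ratio test (SPRT), a standard sequential test used here as a stand-in for the algorithms in the paper. The function name `sprt`, the thresholds `p0` and `p1`, and the error rates are assumptions chosen for illustration. The test keeps drawing Boolean samples (does a system satisfy the specification?) until a log-likelihood ratio crosses a bound, so the stopping time depends on the sample values.

```python
import math

def sprt(samples, p0=0.9, p1=0.8, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p >= p0 against H1: p <= p1 on Boolean samples.

    Returns (decision, n), where n is the number of samples consumed.
    """
    upper = math.log((1 - beta) / alpha)    # crossing it favours H1
    lower = math.log(beta / (1 - alpha))    # crossing it favours H0
    llr = 0.0                               # log P(data | p1) - log P(data | p0)
    for n, satisfied in enumerate(samples, start=1):
        if satisfied:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject", n
        if llr <= lower:
            return "accept", n
    return "undecided", len(samples)

# Two neighbouring data sets that differ in a single sample.
all_pass = [True] * 100
one_fail = [False] + [True] * 99

result_a = sprt(all_pass)
result_b = sprt(one_fail)
```

With these settings both runs accept the specification, but the first stops after 25 samples and the second after 32. An observer who sees only the sample count can therefore distinguish the two neighbouring data sets, which is the kind of leakage that motivates treating the stopping time as part of the algorithm's output.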