Lan Mei, Thorir Mar Ingolfsson, Cristian Cioflan, Victor Kartsch, Andrea Cossettini, Xiaying Wang, Luca Benini
Driven by the progress in efficient embedded processing, there is an accelerating trend toward running machine learning models directly on wearable Brain-Machine Interfaces (BMIs) to improve portability and privacy and to maximize battery life. However, achieving low latency and high classification performance remains challenging due to the inherent variability of electroencephalographic (EEG) signals across sessions and the limited onboard resources. This work proposes a comprehensive BMI workflow based on a CNN-based Continual Learning (CL) framework, allowing the system to adapt to inter-session changes. The workflow is deployed on a wearable, parallel, ultra-low-power BMI platform (BioGAP). Our results on two in-house datasets, Dataset A and Dataset B, show that the CL workflow improves average accuracy by up to 30.36% and 10.17%, respectively. Furthermore, when continual learning is implemented on a Parallel Ultra-Low Power (PULP) microcontroller (GAP9), it achieves an energy consumption as low as 0.45 mJ per inference and an adaptation time of only 21.5 ms, yielding around 25 h of battery life with a small 100 mAh, 3.7 V battery on BioGAP. Our setup, coupled with the compact CNN model and on-device CL capabilities, meets users' needs for improved privacy, reduced latency, and enhanced inter-session performance, showing strong promise for smart embedded real-world BMIs.
{"title":"An Ultra-Low Power Wearable BMI System with Continual Learning Capabilities","authors":"Lan Mei, Thorir Mar Ingolfsson, Cristian Cioflan, Victor Kartsch, Andrea Cossettini, Xiaying Wang, Luca Benini","doi":"arxiv-2409.10654","DOIUrl":"https://doi.org/arxiv-2409.10654","url":null,"abstract":"Driven by the progress in efficient embedded processing, there is an\u0000accelerating trend toward running machine learning models directly on wearable\u0000Brain-Machine Interfaces (BMIs) to improve portability and privacy and maximize\u0000battery life. However, achieving low latency and high classification\u0000performance remains challenging due to the inherent variability of\u0000electroencephalographic (EEG) signals across sessions and the limited onboard\u0000resources. This work proposes a comprehensive BMI workflow based on a CNN-based\u0000Continual Learning (CL) framework, allowing the system to adapt to\u0000inter-session changes. The workflow is deployed on a wearable, parallel\u0000ultra-low power BMI platform (BioGAP). Our results based on two in-house\u0000datasets, Dataset A and Dataset B, show that the CL workflow improves average\u0000accuracy by up to 30.36% and 10.17%, respectively. Furthermore, when\u0000implementing the continual learning on a Parallel Ultra-Low Power (PULP)\u0000microcontroller (GAP9), it achieves an energy consumption as low as 0.45mJ per\u0000inference and an adaptation time of only 21.5ms, yielding around 25h of battery\u0000life with a small 100mAh, 3.7V battery on BioGAP. Our setup, coupled with the\u0000compact CNN model and on-device CL capabilities, meets users' needs for\u0000improved privacy, reduced latency, and enhanced inter-session performance,\u0000offering good promise for smart embedded real-world BMIs.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142264040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analyzing and certifying stability and attractivity of nonlinear systems is a research topic that has been extensively investigated by control theorists and engineers for many years. Despite that, accurately estimating domains of attraction for nonlinear systems remains a challenging task, as available estimation approaches are either conservative or limited to low-dimensional systems. In this work, we propose an iterative approach to accurately underapproximate safe (i.e., state-constrained) domains of attraction for general discrete-time autonomous nonlinear systems. Our approach relies on implicit representations of safe backward reachable sets of safe regions of attraction, where such regions can be easily constructed using, e.g., quadratic Lyapunov functions. The iterations of our approach are monotonic (in the sense of set inclusion), and each iteration results in a safe region of attraction, given as a sublevel set, that underapproximates the safe domain of attraction. The sublevel set representations of the resulting regions of attraction can be efficiently utilized in verifying whether given points of interest lie in the safe domain of attraction. We illustrate our approach through two numerical examples involving two- and four-dimensional nonlinear systems.
{"title":"Underapproximating Safe Domains of Attraction for Discrete-Time Systems Using Implicit Representations of Backward Reachable Sets","authors":"Mohamed Serry, Jun Liu","doi":"arxiv-2409.10657","DOIUrl":"https://doi.org/arxiv-2409.10657","url":null,"abstract":"Analyzing and certifying stability and attractivity of nonlinear systems is a\u0000topic of research interest that has been extensively investigated by control\u0000theorists and engineers for many years. Despite that, accurately estimating\u0000domains of attraction for nonlinear systems remains a challenging task, where\u0000available estimation approaches are either conservative or limited to\u0000low-dimensional systems. In this work, we propose an iterative approach to\u0000accurately underapproximate safe (i.e., state-constrained) domains of\u0000attraction for general discrete-time autonomous nonlinear systems. Our approach\u0000relies on implicit representations of safe backward reachable sets of safe\u0000regions of attraction, where such regions can be be easily constructed using,\u0000e.g., quadratic Lyapunov functions. The iterations of our approach are\u0000monotonic (in the sense of set inclusion), where each iteration results in a\u0000safe region of attraction, given as a sublevel set, that underapproximates the\u0000safe domain of attraction. The sublevel set representations of the resulting\u0000regions of attraction can be efficiently utilized in verifying the inclusion of\u0000given points of interest in the safe domain of attraction. We illustrate our\u0000approach through two numerical examples, involving two- and four-dimensional\u0000nonlinear systems.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142264041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In stochastic systems, risk-sensitive control balances performance with resilience to less likely events. While existing methods rely on finite-horizon risk criteria, this paper introduces limiting-risk criteria that capture long-term cumulative risks through probabilistic limiting theorems. Extending the Linear Quadratic Regulation (LQR) framework, we incorporate constraints on these limiting-risk criteria derived from the asymptotic behavior of cumulative costs, accounting for extreme deviations. Using a tailored Functional Central Limit Theorem (FCLT), we demonstrate that the time-correlated terms in the limiting-risk criteria converge under strong ergodicity, and we establish conditions for convergence in non-stationary settings while characterizing the distribution and providing explicit formulations for the limiting variance of the risk functional. The FCLT is developed by applying ergodic theory for Markov chains and obtaining uniform ergodicity of the controlled process. For quadratic risk functionals on linear dynamics, in addition to internal stability, uniform ergodicity requires the (possibly heavy-tailed) dynamic noise to have a finite fourth moment. This offers a clear path to quantifying long-term uncertainty. We also propose a primal-dual constrained policy optimization method that optimizes the average performance while ensuring the limiting-risk constraints are satisfied. Our framework offers a practical, theoretically guaranteed approach to long-term risk-sensitive control, backed by convergence guarantees and validated through simulations.
{"title":"Uniform Ergodicity and Ergodic-Risk Constrained Policy Optimization","authors":"Shahriar Talebi, Na Li","doi":"arxiv-2409.10767","DOIUrl":"https://doi.org/arxiv-2409.10767","url":null,"abstract":"In stochastic systems, risk-sensitive control balances performance with\u0000resilience to less likely events. Although existing methods rely on\u0000finite-horizon risk criteria, this paper introduces textit{limiting-risk\u0000criteria} that capture long-term cumulative risks through probabilistic\u0000limiting theorems. Extending the Linear Quadratic Regulation (LQR) framework,\u0000we incorporate constraints on these limiting-risk criteria derived from the\u0000asymptotic behavior of cumulative costs, accounting for extreme deviations.\u0000Using tailored Functional Central Limit Theorems (FCLT), we demonstrate that\u0000the time-correlated terms in the limiting-risk criteria converge under strong\u0000ergodicity, and establish conditions for convergence in non-stationary settings\u0000while characterizing the distribution and providing explicit formulations for\u0000the limiting variance of the risk functional. The FCLT is developed by applying\u0000ergodic theory for Markov chains and obtaining textit{uniform ergodicity} of\u0000the controlled process. For quadratic risk functionals on linear dynamics, in\u0000addition to internal stability, the uniform ergodicity requires the (possibly\u0000heavy-tailed) dynamic noise to have a finite fourth moment. This offers a clear\u0000path to quantifying long-term uncertainty. We also propose a primal-dual\u0000constrained policy optimization method that optimizes the average performance\u0000while ensuring limiting-risk constraints are satisfied. Our framework offers a\u0000practical, theoretically guaranteed approach for long-term risk-sensitive\u0000control, backed by convergence guarantees and validations through simulations.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"118 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142264035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this letter, we study the proximal gradient dynamics. This recently proposed continuous-time dynamics solves optimization problems whose cost functions are separable into a nonsmooth convex and a smooth component. First, we show that the cost function decreases monotonically along the trajectories of the proximal gradient dynamics. We then introduce a new condition that guarantees exponential convergence of the cost function to its optimal value, and show that this condition implies the proximal Polyak-Łojasiewicz condition. We also show that the proximal Polyak-Łojasiewicz condition guarantees exponential convergence of the cost function. Moreover, we extend these results to time-varying optimization problems, providing bounds for equilibrium tracking. Finally, we discuss applications of these findings, including the LASSO problem, quadratic optimization with polytopic constraints, and certain matrix-based problems.
{"title":"Proximal Gradient Dynamics: Monotonicity, Exponential Convergence, and Applications","authors":"Anand Gokhale, Alexander Davydov, Francesco Bullo","doi":"arxiv-2409.10664","DOIUrl":"https://doi.org/arxiv-2409.10664","url":null,"abstract":"In this letter, we study the proximal gradient dynamics. This\u0000recently-proposed continuous-time dynamics solves optimization problems whose\u0000cost functions are separable into a nonsmooth convex and a smooth component.\u0000First, we show that the cost function decreases monotonically along the\u0000trajectories of the proximal gradient dynamics. We then introduce a new\u0000condition that guarantees exponential convergence of the cost function to its\u0000optimal value, and show that this condition implies the proximal\u0000Polyak-{L}ojasiewicz condition. We also show that the proximal\u0000Polyak-{L}ojasiewicz condition guarantees exponential convergence of the cost\u0000function. Moreover, we extend these results to time-varying optimization\u0000problems, providing bounds for equilibrium tracking. Finally, we discuss\u0000applications of these findings, including the LASSO problem, quadratic\u0000optimization with polytopic constraints, and certain matrix based problems.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142264037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sonda Fourati, Wael Jaafar, Noura Baccar, Safwan Alfattani
Large Language Models (LLMs) have showcased remarkable proficiency in various information-processing tasks, ranging from data extraction and literature summarization to content generation, predictive modeling, decision-making, and system control. Moreover, Vision Large Models (VLMs) and Multimodal LLMs (MLLMs), which represent the next generation of language models, a.k.a. XLMs, can combine and integrate many data modalities with the strength of language understanding, thus advancing several information-based systems, such as Autonomous Driving Systems (ADS). Indeed, by combining language communication with multimodal sensory inputs, e.g., panoramic images and LiDAR or radar data, accurate driving actions can be taken. In this context, this survey paper provides a comprehensive overview of the potential of XLMs for achieving autonomous driving. Specifically, we review the relevant literature on ADS and XLMs, including their architectures, tools, and frameworks. We then detail the proposed approaches for deploying XLMs in autonomous driving solutions. Finally, we discuss the challenges related to XLM deployment for ADS and point to future research directions aiming to enable XLM adoption in future ADS frameworks.
{"title":"XLM for Autonomous Driving Systems: A Comprehensive Review","authors":"Sonda Fourati, Wael Jaafar, Noura Baccar, Safwan Alfattani","doi":"arxiv-2409.10484","DOIUrl":"https://doi.org/arxiv-2409.10484","url":null,"abstract":"Large Language Models (LLMs) have showcased remarkable proficiency in various\u0000information-processing tasks. These tasks span from extracting data and\u0000summarizing literature to generating content, predictive modeling,\u0000decision-making, and system controls. Moreover, Vision Large Models (VLMs) and\u0000Multimodal LLMs (MLLMs), which represent the next generation of language\u0000models, a.k.a., XLMs, can combine and integrate many data modalities with the\u0000strength of language understanding, thus advancing several information-based\u0000systems, such as Autonomous Driving Systems (ADS). Indeed, by combining\u0000language communication with multimodal sensory inputs, e.g., panoramic images\u0000and LiDAR or radar data, accurate driving actions can be taken. In this\u0000context, we provide in this survey paper a comprehensive overview of the\u0000potential of XLMs towards achieving autonomous driving. Specifically, we review\u0000the relevant literature on ADS and XLMs, including their architectures, tools,\u0000and frameworks. Then, we detail the proposed approaches to deploy XLMs for\u0000autonomous driving solutions. Finally, we provide the related challenges to XLM\u0000deployment for ADS and point to future research directions aiming to enable XLM\u0000adoption in future ADS frameworks.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142264042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elia Mateu-Barriendos, Onur Alican, Javier Renedo, Carlos Collados-Rodriguez, Macarena Martin, Edgar Nuño, Eduardo Prieto-Araujo, Oriol Gomis-Bellmunt
Inter-area oscillations, as well as methods to mitigate them, have been extensively studied in conventional power systems dominated by synchronous machines. Several publications have addressed Power Oscillation Damping (POD) controllers in grid-following voltage source converters (GFOL). However, the performance of POD controllers for grid-forming voltage source converters (GFOR) in modern power systems with increased penetration of power electronics requires further investigation. This paper investigates the performance of GFORs and supplementary POD controllers in damping electromechanical oscillations in modern power systems. It proposes POD controllers in GFORs based on supplementary modulation of the converter's active-power injection, its reactive-power injection, or both simultaneously (POD-P, POD-Q, and POD-PQ, respectively). The proposed POD controllers use the frequency imposed by the GFOR as the input signal, which is simple to implement and eliminates the need for additional measurements. Eigenvalue-sensitivity methods using a synthetic test system are applied to the design of POD controllers in GFORs, which is useful when limited information about the power system is available. The paper demonstrates the effectiveness of POD controllers in GFOR converters in damping electromechanical oscillations, through small-signal stability analysis and non-linear time-domain simulations in a small test system and in a large-scale power system.
{"title":"Power Oscillation Damping Controllers for Grid-Forming Power Converters in Modern PowerSystems","authors":"Elia Mateu-Barriendos, Onur Alican, Javier Renedo, Carlos Collados-Rodriguez, Macarena Martin, Edgar Nuño, Eduardo Prieto-Araujo, Oriol Gomis-Bellmunt","doi":"arxiv-2409.10726","DOIUrl":"https://doi.org/arxiv-2409.10726","url":null,"abstract":"Inter-area oscillations have been extensively studied in conventional power\u0000systems dominated by synchronous machines, as well as methods to mitigate them.\u0000Several publications have addressed Power Oscillation Damping (POD) controllers\u0000in grid-following voltage source converters (GFOL). However, the performance of\u0000POD controllers for Grid-Forming voltage source converters (GFOR) in modern\u0000power systems with increased penetration of power electronics requires further\u0000investigation. This paper investigates the performance of GFORs and\u0000supplementary POD controllers in the damping of electromechanical oscillations\u0000in modern power systems. This paper proposes POD controllers in GFORs by\u0000supplementary modulation of active- and reactive-power injections of the\u0000converter and both simultaneously (POD- P, POD-Q and POD-PQ, respectively). The\u0000proposed POD controllers use the frequency imposed by the GFOR as the input\u0000signal, which has a simple implementation and it eliminates the need for\u0000additional measurements. Eigenvalue-sensitivity methods using a synthetic test\u0000system are applied to the design of POD controllers in GFORs, which is useful\u0000when limited information of the power system is available. This paper\u0000demonstrates the effectiveness of POD controllers in GFOR converters to damp\u0000electromechanical oscillations, by small-signal stability analysis and\u0000non-linear time-domain simulations in a small test system and in a large-scale\u0000power system.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142264324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Flögel, Marcos Gómez Villafañe, Joshua Ransiek, Sören Hohmann
Autonomous mobile robots are increasingly employed in pedestrian-rich environments where safe navigation and appropriate human interaction are crucial. While Deep Reinforcement Learning (DRL) enables socially integrated robot behavior, challenges persist in novel or perturbed scenarios, where it is unclear when and why the policy is uncertain. Unknown uncertainty in decision-making can lead to collisions or human discomfort and is one reason why safe and risk-aware navigation is still an open problem. This work introduces a novel approach that integrates aleatoric, epistemic, and predictive uncertainty estimation into a DRL-based navigation framework to provide uncertainty estimates for decision-making. We therefore incorporate Observation-Dependent Variance (ODV) and dropout into the Proximal Policy Optimization (PPO) algorithm. For different types of perturbations, we compare the ability of Deep Ensembles and Monte-Carlo Dropout (MC-Dropout) to estimate the uncertainties of the policy. In uncertain decision-making situations, we propose switching the robot's social behavior to conservative collision avoidance. The results show that the ODV-PPO algorithm converges faster with better generalization and disentangles the aleatoric and epistemic uncertainties. In addition, the MC-Dropout approach is more sensitive to perturbations and better correlates the uncertainty type with the perturbation type. With the proposed safe action selection scheme, the robot can navigate in perturbed environments with fewer collisions.
{"title":"Disentangling Uncertainty for Safe Social Navigation using Deep Reinforcement Learning","authors":"Daniel Flögel, Marcos Gómez Villafañe, Joshua Ransiek, Sören Hohmann","doi":"arxiv-2409.10655","DOIUrl":"https://doi.org/arxiv-2409.10655","url":null,"abstract":"Autonomous mobile robots are increasingly employed in pedestrian-rich\u0000environments where safe navigation and appropriate human interaction are\u0000crucial. While Deep Reinforcement Learning (DRL) enables socially integrated\u0000robot behavior, challenges persist in novel or perturbed scenarios to indicate\u0000when and why the policy is uncertain. Unknown uncertainty in decision-making\u0000can lead to collisions or human discomfort and is one reason why safe and\u0000risk-aware navigation is still an open problem. This work introduces a novel\u0000approach that integrates aleatoric, epistemic, and predictive uncertainty\u0000estimation into a DRL-based navigation framework for uncertainty estimates in\u0000decision-making. We, therefore, incorporate Observation-Dependent Variance\u0000(ODV) and dropout into the Proximal Policy Optimization (PPO) algorithm. For\u0000different types of perturbations, we compare the ability of Deep Ensembles and\u0000Monte-Carlo Dropout (MC-Dropout) to estimate the uncertainties of the policy.\u0000In uncertain decision-making situations, we propose to change the robot's\u0000social behavior to conservative collision avoidance. The results show that the\u0000ODV-PPO algorithm converges faster with better generalization and disentangles\u0000the aleatoric and epistemic uncertainties. In addition, the MC-Dropout approach\u0000is more sensitive to perturbations and capable to correlate the uncertainty\u0000type to the perturbation type better. With the proposed safe action selection\u0000scheme, the robot can navigate in perturbed environments with fewer collisions.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"47 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142264039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we introduce a novel gradient descent-based approach for optimizing control systems, leveraging a new representation of stable closed-loop dynamics as a function of two matrices: the step-size (direction) matrix and the value matrix of the Lyapunov cost function. This formulation provides a new framework for analyzing and designing feedback control laws. We show that any stable closed-loop system can be expressed in this form with appropriate values for the step-size and value matrices. Furthermore, we show that this parameterization of the closed-loop system is equivalent to a linear quadratic regulator for appropriately chosen weighting matrices. We also show that trajectories can be shaped using this approach to achieve a desired closed-loop behavior.
{"title":"Trajectory-Oriented Control Using Gradient Descent: An Unconventional Approach","authors":"Ramin Esmzad, Hamidreza Modares","doi":"arxiv-2409.10662","DOIUrl":"https://doi.org/arxiv-2409.10662","url":null,"abstract":"In this work, we introduce a novel gradient descent-based approach for\u0000optimizing control systems, leveraging a new representation of stable\u0000closed-loop dynamics as a function of two matrices i.e. the step size or\u0000direction matrix and value matrix of the Lyapunov cost function. This\u0000formulation provides a new framework for analyzing and designing feedback\u0000control laws. We show that any stable closed-loop system can be expressed in\u0000this form with appropriate values for the step size and value matrices.\u0000Furthermore, we show that this parameterization of the closed-loop system is\u0000equivalent to a linear quadratic regulator for appropriately chosen weighting\u0000matrices. We also show that trajectories can be shaped using this approach to\u0000achieve a desired closed-loop behavior.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142264038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a one-shot learning approach with performance and robustness guarantees for linear quadratic regulator (LQR) control of stochastic linear systems. Even though data-based LQR control has been widely considered, existing results suffer either from data hungriness, due to the inherently iterative nature of the optimization formulation (e.g., value learning or policy gradient reinforcement learning algorithms), or from a lack of robustness guarantees in one-shot non-iterative algorithms. To avoid data hungriness while ensuring robustness guarantees, an adaptive dynamic programming formalization of the LQR is presented that relies on solving a Bellman inequality. The control gain and the value function are directly learned using a control-oriented approach that characterizes the closed-loop system through data and a decision variable from which the control is obtained. This closed-loop characterization is noise-dependent. The effect of the closed-loop system noise on the Bellman inequality is considered to ensure both robust stability and suboptimal performance despite ignoring the measurement noise. To ensure robust stability, it is shown that this system characterization leads to a closed-loop system with multiplicative and additive noise, enabling the application of distributionally robust control techniques. The analysis of the suboptimality gap reveals that robustness can be achieved without the need for regularization or parameter tuning. Simulation results on the active car suspension problem demonstrate the superiority of the proposed method in terms of robustness and performance gap compared to existing methods.
{"title":"Direct Data-Driven Discounted Infinite Horizon Linear Quadratic Regulator with Robustness Guarantees","authors":"Ramin Esmzad, Hamidreza Modares","doi":"arxiv-2409.10703","DOIUrl":"https://doi.org/arxiv-2409.10703","url":null,"abstract":"This paper presents a one-shot learning approach with performance and\u0000robustness guarantees for the linear quadratic regulator (LQR) control of\u0000stochastic linear systems. Even though data-based LQR control has been widely\u0000considered, existing results suffer either from data hungriness due to the\u0000inherently iterative nature of the optimization formulation (e.g., value\u0000learning or policy gradient reinforcement learning algorithms) or from a lack\u0000of robustness guarantees in one-shot non-iterative algorithms. To avoid data\u0000hungriness while ensuing robustness guarantees, an adaptive dynamic programming\u0000formalization of the LQR is presented that relies on solving a Bellman\u0000inequality. The control gain and the value function are directly learned by\u0000using a control-oriented approach that characterizes the closed-loop system\u0000using data and a decision variable from which the control is obtained. This\u0000closed-loop characterization is noise-dependent. The effect of the closed-loop\u0000system noise on the Bellman inequality is considered to ensure both robust\u0000stability and suboptimal performance despite ignoring the measurement noise. To\u0000ensure robust stability, it is shown that this system characterization leads to\u0000a closed-loop system with multiplicative and additive noise, enabling the\u0000application of distributional robust control techniques. The analysis of the\u0000suboptimality gap reveals that robustness can be achieved without the need for\u0000regularization or parameter tuning. The simulation results on the active car\u0000suspension problem demonstrate the superiority of the proposed method in terms\u0000of robustness and performance gap compared to existing methods.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142264325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaoyu Wang, Ayal Taitler, Scott Sanner, Baher Abdulhai
Efficient traffic signal control is essential for managing urban transportation, minimizing congestion, and improving safety and sustainability. Reinforcement Learning (RL) has emerged as a promising approach to enhancing adaptive traffic signal control (ATSC) systems, allowing controllers to learn optimal policies through interaction with the environment. However, challenges arise due to partial observability (PO) in traffic networks, where agents have limited visibility, hindering effectiveness. This paper presents the integration of Transformer-based controllers into ATSC systems to address PO effectively. We propose strategies to enhance training efficiency and effectiveness, demonstrating improved coordination capabilities in real-world scenarios. The results showcase the Transformer-based model's ability to capture significant information from historical observations, leading to better control policies and improved traffic flow. This study highlights the potential of leveraging the advanced Transformer architecture to enhance urban transportation management.
{"title":"Mitigating Partial Observability in Adaptive Traffic Signal Control with Transformers","authors":"Xiaoyu Wang, Ayal Taitler, Scott Sanner, Baher Abdulhai","doi":"arxiv-2409.10693","DOIUrl":"https://doi.org/arxiv-2409.10693","url":null,"abstract":"Efficient traffic signal control is essential for managing urban\u0000transportation, minimizing congestion, and improving safety and sustainability.\u0000Reinforcement Learning (RL) has emerged as a promising approach to enhancing\u0000adaptive traffic signal control (ATSC) systems, allowing controllers to learn\u0000optimal policies through interaction with the environment. However, challenges\u0000arise due to partial observability (PO) in traffic networks, where agents have\u0000limited visibility, hindering effectiveness. This paper presents the\u0000integration of Transformer-based controllers into ATSC systems to address PO\u0000effectively. We propose strategies to enhance training efficiency and\u0000effectiveness, demonstrating improved coordination capabilities in real-world\u0000scenarios. The results showcase the Transformer-based model's ability to\u0000capture significant information from historical observations, leading to better\u0000control policies and improved traffic flow. This study highlights the potential\u0000of leveraging the advanced Transformer architecture to enhance urban\u0000transportation management.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"92 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142264036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}