Direct Data-Driven Discounted Infinite Horizon Linear Quadratic Regulator with Robustness Guarantees
Ramin Esmzad, Hamidreza Modares
arXiv - EE - Systems and Control
Publication date: 2024-09-16
DOI: arxiv-2409.10703 (https://doi.org/arxiv-2409.10703)
Citation count: 0
Abstract
This paper presents a one-shot learning approach with performance and
robustness guarantees for the linear quadratic regulator (LQR) control of
stochastic linear systems. Even though data-based LQR control has been widely
considered, existing results suffer either from data hungriness due to the
inherently iterative nature of the optimization formulation (e.g., value
learning or policy gradient reinforcement learning algorithms) or from a lack
of robustness guarantees in one-shot non-iterative algorithms. To avoid data
hungriness while ensuring robustness guarantees, an adaptive dynamic programming
formulation of the LQR is presented that relies on solving a Bellman
inequality. The control gain and the value function are directly learned by
using a control-oriented approach that characterizes the closed-loop system
using data and a decision variable from which the control is obtained. This
closed-loop characterization is noise-dependent. The effect of the closed-loop
system noise on the Bellman inequality is considered to ensure both robust
stability and suboptimal performance despite ignoring the measurement noise. To
ensure robust stability, it is shown that this system characterization leads to
a closed-loop system with multiplicative and additive noise, enabling the
application of distributionally robust control techniques. The analysis of the
suboptimality gap reveals that robustness can be achieved without the need for
regularization or parameter tuning. The simulation results on the active car
suspension problem demonstrate the superiority of the proposed method in terms
of robustness and performance gap compared to existing methods.
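
As a rough illustration of the Bellman inequality mentioned in the abstract, the sketch below works through a scalar, noise-free, model-based discounted LQR problem. It is not the paper's data-driven method. The system parameters `a` and `b`, the weights `q` and `r`, the discount factor `gamma`, and all function names are hypothetical and chosen only for illustration. The sketch solves the discounted Riccati recursion by value iteration, then checks that the optimal pair meets the Bellman inequality with equality, while any larger value coefficient satisfies it strictly and therefore upper-bounds the cost.

```python
# Scalar discounted LQR (illustrative assumption, not the paper's algorithm):
#   x_{t+1} = a*x_t + b*u_t,  cost = sum_t gamma^t (q*x_t^2 + r*u_t^2),
#   policy u_t = -k*x_t, value function V(x) = p*x^2.

def riccati_step(p, a, b, q, r, gamma):
    """One value-iteration step of the discounted scalar Riccati recursion."""
    return q + gamma * a * a * p - (gamma * a * b * p) ** 2 / (r + gamma * b * b * p)

def greedy_gain(p, a, b, r, gamma):
    """Gain that minimizes the one-step Bellman right-hand side for V(x) = p*x^2."""
    return gamma * a * b * p / (r + gamma * b * b * p)

def solve_discounted_lqr(a, b, q, r, gamma, iters=10_000, tol=1e-12):
    """Iterate the Riccati recursion from p = q until successive iterates agree."""
    p = q
    for _ in range(iters):
        p_next = riccati_step(p, a, b, q, r, gamma)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    return p, greedy_gain(p, a, b, r, gamma)

def bellman_gap(p, k, a, b, q, r, gamma):
    """p - [q + r*k^2 + gamma*(a - b*k)^2 * p]; a nonnegative gap means the
    Bellman inequality holds, so p*x^2 upper-bounds the cost of gain k."""
    return p - (q + r * k * k + gamma * (a - b * k) ** 2 * p)

def policy_value(k, a, b, q, r, gamma):
    """Value coefficient of a fixed gain k; finite only if gamma*(a - b*k)^2 < 1."""
    rho = gamma * (a - b * k) ** 2
    if rho >= 1.0:
        raise ValueError("gain does not give a finite discounted cost")
    return (q + r * k * k) / (1.0 - rho)

# Hypothetical numbers: an open-loop unstable plant with a mild discount.
a, b, q, r, gamma = 1.2, 1.0, 1.0, 0.5, 0.95
p_star, k_star = solve_discounted_lqr(a, b, q, r, gamma)

print("value coefficient:", p_star)
print("gain:", k_star)
print("Bellman gap at optimum:", bellman_gap(p_star, k_star, a, b, q, r, gamma))
```

In this scalar setting, relaxing the Riccati equality to an inequality is what lets a value coefficient `p` act as a cost upper bound rather than an exact cost. The matrix case in the paper replaces the scalar products with matrix inequalities and obtains the closed-loop term from data instead of from known `a` and `b`.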