{"title":"Output Feedback H∞ Control of Unknown Discrete-time Linear Systems: Off-policy Reinforcement Learning","authors":"P. Tooranjipour, Bahare Kiumarsi-Khomartash","doi":"10.1109/CDC45484.2021.9683057","DOIUrl":null,"url":null,"abstract":"In this paper, a data-driven output feedback approach is developed for solving H∞ control problem of linear discrete-time systems based on off-policy reinforcement learning (RL) algorithm. Past input-output measurements are leveraged to implicitly reconstruct the system's states to alleviate the requirement to measure or estimate the system's states. Then, an off-policy input-output Bellman equation is derived based on this implicit reconstruction to evaluate control policies using only input-output measurements. An improved control policy is then learned utilizing the solution to the Bellman equation without knowing the system's dynamics. In the proposed approach, unlike the on-policy methods, the disturbance does not need to be updated in a predefined manner at each iteration, which makes it more practical. While the state-feedback off-policy RL method is shown to be a bias-free approach for deterministic systems, it is shown that once the system's states have been reconstructed from the input-output measurements, the input-output off-policy method cannot be considered as an immune approach against the probing noises. To cope with this, a discount factor is utilized in the performance function to decay the deleterious effect of probing noises. Finally, to illustrate the sensitivity of the problem to the probing noises and the efficacy of the proposed approach, the flight control system is tested in the simulation.","PeriodicalId":229089,"journal":{"name":"2021 60th IEEE Conference on Decision and Control (CDC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 60th IEEE Conference on Decision and Control (CDC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CDC45484.2021.9683057","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
In this paper, a data-driven output feedback approach is developed for solving the H∞ control problem of linear discrete-time systems based on an off-policy reinforcement learning (RL) algorithm. Past input-output measurements are leveraged to implicitly reconstruct the system's states, alleviating the requirement to measure or estimate them directly. An off-policy input-output Bellman equation is then derived based on this implicit reconstruction to evaluate control policies using only input-output measurements. An improved control policy is subsequently learned from the solution of the Bellman equation, without knowledge of the system's dynamics. Unlike on-policy methods, the proposed approach does not require the disturbance to be updated in a predefined manner at each iteration, which makes it more practical. While the state-feedback off-policy RL method is known to be bias-free for deterministic systems, it is shown that once the system's states have been reconstructed from input-output measurements, the input-output off-policy method is no longer immune to probing noises. To cope with this, a discount factor is introduced into the performance function to attenuate the deleterious effect of the probing noises. Finally, to illustrate the sensitivity of the problem to probing noises and the efficacy of the proposed approach, a flight control system is tested in simulation.
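The proposed method itself is data-driven and uses only input-output measurements; the sketch below is only a rough point of reference: a model-based value iteration on the discounted zero-sum game (H∞) formulation that underlies the design. The plant matrices A, B, E, the weights Q and R, the discount factor gamma, and the attenuation level gamma_att are illustrative assumptions, not values from the paper.

```python
# A minimal, model-based sketch of the discounted zero-sum game behind
# discrete-time H-infinity control -- NOT the paper's data-driven
# input-output off-policy algorithm.  A, B, E, Q, R, gamma (discount)
# and gamma_att (attenuation level) are illustrative placeholders.
import numpy as np

# Illustrative plant: x_{k+1} = A x_k + B u_k + E w_k
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
E = np.array([[0.1],
              [0.0]])
Q = np.eye(2)        # state weighting
R = np.eye(1)        # control weighting
gamma = 0.95         # discount factor (decays the effect of probing noise)
gamma_att = 2.0      # prescribed disturbance attenuation level

P = np.zeros_like(Q)
for _ in range(1000):
    # Saddle-point gains of the quadratic game under the current value P
    Huu = R + gamma * B.T @ P @ B
    Hww = gamma * E.T @ P @ E - gamma_att**2 * np.eye(E.shape[1])
    Huw = gamma * B.T @ P @ E
    H = np.block([[Huu, Huw],
                  [Huw.T, Hww]])
    G = gamma * np.vstack([B.T @ P @ A, E.T @ P @ A])
    KL = np.linalg.solve(H, G)     # stacked gains [K; L]
    K = KL[:B.shape[1], :]         # control policy       u_k = -K x_k
    L = KL[B.shape[1]:, :]         # worst-case disturbance w_k = -L x_k
    # Value-iteration (Bellman) update of the game value matrix
    Acl = A - B @ K - E @ L
    P_next = (Q + K.T @ R @ K - gamma_att**2 * L.T @ L
              + gamma * Acl.T @ P @ Acl)
    if np.max(np.abs(P_next - P)) < 1e-10:
        P = P_next
        break
    P = P_next

print("Game value matrix P:\n", P)
print("Control gain K:\n", K)
print("Worst-case disturbance gain L:\n", L)
```

In the paper's setting, the corresponding value matrix and gains would instead be estimated from measured input-output data by solving the off-policy Bellman equation, with the discount factor limiting the bias introduced by probing noise.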