Output Feedback H∞ Control of Unknown Discrete-time Linear Systems: Off-policy Reinforcement Learning

P. Tooranjipour, Bahare Kiumarsi-Khomartash
{"title":"未知离散线性系统的输出反馈H∞控制:非策略强化学习","authors":"P. Tooranjipour, Bahare Kiumarsi-Khomartash","doi":"10.1109/CDC45484.2021.9683057","DOIUrl":null,"url":null,"abstract":"In this paper, a data-driven output feedback approach is developed for solving H∞ control problem of linear discrete-time systems based on off-policy reinforcement learning (RL) algorithm. Past input-output measurements are leveraged to implicitly reconstruct the system's states to alleviate the requirement to measure or estimate the system's states. Then, an off-policy input-output Bellman equation is derived based on this implicit reconstruction to evaluate control policies using only input-output measurements. An improved control policy is then learned utilizing the solution to the Bellman equation without knowing the system's dynamics. In the proposed approach, unlike the on-policy methods, the disturbance does not need to be updated in a predefined manner at each iteration, which makes it more practical. While the state-feedback off-policy RL method is shown to be a bias-free approach for deterministic systems, it is shown that once the system's states have been reconstructed from the input-output measurements, the input-output off-policy method cannot be considered as an immune approach against the probing noises. To cope with this, a discount factor is utilized in the performance function to decay the deleterious effect of probing noises. Finally, to illustrate the sensitivity of the problem to the probing noises and the efficacy of the proposed approach, the flight control system is tested in the simulation.","PeriodicalId":229089,"journal":{"name":"2021 60th IEEE Conference on Decision and Control (CDC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Output Feedback H∞ Control of Unknown Discrete-time Linear Systems: Off-policy Reinforcement Learning\",\"authors\":\"P. Tooranjipour, Bahare Kiumarsi-Khomartash\",\"doi\":\"10.1109/CDC45484.2021.9683057\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, a data-driven output feedback approach is developed for solving H∞ control problem of linear discrete-time systems based on off-policy reinforcement learning (RL) algorithm. Past input-output measurements are leveraged to implicitly reconstruct the system's states to alleviate the requirement to measure or estimate the system's states. Then, an off-policy input-output Bellman equation is derived based on this implicit reconstruction to evaluate control policies using only input-output measurements. An improved control policy is then learned utilizing the solution to the Bellman equation without knowing the system's dynamics. In the proposed approach, unlike the on-policy methods, the disturbance does not need to be updated in a predefined manner at each iteration, which makes it more practical. While the state-feedback off-policy RL method is shown to be a bias-free approach for deterministic systems, it is shown that once the system's states have been reconstructed from the input-output measurements, the input-output off-policy method cannot be considered as an immune approach against the probing noises. To cope with this, a discount factor is utilized in the performance function to decay the deleterious effect of probing noises. 
Finally, to illustrate the sensitivity of the problem to the probing noises and the efficacy of the proposed approach, the flight control system is tested in the simulation.\",\"PeriodicalId\":229089,\"journal\":{\"name\":\"2021 60th IEEE Conference on Decision and Control (CDC)\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 60th IEEE Conference on Decision and Control (CDC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CDC45484.2021.9683057\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 60th IEEE Conference on Decision and Control (CDC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CDC45484.2021.9683057","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In this paper, a data-driven output feedback approach is developed for solving the H∞ control problem of linear discrete-time systems based on an off-policy reinforcement learning (RL) algorithm. Past input-output measurements are leveraged to implicitly reconstruct the system's states, removing the need to measure or estimate them directly. An off-policy input-output Bellman equation is then derived from this implicit reconstruction to evaluate control policies using only input-output measurements, and an improved control policy is learned from the solution of this Bellman equation without knowledge of the system's dynamics. In the proposed approach, unlike on-policy methods, the disturbance does not need to be updated in a predefined manner at each iteration, which makes it more practical. While the state-feedback off-policy RL method is known to be bias-free for deterministic systems, it is shown that once the system's states have been reconstructed from input-output measurements, the input-output off-policy method is no longer immune to probing noise. To cope with this, a discount factor is introduced into the performance function to decay the deleterious effect of the probing noise. Finally, to illustrate the sensitivity of the problem to probing noise and the efficacy of the proposed approach, a flight control system is tested in simulation.
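The abstract does not spell out the underlying formulation. As a hedged sketch of the setting that output-feedback H∞ RL methods of this kind typically build on, the discounted zero-sum-game formulation for a discrete-time linear system can be written as follows; all symbols (A, B, E, C, Q, R, γ, the discount factor λ, the reconstruction matrices M_u, M_y, and the horizon N) are assumed notation and are not taken from the paper:

```latex
\begin{align}
  % unknown discrete-time linear system with control u_k and disturbance w_k
  x_{k+1} &= A x_k + B u_k + E w_k, \qquad y_k = C x_k, \\
  % discounted zero-sum-game performance index, 0 < \lambda \le 1
  J &= \sum_{k=0}^{\infty} \lambda^{k}\bigl( y_k^{\top} Q\, y_k + u_k^{\top} R\, u_k - \gamma^{2} w_k^{\top} w_k \bigr), \\
  % Bellman recursion for the quadratic value function of the game
  V(x_k) &= y_k^{\top} Q\, y_k + u_k^{\top} R\, u_k - \gamma^{2} w_k^{\top} w_k + \lambda\, V(x_{k+1}), \\
  % implicit state reconstruction from the last N input-output measurements
  x_k &= M_u\, \bar{u}_{[k-N,\,k-1]} + M_y\, \bar{y}_{[k-N,\,k-1]}.
\end{align}
```

Under this kind of reconstruction, the Bellman recursion can be rewritten purely in terms of measured inputs and outputs, which is what removes the need for a separate state observer.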
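As a further illustration of how such a data-driven evaluation step can be assembled, the sketch below performs a least-squares (LSTD-style) evaluation of a quadratic value function over augmented input-output vectors under a discount factor. It is a minimal, assumption-laden illustration, not the paper's algorithm: the paper's off-policy Bellman equation additionally accounts for the mismatch between the data-generating behavior policy (which carries the probing noise) and the evaluated target policies, and, as the abstract notes, uses the discount factor to decay the effect of that probing noise. All names here (svec, evaluate_policy, Q, R, gamma, lam) are hypothetical.

```python
import numpy as np

# Minimal LSTD-style sketch (not the paper's off-policy algorithm): fit a quadratic
# value V(z) = z' P z over augmented input-output vectors z_k by least squares,
# using the discounted Bellman identity
#   z_k' P z_k = y_k' Q y_k + u_k' R u_k - gamma^2 w_k' w_k + lam * z_{k+1}' P z_{k+1}.

def svec(z):
    """Independent entries of z z', so that z' P z = svec(z) . p for symmetric P."""
    M = np.outer(z, z)
    idx = np.triu_indices(len(z))
    scale = np.where(idx[0] == idx[1], 1.0, 2.0)  # off-diagonal entries of P appear twice
    return scale * M[idx]

def evaluate_policy(Z, U, W, Y, Q, R, gamma, lam):
    """Z: (T+1, n_z) augmented I/O vectors; U, W, Y: length-T inputs, disturbances, outputs."""
    A_rows, b_rows = [], []
    for k in range(len(U)):
        # stage cost of the zero-sum game at step k
        r_k = Y[k] @ Q @ Y[k] + U[k] @ R @ U[k] - gamma**2 * (W[k] @ W[k])
        A_rows.append(svec(Z[k]) - lam * svec(Z[k + 1]))  # regression features
        b_rows.append(r_k)                                # regression targets
    p, *_ = np.linalg.lstsq(np.asarray(A_rows), np.asarray(b_rows), rcond=None)
    return p  # stacked upper-triangular entries of P

# Example call with random placeholder data (shapes only, not meaningful trajectories):
# T, n_z, n_u, n_w, n_y = 200, 6, 1, 1, 2
# p = evaluate_policy(np.random.randn(T + 1, n_z), np.random.randn(T, n_u),
#                     np.random.randn(T, n_w), np.random.randn(T, n_y),
#                     np.eye(n_y), np.eye(n_u), gamma=5.0, lam=0.9)
```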