Discounted Inverse Reinforcement Learning for Linear Quadratic Control

IF 10.5; Zone 1 (Computer Science); JCR Q1 (Automation & Control Systems); IEEE Transactions on Cybernetics; Pub Date: 2025-03-04; DOI: 10.1109/TCYB.2025.3540967

Han Wu;Qinglei Hu;Jianying Zheng;Fei Dong;Zhenchao Ouyang;Dongyu Li

IEEE Transactions on Cybernetics, vol. 55, no. 4, pp. 1995-2007, 2025. https://ieeexplore.ieee.org/document/10909692/

Citations: 0

Abstract

Linear quadratic control with unknown value functions and dynamics is extremely challenging, and most existing studies focus on the regulation problem and cannot handle the tracking problem. To solve both linear quadratic regulation and tracking problems for continuous-time systems with unknown value functions, this article develops a discounted inverse reinforcement learning (DIRL) method that inherits the model-independent property of reinforcement learning (RL). Specifically, we first formulate a standard paradigm for solving linear quadratic control using DIRL. To recover the value function and the target control gain, an error metric is constructed and minimized with a quasi-Newton algorithm. Furthermore, three DIRL algorithms are proposed: a model-based algorithm, a model-free off-policy algorithm, and a model-free on-policy algorithm. The latter two rely on the expert’s demonstration data or on online observed data, and require no prior knowledge of the system dynamics or value function. The stability, convergence, and existence conditions of multiple solutions are thoroughly analyzed. Finally, numerical simulations demonstrate the effectiveness of the theoretical results.
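To make the idea of recovering a cost from an expert's gain concrete, here is a minimal model-based sketch. It is not the article's DIRL algorithm: the system matrices, the discount rate `GAMMA`, the fixed input weight `R`, the Cholesky-style parametrization of `Q`, and the squared gain mismatch used as the error metric are all assumptions chosen for illustration. It relies only on the standard fact that a cost discounted by e^{-γt} leads to an ordinary Riccati equation for the shifted matrix A - (γ/2)I, and it uses SciPy's BFGS routine as the quasi-Newton minimizer.

```python
import numpy as np
from scipy.linalg import solve_continuous_are
from scipy.optimize import minimize

# Hypothetical 2-state, 1-input system and discount rate (illustrative values only).
GAMMA = 0.1
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
R = np.eye(1)  # input weight assumed known and fixed


def discounted_lqr_gain(Q):
    """Optimal gain K for the cost integral of exp(-GAMMA*t) * (x'Qx + u'Ru)."""
    # Discounting is equivalent to an undiscounted problem with A shifted by GAMMA/2.
    A_shifted = A - 0.5 * GAMMA * np.eye(A.shape[0])
    P = solve_continuous_are(A_shifted, B, Q, R)
    return np.linalg.solve(R, B.T @ P)  # K = R^{-1} B' P


def q_from_params(theta):
    """Map 3 free parameters to a positive definite Q = L L' + small ridge."""
    L = np.zeros((2, 2))
    L[np.tril_indices(2)] = theta
    return L @ L.T + 1e-6 * np.eye(2)


# The "expert" gain comes from a state weight that the learner does not see.
Q_expert = np.diag([5.0, 1.0])
K_expert = discounted_lqr_gain(Q_expert)


def gain_error(theta):
    """Assumed error metric: squared mismatch between candidate and expert gains."""
    K = discounted_lqr_gain(q_from_params(theta))
    return float(np.sum((K - K_expert) ** 2))


# Quasi-Newton (BFGS) minimization of the error metric, starting from Q close to I.
result = minimize(gain_error, x0=np.array([1.0, 0.0, 1.0]), method="BFGS")
Q_hat = q_from_params(result.x)
K_hat = discounted_lqr_gain(Q_hat)

print("expert gain:   ", K_expert)
print("recovered gain:", K_hat)
print("recovered Q:\n", Q_hat)
```

In this toy setup the recovered `Q_hat` need not equal `Q_expert`, because several state weights can induce the same gain. The sketch therefore compares gains rather than weight matrices.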



Source journal

IEEE Transactions on Cybernetics (Computer Science, Artificial Intelligence; Computer Science, Cybernetics)

CiteScore: 25.40

Self-citation rate: 11.00%

Articles published: 1869

Journal description: The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the Transactions welcomes papers on communication and control across machines or among machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.