Discounted Inverse Reinforcement Learning for Linear Quadratic Control
Han Wu; Qinglei Hu; Jianying Zheng; Fei Dong; Zhenchao Ouyang; Dongyu Li
IEEE Transactions on Cybernetics, vol. 55, no. 4, pp. 1995-2007. Published 2025-03-04. DOI: 10.1109/TCYB.2025.3540967
https://ieeexplore.ieee.org/document/10909692/
Impact factor: 10.5. JCR: Q1 (Automation & Control Systems).
Citations: 0
Abstract
Linear quadratic control with unknown value functions and dynamics is challenging, and most existing studies address only the regulation problem and cannot handle the tracking problem. To solve both linear quadratic regulation and tracking problems for continuous-time systems with unknown value functions, this article develops a discounted inverse reinforcement learning (DIRL) method that inherits the model-independent property of reinforcement learning (RL). Specifically, we first formulate a standard paradigm for solving linear quadratic control using DIRL. To recover the value function and the target control gain, an error metric is constructed and minimized with a quasi-Newton algorithm. Three DIRL algorithms are then proposed: model-based, model-free off-policy, and model-free on-policy. The latter two rely on the expert's demonstration data or the online observed data and require no prior knowledge of the system dynamics or value function. The stability, convergence, and existence conditions of multiple solutions are analyzed. Finally, numerical simulations demonstrate the effectiveness of the theoretical results.
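To make the idea in the abstract concrete, the following is a minimal model-based sketch, not the authors' DIRL algorithm. It assumes a discounted cost of the form ∫ e^{-γt}(xᵀQx + uᵀRu) dt with known A, B, and R, restricts Q to a positive diagonal matrix, and uses SciPy's BFGS as a stand-in quasi-Newton method. The error metric here (squared mismatch between the candidate gain and the expert gain) and the names `discounted_lqr_gain` and `recover_diagonal_Q` are illustrative assumptions. The sketch relies on the standard fact that the discounted problem reduces to an undiscounted Riccati equation with A replaced by A − (γ/2)I.

```python
import numpy as np
from scipy.linalg import solve_continuous_are
from scipy.optimize import minimize


def discounted_lqr_gain(A, B, Q, R, gamma):
    """Return (K, P) for the discounted LQR problem with u = -K x.

    The discount rate gamma is absorbed by shifting A to A - (gamma/2) I,
    which turns the discounted Riccati equation into a standard one.
    """
    n = A.shape[0]
    A_shift = A - 0.5 * gamma * np.eye(n)
    P = solve_continuous_are(A_shift, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)
    return K, P


def recover_diagonal_Q(A, B, R, gamma, K_expert):
    """Fit a positive diagonal Q whose optimal gain matches K_expert.

    Assumption: Q = diag(exp(log_q)) keeps the weights positive, and the
    error metric is the squared gain mismatch (illustrative choice).
    """
    n = A.shape[0]

    def gain_error(log_q):
        Q = np.diag(np.exp(log_q))
        K, _ = discounted_lqr_gain(A, B, Q, R, gamma)
        return float(np.sum((K - K_expert) ** 2))

    # BFGS is a quasi-Newton method; gradients are finite-differenced.
    result = minimize(gain_error, x0=np.zeros(n), method="BFGS")
    return np.diag(np.exp(result.x)), result.fun


# Hypothetical two-state example system (not taken from the paper).
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
gamma = 0.1

# The "expert" gain is generated from a hidden weight matrix.
Q_expert = np.diag([2.0, 1.0])
K_expert, _ = discounted_lqr_gain(A, B, Q_expert, R, gamma)

Q_hat, err = recover_diagonal_Q(A, B, R, gamma, K_expert)
K_hat, P_hat = discounted_lqr_gain(A, B, Q_hat, R, gamma)
print("gain mismatch:", err)
print("recovered gain:", K_hat)
```

The sketch checks only that the recovered weights reproduce the expert gain. Inverse problems of this kind can admit several weight matrices that yield the same gain, which is consistent with the abstract's mention of multiple solutions, so the recovered Q need not equal the hidden one in general.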
About the journal:
The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the Transactions welcomes papers on communication and control across machines or among machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.