{"title":"Q-Learning Methods for LQR Control of Completely Unknown Discrete-Time Linear Systems","authors":"Wenwu Fan;Junlin Xiong","doi":"10.1109/TASE.2024.3434533","DOIUrl":null,"url":null,"abstract":"This paper focuses on solving the linear quadratic regulator problem for discrete-time linear systems without knowing system matrices. The classical Q-learning methods for linear systems can be divided into Q-learning value iteration and Q-learning policy iteration. Q-learning value iteration converges at a linear convergence rate. Q-learning policy iteration has a second-order convergence rate but requires an initial stabilizing control policy. This paper aims to propose efficient model-free algorithms for solving the optimal control problem without requiring an initial stabilizing control policy. In this paper, we first present an equivalent problem for an auxiliary system with the same optimal control policy as the LQR problem. A Q-learning algorithm is proposed to solve the equivalent problem, which is proven to converge monotonically to the optimal solution. The convergence rate of the Q-learning algorithm is heavily dependent on the auxiliary system, so we introduce a model-free homotopy method based on Q-learning to solve the LQR problem. This homotopy method can achieve the optimal solution in a finite number of iterations by solving an LQR problem in each iteration. Additionally, we propose a Q-learning Lyapunov iteration algorithm to solve the equivalent problem for an auxiliary system and analyze its properties. Finally, two examples are provided to demonstrate our results. Note to Practitioners—This paper proposes several Q-learning methods to solve the linear quadratic regulator problem for discrete-time linear systems. On the one hand, it is difficult to know the exact system dynamics knowledge in actual engineering, so this paper is devoted to developing model-free algorithms. On the other hand, this paper focuses on the LQR problem because it is widely spread in practical applications. We propose several model-free algorithms to solve the LQR problem, which provides the basis for optimal control of actual applications. Similar to policy iteration, our algorithms need to solve the Lyapunov equation. The advantage of our methods is that all of our algorithms do not have strict constraints on initial conditions compared with policy iteration. The properties of every algorithm proposed in this paper are provided. In addition, we focus on the efficiency of algorithms to obtain the optimal control policy faster. Two practical examples are used to verify the effectiveness of our methods. Finally, the applicable situations of each algorithm are summarized in the conclusion.","PeriodicalId":51060,"journal":{"name":"IEEE Transactions on Automation Science and Engineering","volume":"22 ","pages":"5933-5943"},"PeriodicalIF":6.4000,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Automation Science and Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10622003/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
This paper focuses on solving the linear quadratic regulator (LQR) problem for discrete-time linear systems without knowledge of the system matrices. Classical Q-learning methods for linear systems fall into two classes: Q-learning value iteration and Q-learning policy iteration. Q-learning value iteration converges at a linear rate; Q-learning policy iteration converges at a second-order rate but requires an initial stabilizing control policy. This paper aims to propose efficient model-free algorithms that solve the optimal control problem without requiring an initial stabilizing control policy. We first present an equivalent problem for an auxiliary system that shares the same optimal control policy as the LQR problem. A Q-learning algorithm is proposed to solve this equivalent problem and is proven to converge monotonically to the optimal solution. Because the convergence rate of this Q-learning algorithm depends heavily on the choice of auxiliary system, we then introduce a model-free homotopy method based on Q-learning to solve the LQR problem. The homotopy method reaches the optimal solution in a finite number of iterations, solving an LQR subproblem at each iteration. Additionally, we propose a Q-learning Lyapunov iteration algorithm to solve the equivalent problem for an auxiliary system and analyze its properties. Finally, two examples are provided to demonstrate our results.

Note to Practitioners—This paper proposes several Q-learning methods for solving the linear quadratic regulator problem for discrete-time linear systems. On the one hand, the exact system dynamics are rarely available in engineering practice, so this paper is devoted to developing model-free algorithms. On the other hand, this paper focuses on the LQR problem because it arises widely in practical applications. We propose several model-free algorithms for the LQR problem, providing a basis for optimal control in real applications. Like policy iteration, our algorithms need to solve a Lyapunov equation; their advantage is that, unlike policy iteration, none of them imposes strict constraints on the initial conditions. The properties of every algorithm proposed in this paper are analyzed. In addition, we focus on algorithmic efficiency so that the optimal control policy can be obtained faster. Two practical examples are used to verify the effectiveness of our methods. Finally, the situations in which each algorithm is applicable are summarized in the conclusion.
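For orientation, here is a minimal sketch of the classical Q-learning value iteration that the abstract uses as a baseline: it learns the quadratic Q-function Q_H(x, u) = [x; u]^T H [x; u] purely from sampled transitions and needs no initial stabilizing policy, at the price of a linear convergence rate. This is not the paper's new algorithm; the toy plant, sample counts, and tolerances below are illustrative assumptions, and the matrices A and B appear only as a data generator, never inside the learner.

```python
# Minimal sketch (not the paper's algorithm): model-free Q-learning value
# iteration for discrete-time LQR. The toy plant (A, B) below is assumed
# for illustration and is used only to generate data; the learner never
# reads A or B.
import numpy as np

rng = np.random.default_rng(0)

# Toy plant acting purely as a data generator (illustrative assumption).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)   # state cost
R = np.eye(1)   # input cost
n, m = 2, 1

def quad_basis(x, u):
    """Quadratic features: Q_H(x, u) = [x; u]^T H [x; u] = basis(x, u) . vec(H)."""
    z = np.concatenate([x, u])
    return np.outer(z, z).ravel()

H = np.zeros((n + m, n + m))   # Q-function weights
for it in range(200):          # value-iteration sweeps
    # Greedy value of the current Q-function: V(x) = min_u [x;u]^T H [x;u].
    Hxx, Hxu = H[:n, :n], H[:n, n:]
    Hux, Huu = H[n:, :n], H[n:, n:]
    P = Hxx - Hxu @ np.linalg.solve(Huu + 1e-8 * np.eye(m), Hux)

    # Collect exploratory transitions (x, u, x') from the plant.
    Phi, y = [], []
    for _ in range(100):
        x = rng.standard_normal(n)
        u = rng.standard_normal(m)
        x_next = A @ x + B @ u
        Phi.append(quad_basis(x, u))
        y.append(x @ Q @ x + u @ R @ u + x_next @ P @ x_next)  # Bellman target

    # Least-squares fit of the next Q-function weights.
    h, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(y), rcond=None)
    H_new = 0.5 * (h.reshape(n + m, n + m) + h.reshape(n + m, n + m).T)
    if np.linalg.norm(H_new - H) < 1e-8:
        H = H_new
        break
    H = H_new

# Greedy policy u = -K x extracted from the learned Q-function.
K = np.linalg.solve(H[n:, n:], H[n:, :n])
print("Learned gain K:", K)
```

By contrast, Q-learning policy iteration would evaluate a fixed stabilizing gain (a Lyapunov-equation step, in model-based terms) and then improve it, converging quadratically but only from a stabilizing start, which is exactly the restriction the paper's auxiliary-system construction removes.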
Journal Introduction:
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.