Maxwell Geiger, Vignesh Narayanan, Sarangapani Jagannathan
International Journal of Adaptive Control and Signal Processing, 38(7), pp. 2340-2368. Published 2024-04-24. DOI: 10.1002/acs.3807
Optimal trajectory tracking for uncertain linear discrete-time systems using time-varying Q-learning
This article introduces a novel optimal trajectory tracking control scheme designed for uncertain linear discrete-time (DT) systems. In contrast to traditional tracking control methods, our approach removes the requirement for the reference trajectory to align with the generator dynamics of an autonomous dynamical system. Moreover, it does not demand that the complete desired trajectory be known in advance, whether through the generator model or any other means. Instead, our approach can dynamically incorporate segments (finite horizons) of reference trajectories and autonomously learn an optimal control policy to track them in real time. To achieve this, we address the tracking problem by learning a time-varying Q-function through state feedback. This Q-function is then utilized to calculate the optimal feedback gain and an explicitly time-varying feedforward control input, all without prior knowledge of the system dynamics or the complete reference trajectory. Additionally, we introduce an adaptive observer to extend the applicability of the tracking control scheme to situations where full state measurements are unavailable. We rigorously establish the closed-loop stability of our optimal adaptive control approach, both with and without the adaptive observer, employing Lyapunov theory. Moreover, we characterize the optimality of the controller with respect to the finite horizon length of the known components of the desired trajectory. To further enhance the controller's adaptability and effectiveness in multitask environments, we employ the Efficient Lifelong Learning Algorithm, which leverages a shared knowledge base within the recursive least squares algorithm for multitask Q-learning. The efficacy of our approach is substantiated through a comprehensive set of simulation results using a power system example.
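The core idea behind Q-learning for linear systems can be illustrated with a minimal sketch. This is not the paper's algorithm (which handles time-varying Q-functions, feedforward terms, and finite-horizon reference segments); it is the standard batch least-squares variant for a scalar infinite-horizon regulation problem, where a quadratic Q-function is fit to Bellman-equation data from an exploratory trajectory and the feedback gain is read off its partition, without using the plant parameters in the learning loop. All numerical values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar plant x_{k+1} = a*x_k + b*u_k; (a, b) are used only to generate
# data and the reference solution -- the learner never touches them.
a, b = 0.9, 0.5
q, r = 1.0, 1.0   # stage cost q*x^2 + r*u^2

# Collect one batch of exploratory data (persistent excitation via noise).
N = 200
x = np.empty(N + 1); x[0] = 1.0
u = np.empty(N)
for k in range(N):
    u[k] = 0.1 * x[k] + 0.5 * rng.standard_normal()
    x[k + 1] = a * x[k] + b * u[k]

def phi(xv, uv):
    """Quadratic basis for Q(x,u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2."""
    return np.stack([xv**2, 2 * xv * uv, uv**2], axis=-1)

# Policy iteration: evaluate Q for the current gain K by least squares on
# the Bellman equation, then improve the policy greedily.
K = 0.0   # initial admissible gain (plant is open-loop stable here)
for _ in range(20):
    # Q_K(x_k, u_k) - Q_K(x_{k+1}, K*x_{k+1}) = q*x_k^2 + r*u_k^2
    A = phi(x[:-1], u) - phi(x[1:], K * x[1:])
    c = q * x[:-1]**2 + r * u**2
    h_xx, h_xu, h_uu = np.linalg.lstsq(A, c, rcond=None)[0]
    K = -h_xu / h_uu   # argmin_u of the learned quadratic Q

# Model-based reference: iterate the scalar discrete Riccati equation.
P = q
for _ in range(1000):
    P = q + a**2 * P - (a * b * P)**2 / (r + b**2 * P)
K_star = -a * b * P / (r + b**2 * P)
print(f"learned K = {K:.6f}, Riccati K* = {K_star:.6f}")
```

Because the plant is deterministic, the Bellman regression targets are noise-free, so the learned gain matches the Riccati solution to numerical precision; the paper's recursive-least-squares formulation replaces the batch solve with an online update.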
Journal Introduction:
The International Journal of Adaptive Control and Signal Processing is concerned with the design, synthesis, and application of estimators or controllers where adaptive features are needed to cope with uncertainties. Papers on signal processing should also have some relevance to adaptive systems. The journal focuses on model-based control design approaches rather than heuristic or rule-based control design methods. All papers are expected to include significant novel material.
Both the theory and application of adaptive systems and system identification are areas of interest. Papers on applications can include problems in the implementation of algorithms for real-time signal processing and control. The stability, convergence, robustness, and numerical aspects of adaptive algorithms are also suitable topics. The related subjects of controller tuning, filtering, networks, and switching theory are also of interest. Principal areas to be addressed include:
Auto-Tuning, Self-Tuning and Model Reference Adaptive Controllers
Nonlinear, Robust and Intelligent Adaptive Controllers
Linear and Nonlinear Multivariable System Identification and Estimation
Identification of Linear Parameter Varying, Distributed and Hybrid Systems
Multiple Model Adaptive Control
Adaptive Signal Processing Theory and Algorithms
Adaptation in Multi-Agent Systems
Condition Monitoring Systems
Fault Detection and Isolation Methods
Fault-Tolerant Control (system supervision and diagnosis)
Learning Systems and Adaptive Modelling
Real Time Algorithms for Adaptive Signal Processing and Control
Adaptive Signal Processing and Control Applications
Adaptive Cloud Architectures and Networking
Adaptive Mechanisms for Internet of Things
Adaptive Sliding Mode Control.