Maxwell Geiger, Vignesh Narayanan, Sarangapani Jagannathan
International Journal of Adaptive Control and Signal Processing, 38(7), pp. 2340-2368. Published 2024-04-24. DOI: 10.1002/acs.3807
Optimal trajectory tracking for uncertain linear discrete-time systems using time-varying Q-learning
This article introduces a novel optimal trajectory tracking control scheme designed for uncertain linear discrete-time (DT) systems. In contrast to traditional tracking control methods, our approach removes the requirement for the reference trajectory to align with the generator dynamics of an autonomous dynamical system. Moreover, it does not demand that the complete desired trajectory be known in advance, whether through the generator model or any other means. Instead, our approach can dynamically incorporate segments (finite horizons) of reference trajectories and autonomously learn an optimal control policy to track them in real time. To achieve this, we address the tracking problem by learning a time-varying Q-function through state feedback. This Q-function is then utilized to calculate the optimal feedback gain and an explicitly time-varying feedforward control input, all without prior knowledge of the system dynamics or the complete reference trajectory. Additionally, we introduce an adaptive observer to extend the applicability of the tracking control scheme to situations where full state measurements are unavailable. We rigorously establish the closed-loop stability of our optimal adaptive control approach, both with and without the adaptive observer, employing Lyapunov theory. Moreover, we characterize the optimality of the controller with respect to the finite horizon length of the known components of the desired trajectory. To further enhance the controller's adaptability and effectiveness in multitask environments, we employ the Efficient Lifelong Learning Algorithm, which leverages a shared knowledge base within the recursive least squares algorithm for multitask Q-learning. The efficacy of our approach is substantiated through a comprehensive set of simulation results using a power system example.
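The core idea behind Q-learning for linear systems can be illustrated with a minimal sketch. This is not the paper's algorithm (which handles time-varying Q-functions, feedforward terms, and finite-horizon reference segments); it is the standard batch least-squares variant for a scalar infinite-horizon regulation problem, where a quadratic Q-function is fit to Bellman-equation data from an exploratory trajectory and the feedback gain is read off its partition, without using the plant parameters in the learning loop. All numerical values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar plant x_{k+1} = a*x_k + b*u_k; (a, b) are used only to generate
# data and the reference solution -- the learner never touches them.
a, b = 0.9, 0.5
q, r = 1.0, 1.0   # stage cost q*x^2 + r*u^2

# Collect one batch of exploratory data (persistent excitation via noise).
N = 200
x = np.empty(N + 1); x[0] = 1.0
u = np.empty(N)
for k in range(N):
    u[k] = 0.1 * x[k] + 0.5 * rng.standard_normal()
    x[k + 1] = a * x[k] + b * u[k]

def phi(xv, uv):
    """Quadratic basis for Q(x,u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2."""
    return np.stack([xv**2, 2 * xv * uv, uv**2], axis=-1)

# Policy iteration: evaluate Q for the current gain K by least squares on
# the Bellman equation, then improve the policy greedily.
K = 0.0   # initial admissible gain (plant is open-loop stable here)
for _ in range(20):
    # Q_K(x_k, u_k) - Q_K(x_{k+1}, K*x_{k+1}) = q*x_k^2 + r*u_k^2
    A = phi(x[:-1], u) - phi(x[1:], K * x[1:])
    c = q * x[:-1]**2 + r * u**2
    h_xx, h_xu, h_uu = np.linalg.lstsq(A, c, rcond=None)[0]
    K = -h_xu / h_uu   # argmin_u of the learned quadratic Q

# Model-based reference: iterate the scalar discrete Riccati equation.
P = q
for _ in range(1000):
    P = q + a**2 * P - (a * b * P)**2 / (r + b**2 * P)
K_star = -a * b * P / (r + b**2 * P)
print(f"learned K = {K:.6f}, Riccati K* = {K_star:.6f}")
```

Because the plant is deterministic, the Bellman regression targets are noise-free, so the learned gain matches the Riccati solution to numerical precision; the paper's recursive-least-squares formulation replaces the batch solve with an online update.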
Journal Introduction:
The International Journal of Adaptive Control and Signal Processing is concerned with the design, synthesis, and application of estimators or controllers where adaptive features are needed to cope with uncertainties. Papers on signal processing should also have some relevance to adaptive systems. The journal focuses on model-based control design approaches rather than heuristic or rule-based control design methods. All papers are expected to include significant novel material.
Both the theory and application of adaptive systems and system identification are areas of interest. Papers on applications can include problems in the implementation of algorithms for real-time signal processing and control. The stability, convergence, robustness, and numerical aspects of adaptive algorithms are also suitable topics. The related subjects of controller tuning, filtering, networks, and switching theory are also of interest. Principal areas to be addressed include:
Auto-Tuning, Self-Tuning and Model Reference Adaptive Controllers
Nonlinear, Robust and Intelligent Adaptive Controllers
Linear and Nonlinear Multivariable System Identification and Estimation
Identification of Linear Parameter Varying, Distributed and Hybrid Systems
Multiple Model Adaptive Control
Adaptive Signal Processing Theory and Algorithms
Adaptation in Multi-Agent Systems
Condition Monitoring Systems
Fault Detection and Isolation Methods
Fault-Tolerant Control (system supervision and diagnosis)
Learning Systems and Adaptive Modelling
Real Time Algorithms for Adaptive Signal Processing and Control
Adaptive Signal Processing and Control Applications
Adaptive Cloud Architectures and Networking
Adaptive Mechanisms for Internet of Things
Adaptive Sliding Mode Control.