Adaptive Critic Control With Knowledge Transfer for Uncertain Nonlinear Dynamical Systems: A Reinforcement Learning Approach

IF 6.4 2区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS IEEE Transactions on Automation Science and Engineering Pub Date : 2024-09-09 DOI:10.1109/TASE.2024.3453926

Liangju Zhang;Kun Zhang;Xiang Peng Xie;Mohammed Chadli

{"title":"Adaptive Critic Control With Knowledge Transfer for Uncertain Nonlinear Dynamical Systems: A Reinforcement Learning Approach","authors":"Liangju Zhang;Kun Zhang;Xiang Peng Xie;Mohammed Chadli","doi":"10.1109/TASE.2024.3453926","DOIUrl":null,"url":null,"abstract":"This paper presents an online transfer heuristic dynamic programming (THDP) control approach for a class of nonlinear discrete systems. The proposed approach integrates transfer learning with adaptive critic control. To design a robust optimal control strategy for the nonlinear discrete systems, we utilize sample data collected from a source task to acquire prior knowledge. This prior knowledge is subsequently used to guide the online control process of nonlinear systems of target tasks. To avoid negative transfer effects and conserve computational resources, we introduce a novel attenuation function with a truncation mechanism. Additionally, we develop a disturbance compensation control mechanism to address uncertainties. Furthermore, we demonstrate that the properties of the uncertain nonlinear systems under robust optimal control, as well as the weight error of neural networks, are ultimately uniformly bounded given certain conditions. Finally, two simulations are conducted to verify the performance of the proposed algorithm. Note to Practitioners—Adaptive dynamic programming (ADP) is one of the main methods to solve the Hamilton-Jacobi-Bellman (HJB) equation. However, when using neural network approximation, it often requires a long time of iteration and a large amount of computational process, wasting a lot of computational resources. For this reason, we propose an ADP control scheme with enhanced detection speed: that is, by learning a class of similar tasks to obtain prior knowledge to assist in the online control of our actual system. At the same time, this paper considers system disturbances, which means that they are more universal and robust. After simulation experiments, it has been proven that this scheme has good performance.","PeriodicalId":51060,"journal":{"name":"IEEE Transactions on Automation Science and Engineering","volume":"22 ","pages":"6752-6761"},"PeriodicalIF":6.4000,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Automation Science and Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10669391/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

This paper presents an online transfer heuristic dynamic programming (THDP) control approach for a class of nonlinear discrete systems. The proposed approach integrates transfer learning with adaptive critic control. To design a robust optimal control strategy for the nonlinear discrete systems, we utilize sample data collected from a source task to acquire prior knowledge. This prior knowledge is subsequently used to guide the online control process of nonlinear systems of target tasks. To avoid negative transfer effects and conserve computational resources, we introduce a novel attenuation function with a truncation mechanism. Additionally, we develop a disturbance compensation control mechanism to address uncertainties. Furthermore, we demonstrate that the properties of the uncertain nonlinear systems under robust optimal control, as well as the weight error of neural networks, are ultimately uniformly bounded given certain conditions. Finally, two simulations are conducted to verify the performance of the proposed algorithm. Note to Practitioners—Adaptive dynamic programming (ADP) is one of the main methods to solve the Hamilton-Jacobi-Bellman (HJB) equation. However, when using neural network approximation, it often requires a long time of iteration and a large amount of computational process, wasting a lot of computational resources. For this reason, we propose an ADP control scheme with enhanced detection speed: that is, by learning a class of similar tasks to obtain prior knowledge to assist in the online control of our actual system. At the same time, this paper considers system disturbances, which means that they are more universal and robust. After simulation experiments, it has been proven that this scheme has good performance.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

针对不确定非线性动态系统的知识转移自适应批判控制：强化学习方法

针对一类非线性离散系统，提出了一种在线迁移启发式动态规划控制方法。该方法将迁移学习与自适应批评控制相结合。为了设计非线性离散系统的鲁棒最优控制策略，我们利用从源任务中收集的样本数据来获取先验知识。该先验知识随后用于指导目标任务非线性系统的在线控制过程。为了避免负传递效应和节约计算资源，我们引入了一种具有截断机制的衰减函数。此外，我们开发了一种干扰补偿控制机制来解决不确定性。进一步证明了在鲁棒最优控制下的不确定非线性系统的性质，以及神经网络的权值误差，在一定条件下最终是一致有界的。最后，通过两个仿真验证了所提算法的性能。自适应动态规划是求解Hamilton-Jacobi-Bellman （HJB）方程的主要方法之一。然而，在使用神经网络逼近时，往往需要长时间的迭代和大量的计算过程，浪费了大量的计算资源。为此，我们提出了一种提高检测速度的ADP控制方案：即通过学习一类相似任务来获得先验知识，以辅助我们实际系统的在线控制。同时，本文还考虑了系统扰动，使其具有更强的通用性和鲁棒性。经过仿真实验，证明了该方案具有良好的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

IEEE Transactions on Automation Science and Engineering 工程技术-自动化与控制系统

CiteScore

12.50

自引率

14.30%

发文量

404

审稿时长

3.0 months

期刊介绍： The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.