Reinforcement Learning-Based Time-Synchronized Optimized Control for Affine Systems

Yuxiang Zhang, Xiaoling Liang, Dongyu Li, Shuzhi Sam Ge, Bingzhao Gao, Hong Chen, Tong Heng Lee
{"title":"Reinforcement Learning-Based Time-Synchronized Optimized Control for Affine Systems","authors":"Yuxiang Zhang;Xiaoling Liang;Dongyu Li;Shuzhi Sam Ge;Bingzhao Gao;Hong Chen;Tong Heng Lee","doi":"10.1109/TAI.2024.3420261","DOIUrl":null,"url":null,"abstract":"The approach of (fixed-) time-synchronized control (FTSC) aims at attaining the outcome where all the system state-variables converge to the origin simultaneously/synchronously. This type of outcome can be the highly essential performance desired in various real-world high-precision control applications. Toward this objective, this article proposes and investigates the development of a time-synchronized reinforcement learning algorithm (TSRL) applicable to a particular class of first- and second-order affine nonlinear systems. The approach developed here appropriately incorporates the norm-normalized sign function into the optimal system control design, leveraging on the special properties of this norm-normalized sign function in attaining time-synchronized stability and control. Concurrently, the actor–critic framework in reinforcement learning (RL) is invoked, and the dual quantities of system control and gradient term of the cost function are decomposed with appropriate time-synchronized control items and unknown actor/critic part and to be learned independently. By additionally employing the adaptive dynamic programming technique, the solution of the Hamilton–Jacobi–Bellman equation is iteratively approximated under this actor–critic framework. As an outcome, the proposed TSRL method optimizes the system control while attaining the notable time-synchronized convergence property. The performance and effectiveness of the proposed method are demonstrated to be effectively applicable via detailed numerical studies and on an autonomous vehicle nonlinear system motion control problem.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10576055/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The approach of (fixed-) time-synchronized control (FTSC) aims to attain the outcome in which all of the system state variables converge to the origin simultaneously/synchronously. This outcome is a highly desirable performance property in various real-world high-precision control applications. Toward this objective, this article proposes and investigates a time-synchronized reinforcement learning (TSRL) algorithm applicable to a particular class of first- and second-order affine nonlinear systems. The approach developed here incorporates the norm-normalized sign function into the optimal control design, leveraging the special properties of this function to attain time-synchronized stability and control. Concurrently, the actor–critic framework of reinforcement learning (RL) is invoked, and the dual quantities of the system control and the gradient of the cost function are each decomposed into an appropriate time-synchronized control term and an unknown actor/critic part to be learned independently. By additionally employing the adaptive dynamic programming technique, the solution of the Hamilton–Jacobi–Bellman equation is iteratively approximated under this actor–critic framework. As an outcome, the proposed TSRL method optimizes the system control while attaining the notable time-synchronized convergence property. The performance and effectiveness of the proposed method are demonstrated via detailed numerical studies, including an autonomous-vehicle nonlinear motion-control problem.
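The time-synchronized convergence property rests on the norm-normalized sign function. As a hedged illustration (the paper's exact definition and controller may differ), a common form of this function in the FTSC literature, together with the ratio-preserving property it induces for a simple integrator plant, is:

```latex
% Norm-normalized sign function (common form in the FTSC literature)
\[
\overrightarrow{\operatorname{sign}}(x) =
\begin{cases}
  \dfrac{x}{\lVert x \rVert}, & x \neq 0,\\[4pt]
  0, & x = 0,
\end{cases}
\qquad
\dot{x} = u,\quad u = -k\,\overrightarrow{\operatorname{sign}}(x)
\;\Rightarrow\;
\frac{x_i(t)}{x_j(t)} \equiv \frac{x_i(0)}{x_j(0)}.
\]
```

Because the control acts along the ray -x/||x||, the ratios between the state components are frozen, so every component crosses zero at the same instant T = ||x(0)||/k. The following minimal Python sketch checks this prediction numerically; it is not the paper's TSRL algorithm, and the gain k, the stopping threshold, and the first-order integrator plant are all illustrative assumptions:

```python
import numpy as np

def norm_sign(x, eps=1e-12):
    """Norm-normalized sign: x / ||x||, defined as 0 at the origin."""
    n = np.linalg.norm(x)
    return x / n if n > eps else np.zeros_like(x)

# Illustrative first-order plant x_dot = u under the time-synchronized
# law u = -k * norm_sign(x): the control points along -x, so the ratio
# x_i(t)/x_j(t) stays constant and all components reach zero together
# at T = ||x(0)|| / k.
k, dt = 2.0, 1e-4
x = np.array([3.0, -1.5, 0.5])
t, T_pred = 0.0, np.linalg.norm(x) / k
while np.linalg.norm(x) > 1e-3:
    x = x + dt * (-k * norm_sign(x))  # forward-Euler integration step
    t += dt
print(f"predicted T = {T_pred:.4f} s, simulated T ~ {t:.4f} s")
```

In the paper's scheme, a norm-normalized term of this kind supplies the time-synchronized part of the control, while the actor and critic learn the remaining unknown control and value-gradient components under the adaptive dynamic programming iteration.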