Reinforcement Learning-Based Time-Synchronized Optimized Control for Affine Systems

Yuxiang Zhang, Xiaoling Liang, Dongyu Li, Shuzhi Sam Ge, Bingzhao Gao, Hong Chen, Tong Heng Lee
{"title":"Reinforcement Learning-Based Time-Synchronized Optimized Control for Affine Systems","authors":"Yuxiang Zhang;Xiaoling Liang;Dongyu Li;Shuzhi Sam Ge;Bingzhao Gao;Hong Chen;Tong Heng Lee","doi":"10.1109/TAI.2024.3420261","DOIUrl":null,"url":null,"abstract":"The approach of (fixed-) time-synchronized control (FTSC) aims at attaining the outcome where all the system state-variables converge to the origin simultaneously/synchronously. This type of outcome can be the highly essential performance desired in various real-world high-precision control applications. Toward this objective, this article proposes and investigates the development of a time-synchronized reinforcement learning algorithm (TSRL) applicable to a particular class of first- and second-order affine nonlinear systems. The approach developed here appropriately incorporates the norm-normalized sign function into the optimal system control design, leveraging on the special properties of this norm-normalized sign function in attaining time-synchronized stability and control. Concurrently, the actor–critic framework in reinforcement learning (RL) is invoked, and the dual quantities of system control and gradient term of the cost function are decomposed with appropriate time-synchronized control items and unknown actor/critic part and to be learned independently. By additionally employing the adaptive dynamic programming technique, the solution of the Hamilton–Jacobi–Bellman equation is iteratively approximated under this actor–critic framework. As an outcome, the proposed TSRL method optimizes the system control while attaining the notable time-synchronized convergence property. The performance and effectiveness of the proposed method are demonstrated to be effectively applicable via detailed numerical studies and on an autonomous vehicle nonlinear system motion control problem.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10576055/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The approach of (fixed-) time-synchronized control (FTSC) aims to attain the outcome in which all of the system state variables converge to the origin simultaneously/synchronously. This outcome is a highly desirable performance property in various real-world high-precision control applications. Toward this objective, this article proposes and investigates a time-synchronized reinforcement learning (TSRL) algorithm applicable to a particular class of first- and second-order affine nonlinear systems. The approach developed here incorporates the norm-normalized sign function into the optimal control design, leveraging the special properties of this function to attain time-synchronized stability and control. Concurrently, the actor–critic framework of reinforcement learning (RL) is invoked, and the dual quantities of the system control and the gradient of the cost function are each decomposed into an appropriate time-synchronized control term and an unknown actor/critic part to be learned independently. By additionally employing the adaptive dynamic programming technique, the solution of the Hamilton–Jacobi–Bellman equation is iteratively approximated under this actor–critic framework. As an outcome, the proposed TSRL method optimizes the system control while attaining the notable time-synchronized convergence property. The performance and effectiveness of the proposed method are demonstrated via detailed numerical studies, including an autonomous-vehicle nonlinear motion-control problem.
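The time-synchronized convergence property rests on the norm-normalized sign function. As a hedged illustration (the paper's exact definition and controller may differ), a common form of this function in the FTSC literature, together with the ratio-preserving property it induces for a simple integrator plant, is:

```latex
% Norm-normalized sign function (common form in the FTSC literature)
\[
\overrightarrow{\operatorname{sign}}(x) =
\begin{cases}
  \dfrac{x}{\lVert x \rVert}, & x \neq 0,\\[4pt]
  0, & x = 0,
\end{cases}
\qquad
\dot{x} = u,\quad u = -k\,\overrightarrow{\operatorname{sign}}(x)
\;\Rightarrow\;
\frac{x_i(t)}{x_j(t)} \equiv \frac{x_i(0)}{x_j(0)}.
\]
```

Because the control acts along the ray -x/||x||, the ratios between the state components are frozen, so every component crosses zero at the same instant T = ||x(0)||/k. The following minimal Python sketch checks this prediction numerically; it is not the paper's TSRL algorithm, and the gain k, the stopping threshold, and the first-order integrator plant are all illustrative assumptions:

```python
import numpy as np

def norm_sign(x, eps=1e-12):
    """Norm-normalized sign: x / ||x||, defined as 0 at the origin."""
    n = np.linalg.norm(x)
    return x / n if n > eps else np.zeros_like(x)

# Illustrative first-order plant x_dot = u under the time-synchronized
# law u = -k * norm_sign(x): the control points along -x, so the ratio
# x_i(t)/x_j(t) stays constant and all components reach zero together
# at T = ||x(0)|| / k.
k, dt = 2.0, 1e-4
x = np.array([3.0, -1.5, 0.5])
t, T_pred = 0.0, np.linalg.norm(x) / k
while np.linalg.norm(x) > 1e-3:
    x = x + dt * (-k * norm_sign(x))  # forward-Euler integration step
    t += dt
print(f"predicted T = {T_pred:.4f} s, simulated T ~ {t:.4f} s")
```

In the paper's scheme, a norm-normalized term of this kind supplies the time-synchronized part of the control, while the actor and critic learn the remaining unknown control and value-gradient components under the adaptive dynamic programming iteration.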