Data-Informed Residual Reinforcement Learning for High-Dimensional Robotic Tracking Control

Impact factor: 7.3 | Zone 1 (Engineering & Technology) | JCR Q1 (Automation & Control Systems) | IEEE/ASME Transactions on Mechatronics | Pub Date: 2024-09-23 | DOI: 10.1109/TMECH.2024.3412275
Cong Li, Fangzhou Liu, Yongchao Wang, Martin Buss
IEEE/ASME Transactions on Mechatronics, vol. 30, no. 3, pp. 1681-1691. Full text: https://ieeexplore.ieee.org/document/10689563/
Citation count: 0

Abstract

The learning inefficiency of reinforcement learning (RL) from scratch hinders its practical application to continuous robotic tracking control, especially for high-dimensional robots. This article proposes a data-informed residual reinforcement learning (DR-RL)-based robotic tracking control scheme applicable to robots with high dimensionality. The proposed DR-RL methodology outperforms common RL methods in sample efficiency and scalability. Specifically, we first decouple the original robot into low-dimensional robotic subsystems, and then use one-step backward data to construct incremental subsystems that are equivalent model-free representations of the decoupled robotic subsystems. The formulated incremental subsystems allow for parallel learning, which relieves the computational load, and provide mathematical descriptions of robotic movements for theoretical analysis. Then, we apply DR-RL to learn the tracking control policy, a combination of an incremental base policy and an incremental residual policy, under a parallel learning architecture. The incremental residual policy uses the guidance of the incremental base policy as its learning initialization and further learns from interactions with the environment, which gives the tracking control policy adaptability to dynamically changing environments. The proposed DR-RL-based tracking control scheme is developed with rigorous theoretical analysis of system stability and weight convergence. The effectiveness of the proposed method is validated numerically on a 7-DoF KUKA iiwa robot manipulator and experimentally on a 3-DoF robot manipulator, on which counterpart RL methods would fail.
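The abstract gives no equations, so the following is only a rough sketch of the incremental idea for a single decoupled joint, not the authors' DR-RL implementation. It assumes a hypothetical 1-DoF plant of the form qdd = f(q, qd) + g*u whose f and g are unknown to the controller, reuses the previous step's acceleration and input (the "one-step backward data") in place of a model, and forms the control as the previous input plus a base increment plus a residual increment. The plant, the gains, the rough gain guess `g_bar`, the PD-style base increment, and the names `simulate` and `residual_policy` are all illustrative assumptions; the learned residual is replaced by a placeholder callable.

```python
import numpy as np


def simulate(residual_policy, steps=5000, dt=1e-3):
    """Track q_ref(t) = sin(t) on a hypothetical 1-DoF subsystem.

    The controller never evaluates the plant model; it only reuses the
    previous acceleration and previous input (one-step backward data).
    """
    g_true, g_bar = 2.0, 2.5      # true input gain vs. the controller's rough guess (assumed values)
    kp, kd = 100.0, 20.0          # PD gains for the desired error dynamics (assumed values)
    q, qd = 0.5, 0.0              # initial state, deliberately off the reference
    u_prev, qdd_prev = 0.0, 0.0   # one-step backward data
    errors = []
    for k in range(steps):
        t = k * dt
        q_ref, qd_ref, qdd_ref = np.sin(t), np.cos(t), -np.sin(t)
        e, ed = q_ref - q, qd_ref - qd

        # Desired acceleration from the reference plus PD feedback.
        v = qdd_ref + kd * ed + kp * e

        # Incremental base policy: correct the previous input using the
        # gap between desired and previously observed acceleration.
        du_base = (v - qdd_prev) / g_bar

        # Incremental residual policy: placeholder for a learned correction.
        du_res = residual_policy(e, ed)

        u = u_prev + du_base + du_res

        # Hypothetical plant (pendulum-like); used only to simulate the joint.
        qdd = -9.81 * np.sin(q) - 0.5 * qd + g_true * u
        qd += dt * qdd
        q += dt * qd

        u_prev, qdd_prev = u, qdd
        errors.append(abs(e))
    return np.array(errors)


# Zero residual: the base increment alone drives the tracking error down.
errors = simulate(lambda e, ed: 0.0)
print("initial |e|:", errors[0], "final |e|:", errors[-1])
```

In a residual-RL setting, `residual_policy` would be the part updated from interaction data, while the base increment supplies a reasonable starting behavior. Running one such loop per decoupled joint is what would make parallel learning possible in this sketch.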
Source journal
IEEE/ASME Transactions on Mechatronics (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 11.60
Self-citation rate: 18.80%
Articles published: 527
Review time: 7.8 months
Journal description: IEEE/ASME Transactions on Mechatronics publishes high-quality technical papers on technological advances in mechatronics. A primary purpose of the IEEE/ASME Transactions on Mechatronics is to have an archival publication which encompasses both theory and practice. Papers published in the IEEE/ASME Transactions on Mechatronics disclose significant new knowledge needed to implement intelligent mechatronics systems, from analysis and design through simulation and hardware and software implementation. The Transactions also contains a letters section dedicated to rapid publication of short correspondence items concerning new research results.