Data-Informed Residual Reinforcement Learning for High-Dimensional Robotic Tracking Control

IF 7.3; Zone 1 (Engineering & Technology); JCR Q1, AUTOMATION & CONTROL SYSTEMS; IEEE/ASME Transactions on Mechatronics; Pub Date: 2024-09-23; DOI: 10.1109/TMECH.2024.3412275

Cong Li, Fangzhou Liu, Yongchao Wang, Martin Buss

IEEE/ASME Transactions on Mechatronics, vol. 30, no. 3, pp. 1681-1691. Journal Article. https://ieeexplore.ieee.org/document/10689563/

Citations: 0

Abstract

The inefficiency of learning from scratch hinders the practical application of reinforcement learning (RL) to continuous robotic tracking control, especially for high-dimensional robots. This article proposes a data-informed residual reinforcement learning (DR-RL)-based robotic tracking control scheme applicable to robots with high dimensionality. The proposed DR-RL methodology outperforms common RL methods in sample efficiency and scalability. Specifically, we first decouple the original robot into low-dimensional robotic subsystems, and then use one-step backward data to construct incremental subsystems that are equivalent model-free representations of these decoupled subsystems. The incremental subsystems allow parallel learning, which relieves the computational load, and provide mathematical descriptions of the robot's motion for theoretical analysis. We then apply DR-RL to learn the tracking control policy, a combination of an incremental base policy and an incremental residual policy, under a parallel learning architecture. The incremental residual policy uses the guidance of the incremental base policy as its learning initialization and further learns from interactions with the environment, which gives the tracking control policy adaptability to dynamically changing environments. The proposed DR-RL-based tracking control scheme is developed with rigorous theoretical analysis of system stability and weight convergence. The effectiveness of the proposed method is validated numerically on a 7-DoF KUKA iiwa robot manipulator and experimentally on a 3-DoF robot manipulator, where counterpart RL methods would fail.
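The abstract gives no equations, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a single decoupled 1-DoF subsystem, approximates its dynamics from one-step backward data as `ddq_k ~= ddq_{k-1} + g_nominal * (u_k - u_{k-1})`, and forms the control as the previous input plus a base increment plus a residual increment. The plant, the gains, the names `simulate` and `zero_residual`, and the PD-based base policy are all hypothetical; the residual policy is a placeholder where a learned RL term would go.

```python
import math


def zero_residual(e, de):
    """Placeholder residual increment; a learned policy would replace this."""
    return 0.0


def simulate(residual_policy, steps=2000, dt=0.001):
    """Track q_ref(t) = sin(t) on a 1-DoF plant the controller never models."""
    g_true, g_nominal = 2.0, 1.5   # true vs. assumed input gain (hypothetical)
    kp, kd = 100.0, 20.0           # PD gains of the base policy (hypothetical)
    q, dq = 0.0, 0.0               # joint position and velocity
    u_prev, ddq_prev = 0.0, 0.0    # one-step backward data
    errors = []
    for k in range(steps):
        t = k * dt
        q_ref, dq_ref, ddq_ref = math.sin(t), math.cos(t), -math.sin(t)
        e, de = q_ref - q, dq_ref - dq
        # Desired acceleration from a PD law on the tracking error.
        nu = ddq_ref + kd * de + kp * e
        # Base increment: invert the incremental model around the last sample.
        du_base = (nu - ddq_prev) / g_nominal
        # Residual increment: correction supplied by the (learned) policy.
        du_res = residual_policy(e, de)
        u = u_prev + du_base + du_res
        # Plant dynamics, used only to generate data (unknown to the controller).
        ddq = -3.0 * math.sin(q) - 0.5 * dq + g_true * u
        dq += ddq * dt
        q += dq * dt
        u_prev, ddq_prev = u, ddq
        errors.append(abs(e))
    return errors


errors = simulate(zero_residual)
print(f"final tracking error: {errors[-1]:.2e}")
```

In this sketch the base increment alone already tracks the reference because the assumed gain is close enough to the true one; a residual policy would be expected to compensate when that assumption degrades. A high-dimensional robot would run one such loop per decoupled subsystem, which is what makes parallel learning possible.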

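Source journal

IEEE/ASME Transactions on Mechatronics (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 11.60
Self-citation rate: 18.80%
Articles published: 527
Review time: 7.8 months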

About the journal: IEEE/ASME Transactions on Mechatronics publishes high-quality technical papers on technological advances in mechatronics. A primary purpose of the Transactions is to serve as an archival publication that encompasses both theory and practice. Papers published in the Transactions disclose significant new knowledge needed to implement intelligent mechatronics systems, from analysis and design through simulation to hardware and software implementation. The Transactions also contains a letters section dedicated to rapid publication of short correspondence items concerning new research results.