Mitigating Catastrophic Forgetting in Robot Continual Learning: A Guided Policy Search Approach Enhanced With Memory-Aware Synapses

IF 4.6 · CAS Zone 2 (Computer Science) · Q2 ROBOTICS
IEEE Robotics and Automation Letters · Pub Date: 2024-10-29 · DOI: 10.1109/LRA.2024.3487484
Qingwei Dong; Peng Zeng; Yunpeng He; Guangxi Wan; Xiaoting Dong
Volume 9, Issue 12, pp. 11242-11249 · https://ieeexplore.ieee.org/document/10737442/
Citations: 0

Abstract

Complex operational scenarios increasingly demand that industrial robots resolve multiple interrelated problems in sequence to accomplish a complex task, requiring robots not only to learn through interaction with the environment but also to learn continually. Current deep reinforcement learning methods have proven highly capable of enabling robots to learn individual simple operational skills. However, catastrophic forgetting during the continual learning of distinct tasks under a unified control policy remains a challenge. The lengthy sequential decision-making trajectories in reinforcement learning produce a massive state-action search space for the agent, and low-value state-action samples further exacerbate the difficulty of continual learning. In this letter, we propose a Continual Reinforcement Learning (CRL) method that accommodates the incremental multi-skill learning demands of robots. We transform the tightly coupled structure of Guided Policy Search (GPS) algorithms, in which local and global policies are closely intertwined, into a loosely coupled structure that updates the global policy only after the local policy for a specific task has converged, enabling online learning. When incrementally learning new tasks, the global policy is updated using hard parameter sharing and Memory Aware Synapses (MAS), creating task-specific layers while penalizing significant parameter changes in shared layers linked to prior tasks. This method reduces overfitting and mitigates catastrophic forgetting in robotic CRL. We validate our method on PR2, UR5, and Sawyer robots in simulation, as well as on a real UR5 robot.
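To make the two mechanisms named in the abstract concrete, the sketch below pairs hard parameter sharing (a shared trunk with one task-specific head per task) with a Memory Aware Synapses penalty that weights drift in shared parameters by how important they were for earlier tasks. This is a minimal illustration in PyTorch, not the authors' implementation: the layer sizes, the penalty weight `lam`, and the names `MultiHeadPolicy`, `mas_importance`, and `mas_penalty` are assumptions made for the example.

```python
import torch
import torch.nn as nn


class MultiHeadPolicy(nn.Module):
    """Hard parameter sharing: a shared trunk plus one task-specific head per task."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList()  # grown incrementally as new tasks arrive
        self.hidden = hidden
        self.act_dim = act_dim

    def add_task_head(self):
        self.heads.append(nn.Linear(self.hidden, self.act_dim))

    def forward(self, obs, task_id):
        return self.heads[task_id](self.shared(obs))


def mas_importance(model, observations, task_id):
    """MAS importance per shared parameter: mean |d ||f(x)||^2 / d theta| over the data."""
    omega = {n: torch.zeros_like(p) for n, p in model.shared.named_parameters()}
    for obs in observations:                 # observations: list of 1-D observation tensors
        model.zero_grad()
        out = model(obs.unsqueeze(0), task_id)
        out.pow(2).sum().backward()          # squared L2 norm of the policy output
        for n, p in model.shared.named_parameters():
            omega[n] += p.grad.abs()
    return {n: w / max(len(observations), 1) for n, w in omega.items()}


def mas_penalty(model, omega, old_shared_params, lam=1.0):
    """Quadratic penalty on drift of shared parameters, weighted by their importance."""
    loss = torch.tensor(0.0)
    for n, p in model.shared.named_parameters():
        loss = loss + (omega[n] * (p - old_shared_params[n]).pow(2)).sum()
    return lam * loss


# Illustrative usage (hypothetical training schedule, not taken from the paper):
# after the local policy for task t has converged and head t has been trained,
#   old_params = {n: p.detach().clone() for n, p in model.shared.named_parameters()}
#   omega = mas_importance(model, task_t_observations, task_id=t)
# and while fitting the global policy on task t + 1,
#   total_loss = supervised_loss_from_local_policy + mas_penalty(model, omega, old_params)
```

The design intent sketched here follows the abstract: the task-specific heads absorb what is new about each skill, while the MAS term keeps the shared layers close to values that mattered for previously learned tasks.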
Source journal
IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428
Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.
Latest articles in this journal
Correction To: “Design Models and Performance Analysis for a Novel Shape Memory Alloy-Actuated Wearable Hand Exoskeleton for Rehabilitation”
NavTr: Object-Goal Navigation With Learnable Transformer Queries
A Diffusion-Based Data Generator for Training Object Recognition Models in Ultra-Range Distance
Position Prediction for Space Teleoperation With SAO-CNN-BiGRU-Attention Algorithm
MR-ULINS: A Tightly-Coupled UWB-LiDAR-Inertial Estimator With Multi-Epoch Outlier Rejection