Mitigating Catastrophic Forgetting in Robot Continual Learning: A Guided Policy Search Approach Enhanced With Memory-Aware Synapses

IF 4.6 · CAS Zone 2 (Computer Science) · Q2 ROBOTICS
IEEE Robotics and Automation Letters · Pub Date: 2024-10-29 · DOI: 10.1109/LRA.2024.3487484
Qingwei Dong; Peng Zeng; Yunpeng He; Guangxi Wan; Xiaoting Dong
Volume 9, Issue 12, pp. 11242-11249 · https://ieeexplore.ieee.org/document/10737442/
Citations: 0

Abstract

Complex operational scenarios increasingly demand that industrial robots resolve multiple interrelated problems in sequence to accomplish a complex task, requiring robots not only to learn through interaction with the environment but also to learn continually. Current deep reinforcement learning methods have proven highly capable of enabling robots to learn individual simple operational skills. However, catastrophic forgetting during the continual learning of distinct tasks under a unified control policy remains a challenge. The lengthy sequential decision-making trajectories in reinforcement learning produce a massive state-action search space for the agent, and low-value state-action samples further exacerbate the difficulty of continual learning. In this letter, we propose a Continual Reinforcement Learning (CRL) method that accommodates the incremental multi-skill learning demands of robots. We transform the tightly coupled structure of Guided Policy Search (GPS) algorithms, in which local and global policies are closely intertwined, into a loosely coupled structure that updates the global policy only after the local policy for a specific task has converged, enabling online learning. When incrementally learning new tasks, the global policy is updated using hard parameter sharing and Memory Aware Synapses (MAS), creating task-specific layers while penalizing significant parameter changes in shared layers linked to prior tasks. This method reduces overfitting and mitigates catastrophic forgetting in robotic CRL. We validate our method on PR2, UR5, and Sawyer robots in simulation, as well as on a real UR5 robot.
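To make the two mechanisms named in the abstract concrete, the sketch below pairs hard parameter sharing (a shared trunk with one task-specific head per task) with a Memory Aware Synapses penalty that weights drift in shared parameters by how important they were for earlier tasks. This is a minimal illustration in PyTorch, not the authors' implementation: the layer sizes, the penalty weight `lam`, and the names `MultiHeadPolicy`, `mas_importance`, and `mas_penalty` are assumptions made for the example.

```python
import torch
import torch.nn as nn


class MultiHeadPolicy(nn.Module):
    """Hard parameter sharing: a shared trunk plus one task-specific head per task."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList()  # grown incrementally as new tasks arrive
        self.hidden = hidden
        self.act_dim = act_dim

    def add_task_head(self):
        self.heads.append(nn.Linear(self.hidden, self.act_dim))

    def forward(self, obs, task_id):
        return self.heads[task_id](self.shared(obs))


def mas_importance(model, observations, task_id):
    """MAS importance per shared parameter: mean |d ||f(x)||^2 / d theta| over the data."""
    omega = {n: torch.zeros_like(p) for n, p in model.shared.named_parameters()}
    for obs in observations:                 # observations: list of 1-D observation tensors
        model.zero_grad()
        out = model(obs.unsqueeze(0), task_id)
        out.pow(2).sum().backward()          # squared L2 norm of the policy output
        for n, p in model.shared.named_parameters():
            omega[n] += p.grad.abs()
    return {n: w / max(len(observations), 1) for n, w in omega.items()}


def mas_penalty(model, omega, old_shared_params, lam=1.0):
    """Quadratic penalty on drift of shared parameters, weighted by their importance."""
    loss = torch.tensor(0.0)
    for n, p in model.shared.named_parameters():
        loss = loss + (omega[n] * (p - old_shared_params[n]).pow(2)).sum()
    return lam * loss


# Illustrative usage (hypothetical training schedule, not taken from the paper):
# after the local policy for task t has converged and head t has been trained,
#   old_params = {n: p.detach().clone() for n, p in model.shared.named_parameters()}
#   omega = mas_importance(model, task_t_observations, task_id=t)
# and while fitting the global policy on task t + 1,
#   total_loss = supervised_loss_from_local_policy + mas_penalty(model, omega, old_params)
```

The design intent sketched here follows the abstract: the task-specific heads absorb what is new about each skill, while the MAS term keeps the shared layers close to values that mattered for previously learned tasks.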
Source journal
IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428
Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.
Latest articles in this journal
Correction To: “Design Models and Performance Analysis for a Novel Shape Memory Alloy-Actuated Wearable Hand Exoskeleton for Rehabilitation”
NavTr: Object-Goal Navigation With Learnable Transformer Queries
A Diffusion-Based Data Generator for Training Object Recognition Models in Ultra-Range Distance
Position Prediction for Space Teleoperation With SAO-CNN-BiGRU-Attention Algorithm
MR-ULINS: A Tightly-Coupled UWB-LiDAR-Inertial Estimator With Multi-Epoch Outlier Rejection