Mitigating Catastrophic Forgetting in Robot Continual Learning: A Guided Policy Search Approach Enhanced With Memory-Aware Synapses

Qingwei Dong; Peng Zeng; Yunpeng He; Guangxi Wan; Xiaoting Dong
IEEE Robotics and Automation Letters, vol. 9, no. 12, pp. 11242-11249, published 2024-10-29. DOI: 10.1109/LRA.2024.3487484
Complex operational scenarios increasingly demand that industrial robots sequentially solve multiple interrelated problems to accomplish complex tasks, requiring robots not only to learn through interaction with the environment but also to learn continually. Current deep reinforcement learning methods have proven effective at enabling robots to learn individual simple operational skills. However, catastrophic forgetting during the continual learning of distinct tasks under a unified control policy remains a challenge. The lengthy sequential decision-making trajectories in reinforcement learning scenarios produce a massive state-action search space for the agent, and low-value state-action samples further exacerbate the difficulty of continual learning. In this letter, we propose a Continual Reinforcement Learning (CRL) method that accommodates the incremental multi-skill learning demands of robots. We transform the tightly coupled structure of Guided Policy Search (GPS) algorithms, which closely intertwines local and global policies, into a loosely coupled structure. This revised structure updates the global policy only after the local policy for a specific task has converged, enabling online learning. When incrementally learning new tasks, the global policy is updated using hard parameter sharing and Memory Aware Synapses (MAS): task-specific layers are created for each new task, while significant changes to shared-layer parameters linked to prior tasks are penalized. This method reduces overfitting and mitigates catastrophic forgetting in robotic CRL. We validate our method on PR2, UR5, and Sawyer robots in simulation as well as on a real UR5 robot.
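The MAS mechanism referenced in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny linear "policy", the function names, and the constants are illustrative assumptions. The idea is standard MAS: estimate a per-parameter importance from the gradient of the squared output norm on data from earlier tasks, then add a quadratic penalty on drift of important shared parameters while learning a new task.

```python
import numpy as np

def mas_importance(theta, inputs):
    """Per-parameter importance: average absolute gradient of the squared
    L2 norm of the model output w.r.t. each parameter.

    For a linear model f(x) = theta @ x, d||f(x)||^2 / dtheta = 2 f(x) x^T.
    """
    omega = np.zeros_like(theta)
    for x in inputs:
        out = theta @ x                    # model output for this sample
        grad = 2.0 * np.outer(out, x)      # gradient of ||out||^2 w.r.t. theta
        omega += np.abs(grad)
    return omega / len(inputs)

def mas_penalty(theta, theta_old, omega, lam=1.0):
    """Regularizer added to the new task's loss: quadratically penalizes
    changes to parameters that were important for earlier tasks."""
    return lam * np.sum(omega * (theta - theta_old) ** 2)

# Usage: after the shared layers converge on task 1, freeze a copy of the
# parameters and estimate importance from (unlabeled) task-1 states.
rng = np.random.default_rng(0)
theta_task1 = rng.standard_normal((2, 3))
states = [rng.standard_normal(3) for _ in range(32)]
omega = mas_importance(theta_task1, states)

theta_new = theta_task1 + 0.1   # candidate shared parameters during task 2
penalty = mas_penalty(theta_new, theta_task1, omega, lam=0.5)
```

Unimportant parameters (small omega) remain free to adapt to the new task, which is what distinguishes MAS from a uniform L2 anchor to the old weights.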
About the journal:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.