为高效目标条件强化学习生成高价值子目标

IF 6 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Neural Networks Pub Date : 2024-10-28 DOI:10.1016/j.neunet.2024.106825

Yao Li , YuHui Wang , XiaoYang Tan

{"title":"为高效目标条件强化学习生成高价值子目标","authors":"Yao Li , YuHui Wang , XiaoYang Tan","doi":"10.1016/j.neunet.2024.106825","DOIUrl":null,"url":null,"abstract":"<div><div>Goal-conditioned reinforcement learning is widely used in robot control, manipulating the robot to accomplish specific tasks by maximizing accumulated rewards. However, the useful reward signal is only received when the desired goal is reached, leading to the issue of sparse rewards and affecting the efficiency of policy learning. In this paper, we propose a method to generate highly valued subgoals for efficient goal-conditioned policy learning, enabling the development of smart home robots or automatic pilots in our daily life. The highly valued subgoals are conditioned on the context of the specific tasks and characterized by suitable complexity for efficient goal-conditioned action value learning. The context variable captures the latent representation of the particular tasks, allowing for efficient subgoal generation. Additionally, the goal-conditioned action values regularized by the self-adaptive ranges generate subgoals with suitable complexity. Compared to Hindsight Experience Replay that uniformly samples subgoals from visited trajectories, our method generates the subgoals based on the context of tasks with suitable difficulty for efficient policy training. Experimental results show that our method achieves stable performance in robotic environments compared to baseline methods.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106825"},"PeriodicalIF":6.0000,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Highly valued subgoal generation for efficient goal-conditioned reinforcement learning\",\"authors\":\"Yao Li , YuHui Wang , XiaoYang Tan\",\"doi\":\"10.1016/j.neunet.2024.106825\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Goal-conditioned reinforcement learning is widely used in robot control, manipulating the robot to accomplish specific tasks by maximizing accumulated rewards. However, the useful reward signal is only received when the desired goal is reached, leading to the issue of sparse rewards and affecting the efficiency of policy learning. In this paper, we propose a method to generate highly valued subgoals for efficient goal-conditioned policy learning, enabling the development of smart home robots or automatic pilots in our daily life. The highly valued subgoals are conditioned on the context of the specific tasks and characterized by suitable complexity for efficient goal-conditioned action value learning. The context variable captures the latent representation of the particular tasks, allowing for efficient subgoal generation. Additionally, the goal-conditioned action values regularized by the self-adaptive ranges generate subgoals with suitable complexity. Compared to Hindsight Experience Replay that uniformly samples subgoals from visited trajectories, our method generates the subgoals based on the context of tasks with suitable difficulty for efficient policy training. Experimental results show that our method achieves stable performance in robotic environments compared to baseline methods.</div></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"181 \",\"pages\":\"Article 106825\"},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2024-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608024007494\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608024007494","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

目标条件强化学习广泛应用于机器人控制，通过最大化累积奖励来操纵机器人完成特定任务。然而，只有在达到预期目标时才会收到有用的奖励信号，这就导致了奖励稀疏的问题，影响了策略学习的效率。在本文中，我们提出了一种生成高价值子目标的方法，以实现高效的目标条件策略学习，从而开发出智能家居机器人或日常生活中的自动驾驶员。高价值子目标以特定任务的上下文为条件，具有适合目标条件行动值高效学习的复杂性。上下文变量捕捉了特定任务的潜在表征，从而可以高效地生成子目标。此外，由自适应范围规范化的目标条件行动值可生成具有适当复杂度的子目标。与从访问过的轨迹中均匀采样子目标的 "后见之明经验重放 "相比，我们的方法是根据任务的上下文生成子目标，具有适当的难度，从而实现高效的策略训练。实验结果表明，与基线方法相比，我们的方法在机器人环境中实现了稳定的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Highly valued subgoal generation for efficient goal-conditioned reinforcement learning

Goal-conditioned reinforcement learning is widely used in robot control, manipulating the robot to accomplish specific tasks by maximizing accumulated rewards. However, the useful reward signal is only received when the desired goal is reached, leading to the issue of sparse rewards and affecting the efficiency of policy learning. In this paper, we propose a method to generate highly valued subgoals for efficient goal-conditioned policy learning, enabling the development of smart home robots or automatic pilots in our daily life. The highly valued subgoals are conditioned on the context of the specific tasks and characterized by suitable complexity for efficient goal-conditioned action value learning. The context variable captures the latent representation of the particular tasks, allowing for efficient subgoal generation. Additionally, the goal-conditioned action values regularized by the self-adaptive ranges generate subgoals with suitable complexity. Compared to Hindsight Experience Replay that uniformly samples subgoals from visited trajectories, our method generates the subgoals based on the context of tasks with suitable difficulty for efficient policy training. Experimental results show that our method achieves stable performance in robotic environments compared to baseline methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Neural Networks 工程技术-计算机：人工智能

CiteScore

13.90

自引率

7.70%

发文量

425

审稿时长

67 days

期刊介绍： Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.