Re-attentive experience replay in off-policy reinforcement learning

IF 2.9 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Machine Learning | Pub Date: 2024-02-22 | DOI: 10.1007/s10994-023-06505-8

Wei Wei, Da Wang, Lin Li, Jiye Liang

Citations: 0

Abstract

Experience replay, which stores past samples for reuse, has become a fundamental component of off-policy reinforcement learning. Pioneering work has shown that prioritizing or reweighting samples by their on-policiness can yield significant performance improvements. However, these methods pay too little attention to sample diversity, which may cause instability or even long-term performance slumps. In this work, we introduce a novel Re-attention criterion that reevaluates recent experiences so that the agent benefits from learning about them. We call the overall algorithm Re-attentive Experience Replay (RAER). RAER employs a parameter-insensitive dynamic testing technique to increase the attention paid to samples generated by policies whose overall performance shows a promising trend. By making good use of diverse samples, RAER retains the positive effects of on-policiness while avoiding its potential negative influences. Extensive experiments demonstrate that RAER improves both performance and stability. Moreover, replacing the on-policiness component of the state-of-the-art approach with RAER can yield significant benefits.
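
To make the terms in the abstract concrete, the following is a minimal sketch of a generic experience replay buffer. It is not RAER and not the authors' code: the abstract does not specify the Re-attention criterion or the dynamic testing technique, so neither is reproduced here. The class `ReplayBuffer`, the `Transition` layout, and the `decay` parameter are hypothetical names chosen for illustration. Uniform sampling ignores how recent a sample is, while the recency-weighted sampler stands in, as an assumption, for the kind of on-policiness reweighting the abstract refers to.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, next_state, done); an illustrative layout only.
Transition = Tuple[int, int, float, int, bool]


class ReplayBuffer:
    """Fixed-capacity FIFO store of past transitions (hypothetical helper)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._storage: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._storage)

    def add(self, transition: Transition) -> None:
        # Once the buffer is full, the deque discards the oldest transition.
        self._storage.append(transition)

    def sample_uniform(self, batch_size: int) -> List[Transition]:
        # Every stored transition is equally likely (sampling with replacement).
        return self._rng.choices(list(self._storage), k=batch_size)

    def sample_recency_weighted(
        self, batch_size: int, decay: float = 0.9
    ) -> List[Transition]:
        # Assumed on-policiness proxy: the newest transition has weight 1 and
        # each older one is down-weighted by a further factor of `decay`.
        n = len(self._storage)
        weights = [decay ** (n - 1 - i) for i in range(n)]
        return self._rng.choices(list(self._storage), weights=weights, k=batch_size)
```

In this sketch, a `decay` close to 1 approaches uniform sampling and keeps the batch diverse, while a small `decay` concentrates the batch on the most recent samples. That trade-off between recency and diversity is the tension the abstract describes.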

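Source journal

Machine Learning (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.00
Self-citation rate: 2.70%
Articles published: 162
Review time: 3 months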

Journal description: Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.

Latest articles from the journal

Linear Causal Discovery with Interventional Constraints.
Interpretable optimisation-based approach for hyper-box classification.
Deep latent force models: ODE-based process convolutions for Bayesian deep learning.
Offline reinforcement learning for learning to dispatch for job shop scheduling.
Computing the distance between unbalanced distributions: the flat metric.