Enhancing Reinforcement Learning via Transformer-Based State Predictive Representations

IEEE Transactions on Artificial Intelligence | Pub Date: 2024-03-21 | DOI: 10.1109/TAI.2024.3379969

Minsong Liu;Yuanheng Zhu;Yaran Chen;Dongbin Zhao

Published in vol. 5, no. 9, pp. 4364-4375. Full text: https://ieeexplore.ieee.org/document/10477774/


Abstract

Better state representations can mitigate the low sample efficiency of reinforcement learning (RL) in environments with high-dimensional inputs. Existing methods try to improve sample efficiency by learning predictive state representations from sequence data, but they still struggle to capture and learn the information contained in long sequences. Motivated by this, we introduce transformer-based state predictive representations (TSPR)¹, an auxiliary task that promotes better representation learning through self-supervised objectives. Specifically, we design a transformer-based predictive model that sets up unidirectional and bidirectional prediction tasks for predicting state representations in the latent space. TSPR exploits contextual information within sequences to learn more informative state representations, which in turn improves policy training in RL. Extensive experiments show that combining TSPR with off-policy RL algorithms substantially improves sample efficiency. Furthermore, TSPR outperforms state-of-the-art sample-efficient RL methods on both continuous control (DMControl) and discrete control (Atari) tasks.

¹ Our code will be released at https://github.com/gourmet-liu/TSPR
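To make the two prediction tasks named in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. The abstract gives no architectural details, so everything here is an assumption: the unidirectional task is modelled with a causal attention mask, the bidirectional task with randomly hidden latent positions and full attention, and the loss is a negative cosine similarity between predicted and target latents. All function names are hypothetical.

```python
import math
import random

def causal_mask(seq_len):
    """Assumed unidirectional task: position i may attend only to j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def full_mask(seq_len):
    """Assumed bidirectional task: every position may attend to every other."""
    return [[True] * seq_len for _ in range(seq_len)]

def pick_masked_positions(seq_len, ratio, rng):
    """Choose which latent positions to hide (at least one), in sorted order."""
    k = max(1, round(seq_len * ratio))
    return sorted(rng.sample(range(seq_len), k))

def cosine_loss(pred, target):
    """Negative cosine similarity between two latent vectors (assumed loss)."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    return -dot / norm

def prediction_loss(preds, targets, positions):
    """Average the loss over the positions the predictor had to fill in."""
    return sum(cosine_loss(preds[i], targets[i]) for i in positions) / len(positions)

if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in latents: 6 time steps, 8 dimensions each (hypothetical sizes).
    latents = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
    hidden = pick_masked_positions(len(latents), 0.5, rng)
    print(causal_mask(3))
    # Using the targets as predictions gives a loss of approximately -1.
    print(prediction_loss(latents, latents, hidden))
```

In a real system the masks would be passed to a transformer's attention layers and the predictions would come from that model; the sketch only shows how the two tasks differ in what each position is allowed to see.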


Source journal: IEEE Transactions on Artificial Intelligence

CiteScore: 7.70

Self-citation rate: 0.00%

Articles published