Long-Term Feature Extraction via Frequency Prediction for Efficient Reinforcement Learning

Jie Wang;Mingxuan Ye;Yufei Kuang;Rui Yang;Wengang Zhou;Houqiang Li;Feng Wu
{"title":"Long-Term Feature Extraction via Frequency Prediction for Efficient Reinforcement Learning","authors":"Jie Wang;Mingxuan Ye;Yufei Kuang;Rui Yang;Wengang Zhou;Houqiang Li;Feng Wu","doi":"10.1109/TPAMI.2025.3529264","DOIUrl":null,"url":null,"abstract":"Sample efficiency remains a key challenge for the deployment of deep reinforcement learning (RL) in real-world scenarios. A common approach is to learn efficient representations through future prediction tasks, facilitating the agent to make farsighted decisions that benefit its long-term performance. Existing methods extract predictive features by predicting multi-step future state signals. However, they do not fully exploit the structural information inherent in sequential state signals, which can potentially improve the quality of long-term decision-making but is difficult to discern in the time domain. To tackle this problem, we introduce a new perspective that leverages the frequency domain of state sequences to extract the underlying patterns in time series data. We theoretically show that state sequences contain structural information closely tied to policy performance and signal regularity and analyze the fitness of the frequency domain for extracting these two types of structural information. Inspired by that, we propose a novel representation learning method, <bold>S</b>tate Sequences <bold>P</b>rediction via <bold>F</b>ourier Transform (SPF), which extracts long-term features by predicting the Fourier transform of infinite-step future state sequences. The appealing features of our frequency prediction objective include: 1) simple to implement due to a recursive relationship; 2) providing an upper bound on the performance difference between the optimal policy and the latent policy in the representation space. Experiments on standard and goal-conditioned RL tasks demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"3094-3110"},"PeriodicalIF":18.6000,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10839463/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Sample efficiency remains a key challenge for the deployment of deep reinforcement learning (RL) in real-world scenarios. A common approach is to learn efficient representations through future prediction tasks, facilitating the agent to make farsighted decisions that benefit its long-term performance. Existing methods extract predictive features by predicting multi-step future state signals. However, they do not fully exploit the structural information inherent in sequential state signals, which can potentially improve the quality of long-term decision-making but is difficult to discern in the time domain. To tackle this problem, we introduce a new perspective that leverages the frequency domain of state sequences to extract the underlying patterns in time series data. We theoretically show that state sequences contain structural information closely tied to policy performance and signal regularity and analyze the fitness of the frequency domain for extracting these two types of structural information. Inspired by that, we propose a novel representation learning method, State Sequences Prediction via Fourier Transform (SPF), which extracts long-term features by predicting the Fourier transform of infinite-step future state sequences. The appealing features of our frequency prediction objective include: 1) simple to implement due to a recursive relationship; 2) providing an upper bound on the performance difference between the optimal policy and the latent policy in the representation space. Experiments on standard and goal-conditioned RL tasks demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过频率预测提取长期特征,实现高效强化学习
样本效率仍然是在现实场景中部署深度强化学习(RL)的关键挑战。一种常见的方法是通过未来预测任务学习有效的表示,促进智能体做出有利于其长期性能的有远见的决策。现有方法通过预测多步未来状态信号来提取预测特征。然而,它们没有充分利用序列状态信号中固有的结构信息,这些信息可以潜在地提高长期决策的质量,但在时域上难以识别。为了解决这个问题,我们引入了一种新的视角,利用状态序列的频域来提取时间序列数据中的潜在模式。我们从理论上证明了状态序列包含与策略性能和信号规律性密切相关的结构信息,并分析了提取这两类结构信息的频域适应度。受此启发,我们提出了一种新的表征学习方法——基于傅里叶变换的状态序列预测(SPF),该方法通过预测无限步未来状态序列的傅里叶变换来提取长期特征。我们的频率预测目标的吸引人的特点包括:1)由于递归关系,易于实现;2)给出表征空间中最优策略与潜在策略性能差异的上界。在标准和目标条件强化学习任务上的实验表明,所提出的方法在样本效率和性能方面优于几种最先进的算法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
GrowSP++: Growing Superpoints and Primitives for Unsupervised 3D Semantic Segmentation. Unsupervised Gaze Representation Learning by Switching Features. H2OT: Hierarchical Hourglass Tokenizer for Efficient Video Pose Transformers. MV2DFusion: Leveraging Modality-Specific Object Semantics for Multi-Modal 3D Detection. Parse Trees Guided LLM Prompt Compression.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1