Understanding world models through multi-step pruning policy via reinforcement learning

Information Sciences · IF 8.1 · Q1 (Computer Science) · COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-08-22 · DOI: 10.1016/j.ins.2024.121361
Citations: 0

Abstract

In model-based reinforcement learning, the conventional approach to addressing world model bias is gradient-based optimization. However, relying on a single policy produced by gradient optimization to compensate for world model bias inevitably yields a policy that is itself biased, because the policy is constrained by imperfect and dynamic state-action data; the gap between the world model and the real environment can never be completely eliminated. This article introduces a novel approach that explores a variety of policies rather than focusing on either world model bias or single-policy bias. Specifically, we introduce the Multi-Step Pruning Policy (MSPP), which prunes redundant actions and thereby compresses the action and state spaces, encouraging different perspectives within the same world model. To achieve this, we run multiple pruning policies in parallel and integrate their outputs using the cross-entropy method. Additionally, we provide a convergence analysis of the pruning policy in the tabular setting and a theoretical framework for the parameter updates. In the experiments, the proposed MSPP method demonstrates a more comprehensive understanding of the world model and outperforms state-of-the-art model-based reinforcement learning baselines.
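The integration step the abstract mentions, combining candidate actions via the cross-entropy method, can be illustrated with a minimal sketch. This is not the paper's implementation: the one-dimensional action space, the `estimated_return` objective (standing in for a world-model return estimate), and all hyperparameters are hypothetical, chosen only to show how the cross-entropy method iteratively refits a sampling distribution to its highest-scoring candidates.

```python
import random
import math

def estimated_return(action):
    # Hypothetical stand-in for a world-model return estimate;
    # here the best scalar action is 2.0 by construction.
    return -(action - 2.0) ** 2

def cross_entropy_method(score, iters=30, pop=64, elite_frac=0.2, seed=0):
    """Minimal cross-entropy method over a 1-D action space.

    Repeatedly: sample candidates from a Gaussian, keep the top
    elite fraction under `score`, and refit the Gaussian to them.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 2.0                      # initial sampling distribution
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = [rng.gauss(mu, sigma) for _ in range(pop)]
        samples.sort(key=score, reverse=True)  # best candidates first
        elites = samples[:n_elite]
        mu = sum(elites) / n_elite             # refit mean to elites
        sigma = math.sqrt(sum((x - mu) ** 2 for x in elites) / n_elite) + 1e-6
    return mu

best_action = cross_entropy_method(estimated_return)
```

In MSPP the candidates would come from several pruning policies evaluated under the shared world model rather than from a single Gaussian, but the refit-to-elites loop is the same idea.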

Source Journal

Information Sciences (Engineering & Technology – Computer Science: Information Systems)
CiteScore: 14.00
Self-citation rate: 17.30%
Articles per year: 1322
Review time: 10.4 months
Journal Introduction: Information Sciences (Informatics and Computer Science, Intelligent Systems Applications) is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.
Latest articles in this journal:
Ex-RL: Experience-based reinforcement learning
Editorial Board
Joint consensus kernel learning and adaptive hypergraph regularization for graph-based clustering
RT-DIFTWD: A novel data-driven intuitionistic fuzzy three-way decision model with regret theory
Granular correlation-based label-specific feature augmentation for multi-label classification