Incremental model-based reinforcement learning with model constraint

IF 6.3 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Neural Networks · Pub Date: 2025-02-08 · DOI: 10.1016/j.neunet.2025.107245
Zhiyou Yang, Mingsheng Fu, Hong Qu, Fan Li, Shuqing Shi, Wang Hu
{"title":"Incremental model-based reinforcement learning with model constraint","authors":"Zhiyou Yang ,&nbsp;Mingsheng Fu ,&nbsp;Hong Qu ,&nbsp;Fan Li ,&nbsp;Shuqing Shi ,&nbsp;Wang Hu","doi":"10.1016/j.neunet.2025.107245","DOIUrl":null,"url":null,"abstract":"<div><div>In model-based reinforcement learning (RL) approaches, the estimated model of a real environment is learned with limited data and then utilized for policy optimization. As a result, the policy optimization process in model-based RL is influenced by both policy and estimated model updates. In practice, previous model-based RL methods only perform incremental policy constraint to policy updates, which cannot assure the complete incremental updates, thereby limiting the algorithm’s performance. To address this issue, we propose an incremental model-based RL update scheme by analyzing the policy optimization procedure of model-based RL. This scheme includes both an incremental model constraint that guarantees incremental updates to the estimated model, and an incremental policy constraint that ensures incremental updates to the policy. Further, we establish a performance bound incorporating the incremental model-based RL update scheme between the real environment and the estimated model, which can assure non-decreasing policy performance improvement in the real environment. To implement the incremental model-based RL update scheme, we develop a simple and efficient model-based RL algorithm known as <strong>IMPO</strong> (<strong>I</strong>ncremental <strong>M</strong>odel-based <strong>P</strong>olicy <strong>O</strong>ptimization), which leverages previous knowledge to enhance stability during the learning process. Experimental results across various control benchmarks demonstrate that IMPO significantly outperforms previous state-of-the-art model-based RL methods in terms of overall performance and sample efficiency.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"Article 107245"},"PeriodicalIF":6.3000,"publicationDate":"2025-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025001248","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

In model-based reinforcement learning (RL), an estimated model of the real environment is learned from limited data and then used for policy optimization. Consequently, the policy optimization process in model-based RL is influenced by updates to both the policy and the estimated model. In practice, previous model-based RL methods apply an incremental constraint only to policy updates, which cannot ensure that the updates are fully incremental and thereby limits performance. To address this issue, we analyze the policy optimization procedure of model-based RL and propose an incremental model-based RL update scheme. The scheme includes an incremental model constraint that guarantees incremental updates to the estimated model and an incremental policy constraint that ensures incremental updates to the policy. Furthermore, we establish a performance bound between the real environment and the estimated model that incorporates this update scheme and guarantees non-decreasing policy performance in the real environment. To implement the scheme, we develop a simple and efficient model-based RL algorithm, IMPO (Incremental Model-based Policy Optimization), which leverages previous knowledge to enhance stability during learning. Experimental results on various control benchmarks demonstrate that IMPO significantly outperforms previous state-of-the-art model-based RL methods in both overall performance and sample efficiency.
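The abstract describes the update scheme only at a high level. The sketch below is one plausible way to realize incremental updates to both the estimated model and the policy, under the assumption (ours, not the paper's specification) that each incremental constraint is implemented as a KL penalty tying the new model and policy to frozen copies of their previous versions. All names, network sizes, coefficients (beta_model, beta_policy), and the placeholder data are hypothetical and for illustration only.

```python
# Minimal, illustrative sketch of an "incremental" model-based RL update loop.
# NOT the authors' IMPO implementation: the incremental model/policy constraints
# are approximated here as KL penalties to frozen snapshots of the previous
# model and policy.
import copy
import torch
import torch.nn as nn

class GaussianNet(nn.Module):
    """Outputs a diagonal Gaussian over next states (dynamics model) or actions (policy)."""
    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, out_dim)
        self.log_std = nn.Parameter(torch.zeros(out_dim))

    def dist(self, x):
        h = self.body(x)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

def kl_to_old(new_net, old_net, inputs):
    """Mean KL between the frozen old distribution and the new one on a batch."""
    with torch.no_grad():
        old = old_net.dist(inputs)
    new = new_net.dist(inputs)
    return torch.distributions.kl_divergence(old, new).sum(-1).mean()

# Hypothetical dimensions and penalty coefficients.
obs_dim, act_dim = 4, 2
model = GaussianNet(obs_dim + act_dim, obs_dim)   # estimated dynamics model
policy = GaussianNet(obs_dim, act_dim)            # policy
beta_model, beta_policy = 1.0, 1.0                # incremental-constraint strengths (penalty form)

model_opt = torch.optim.Adam(model.parameters(), lr=3e-4)
policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

for iteration in range(3):
    # Freeze snapshots of the previous model and policy for the incremental constraints.
    old_model, old_policy = copy.deepcopy(model), copy.deepcopy(policy)

    # Model update: fit real transitions, penalized for drifting from the old model.
    obs = torch.randn(256, obs_dim)        # placeholder real transitions
    act = torch.randn(256, act_dim)
    next_obs = torch.randn(256, obs_dim)
    sa = torch.cat([obs, act], dim=-1)
    nll = -model.dist(sa).log_prob(next_obs).sum(-1).mean()
    model_loss = nll + beta_model * kl_to_old(model, old_model, sa)
    model_opt.zero_grad()
    model_loss.backward()
    model_opt.step()

    # Policy update: maximize a placeholder model-based objective,
    # penalized for drifting from the old policy.
    adv = torch.randn(256)                 # placeholder advantages from model rollouts
    logp = policy.dist(obs).log_prob(act).sum(-1)
    policy_loss = -(logp * adv).mean() + beta_policy * kl_to_old(policy, old_policy, obs)
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()
```

In a real implementation the placeholder data and advantages would come from an environment replay buffer and model rollouts, and the penalty terms could instead be enforced as explicit trust-region constraints; the point of the sketch is only that both the model update and the policy update carry their own incremental constraint.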
Source journal: Neural Networks (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Annual articles: 425
Review time: 67 days

Journal introduction: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.
Latest articles in this journal:
Incremental multi-subreservoirs echo state network control for uncertain aeration process
CMMDL: Cross-modal multi-domain learning method for image fusion
RepAttn3D: Re-parameterizing 3D attention with spatiotemporal augmentation for video understanding
Transforming tabular data into images for deep learning models
Trainable-parameter-free structural-diversity message passing for graph neural networks