Mild evaluation policy via dataset constraint for offline reinforcement learning

IF 7.5 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Expert Systems with Applications | Pub Date: 2025-02-17 | DOI: 10.1016/j.eswa.2025.126842

Xue Li, Xinghong Ling

Expert Systems with Applications, Volume 274, Article 126842. Full text: https://www.sciencedirect.com/science/article/pii/S0957417425004646

Citations: 0

Abstract

Offline reinforcement learning (RL) agents seek optimal policies from fixed datasets. Policy constraints are typically employed to keep the learned policy close to the behavior policy, thereby stabilizing value learning and mitigating the selection of out-of-distribution (OOD) actions. Conventional approaches apply identical constraints for both value learning and test-time inference. However, the constraints suitable for value estimation may in fact be excessively restrictive for action selection at test time. To address this issue, we propose a mild evaluation policy via dataset constraint (MEDC) for test-time inference, paired with a more constrained target policy for value estimation. MEDC introduces a dual-policy constraint, comprising a target policy and an evaluation policy. The evaluation policy is regularized toward the nearest state–action pair in the dataset, while behavior cloning is performed on the target policy. The distributional shift is effectively addressed through the combination of dataset constraint and behavior cloning. TD3 is employed to direct the policy toward actions that maximize the return. Moreover, MEDC achieves state-of-the-art performance compared with existing methods on the D4RL datasets.
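
The abstract gives no formulas, so the following is only a minimal sketch of how the two regularizers it describes might differ, not the authors' implementation. It assumes squared Euclidean distances, a TD3-style actor objective that maximizes a scalar Q-value, and two hypothetical weights, `alpha` (strength of the regularizer) and `beta` (weight on state mismatch in the nearest-pair search). All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Transition = Tuple[Vector, Vector]  # one (state, action) pair from the fixed dataset


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def bc_penalty(policy_action: Vector, dataset_action: Vector) -> float:
    """Behavior cloning: distance to the action logged for this same state."""
    return sq_dist(policy_action, dataset_action)


def dataset_constraint_penalty(state: Vector, policy_action: Vector,
                               dataset: List[Transition],
                               beta: float = 1.0) -> float:
    """Dataset constraint: distance to the nearest (state, action) pair
    anywhere in the dataset. `beta` is a hypothetical state-mismatch weight."""
    return min(beta * sq_dist(state, s) + sq_dist(policy_action, a)
               for s, a in dataset)


def target_policy_loss(q_value: float, policy_action: Vector,
                       dataset_action: Vector, alpha: float = 1.0) -> float:
    """Assumed target-policy objective: maximize Q, stay close to the logged action."""
    return -q_value + alpha * bc_penalty(policy_action, dataset_action)


def evaluation_policy_loss(q_value: float, state: Vector, policy_action: Vector,
                           dataset: List[Transition],
                           alpha: float = 1.0, beta: float = 1.0) -> float:
    """Assumed evaluation-policy objective: maximize Q, stay close to the dataset."""
    return -q_value + alpha * dataset_constraint_penalty(
        state, policy_action, dataset, beta)


if __name__ == "__main__":
    # Toy dataset: a nearby state was logged with a very different action.
    dataset = [([0.0], [0.0]), ([0.1], [1.0])]
    state, logged_action, proposed_action = [0.0], [0.0], [0.9]
    print("BC penalty:", bc_penalty(proposed_action, logged_action))
    print("Dataset-constraint penalty:",
          dataset_constraint_penalty(state, proposed_action, dataset))
```

Under these assumptions, whenever the pair (state, logged action) is itself in the dataset, the dataset-constraint penalty can never exceed the behavior-cloning penalty, which is one way to read the word "mild": an action far from the logged one is still cheap if a neighboring state supports it.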

Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

13.80

Self-citation rate

10.60%

Articles published

2045

Review time

8.7 months

About the journal: Expert Systems with Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.