A Novel Heuristic Exploration Method Based on Action Effectiveness Constraints to Relieve Loop Enhancement Effect in Reinforcement Learning with Sparse Rewards

IF 4.3 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Cognitive Computation · Pub Date: 2023-12-07 · DOI: 10.1007/s12559-023-10226-4
Zhenghongyuan Ni, Ye Jin, Peng Liu, Wei Zhao
{"title":"A Novel Heuristic Exploration Method Based on Action Effectiveness Constraints to Relieve Loop Enhancement Effect in Reinforcement Learning with Sparse Rewards","authors":"Zhenghongyuan Ni, Ye Jin, Peng Liu, Wei Zhao","doi":"10.1007/s12559-023-10226-4","DOIUrl":null,"url":null,"abstract":"<p>In realistic sparse reward tasks, existing theoretical methods cannot be effectively applied due to the low sampling probability ofrewarded episodes. Profound research on methods based on intrinsic rewards has been conducted to address this issue, but exploration with sparse rewards remains a great challenge. This paper describes the loop enhancement effect in exploration processes with sparse rewards. After each fully trained iteration, the execution probability of ineffective actions is higher than thatof other suboptimal actions, which violates biological habitual behavior principles and is not conducive to effective training. This paper proposes corresponding theorems of relieving the loop enhancement effect in the exploration process with sparse rewards and a heuristic exploration method based on action effectiveness constraints (AEC), which improves policy training efficiency by relieving the loop enhancement effect. Inspired by the fact that animals form habitual behaviors and goal-directed behaviors through the dorsolateral striatum and dorsomedial striatum. The function of the dorsolateral striatum is simulated by an action effectiveness evaluation mechanism (A2EM), which aims to reduce the rate of ineffective samples and improve episode reward expectations. The function of the dorsomedial striatum is simulated by an agent policy network, which aims to achieve task goals. The iterative training of A2EM and the policy forms the AEC model structure. A2EM provides effective samples for the agent policy; the agent policy provides training constraints for A2EM. The experimental results show that A2EM can relieve the loop enhancement effect and has good interpretability and generalizability. AEC enables agents to effectively reduce the loop rate in samples, can collect more effective samples, and improve the efficiency of policy training. The performance of AEC demonstrates the effectiveness of a biological heuristic approach that simulates the function of the dorsal striatum. This approach can be used to improve the robustness of agent exploration with sparse rewards.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"46 1","pages":""},"PeriodicalIF":4.3000,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cognitive Computation","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s12559-023-10226-4","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

In realistic sparse-reward tasks, existing theoretical methods cannot be applied effectively because rewarded episodes are sampled with low probability. Extensive research on intrinsic-reward methods has been conducted to address this issue, but exploration with sparse rewards remains a great challenge. This paper describes the loop enhancement effect in exploration processes with sparse rewards: after each fully trained iteration, the execution probability of ineffective actions is higher than that of other suboptimal actions, which violates biological principles of habitual behavior and is not conducive to effective training. This paper proposes theorems for relieving the loop enhancement effect in exploration with sparse rewards, together with a heuristic exploration method based on action effectiveness constraints (AEC), which improves policy training efficiency by relieving that effect. The method is inspired by the fact that animals form habitual behaviors and goal-directed behaviors through the dorsolateral and dorsomedial striatum, respectively. The function of the dorsolateral striatum is simulated by an action effectiveness evaluation mechanism (A2EM), which aims to reduce the rate of ineffective samples and improve the expected episode reward. The function of the dorsomedial striatum is simulated by the agent's policy network, which aims to achieve the task goals. The iterative training of A2EM and the policy forms the AEC model structure: A2EM provides effective samples for the agent policy, and the agent policy provides training constraints for A2EM. The experimental results show that A2EM can relieve the loop enhancement effect and has good interpretability and generalizability. AEC enables agents to reduce the loop rate in samples, collect more effective samples, and improve the efficiency of policy training. The performance of AEC demonstrates the effectiveness of a biologically inspired heuristic that simulates the function of the dorsal striatum. This approach can be used to improve the robustness of agent exploration with sparse rewards.
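The abstract describes the AEC structure only at a high level: A2EM screens out actions judged ineffective (loop-inducing), and the constrained rollouts in turn supply A2EM's training signal. The sketch below is a minimal, hypothetical illustration of that interaction, assuming a discrete action space, hashable states, and a simple definition of a loop as revisiting a recently seen state; the count-table A2EM, the effectiveness threshold, loop_window, and all other names here are assumptions made for illustration, not the paper's implementation (the paper trains A2EM iteratively alongside the policy network).

import random
from collections import deque

class A2EM:
    # Toy action effectiveness evaluation mechanism: estimates, per
    # (state, action) pair, the probability that the action is "effective",
    # i.e. does not lead straight back into a recently visited state.
    # The paper's A2EM is trained jointly with the policy; a smoothed
    # count table stands in for it here purely for brevity.
    def __init__(self):
        self.counts = {}  # (state, action) -> (effective, total)

    def effectiveness(self, state, action):
        eff, total = self.counts.get((state, action), (1, 2))
        return eff / total  # Laplace-smoothed estimate, starts at 0.5

    def update(self, state, action, was_effective):
        eff, total = self.counts.get((state, action), (1, 2))
        self.counts[(state, action)] = (eff + int(was_effective), total + 1)

def constrained_action(policy_probs, a2em, state, threshold=0.2):
    # Mask actions that A2EM judges likely ineffective, then sample from
    # the renormalized policy; fall back to the raw policy if the mask
    # removes everything, so exploration never stalls.
    masked = [p if a2em.effectiveness(state, a) >= threshold else 0.0
              for a, p in enumerate(policy_probs)]
    if sum(masked) == 0.0:
        masked = list(policy_probs)
    return random.choices(range(len(masked)), weights=masked, k=1)[0]

def run_episode(env, policy_probs, a2em, loop_window=8):
    # One exploration episode. A transition is labeled ineffective when it
    # returns the agent to a state seen within the last loop_window steps,
    # which is the "loop" this sketch tries to suppress.
    state = env.reset()
    recent = deque(maxlen=loop_window)
    done = False
    while not done:
        recent.append(state)
        action = constrained_action(policy_probs(state), a2em, state)
        next_state, _reward, done = env.step(action)  # assumed 3-tuple API
        a2em.update(state, action, was_effective=next_state not in recent)
        state = next_state

In the paper's terms, the constrained sampling supplies the policy with effective (low-loop) samples, while rollouts generated under the current policy supply A2EM's training labels; alternating the two is what the abstract calls the AEC model structure.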


Source journal: Cognitive Computation (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; NEUROSCIENCES)
CiteScore: 9.30
Self-citation rate: 3.70%
Articles per year: 116
Review time: >12 weeks
About the journal: Cognitive Computation is an international, peer-reviewed, interdisciplinary journal that publishes cutting-edge articles describing original basic and applied work involving biologically-inspired computational accounts of all aspects of natural and artificial cognitive systems. It provides a new platform for the dissemination of research, current practices and future trends in the emerging discipline of cognitive computation that bridges the gap between life sciences, social sciences, engineering, physical and mathematical sciences, and humanities.
Latest articles from this journal:
A Joint Network for Low-Light Image Enhancement Based on Retinex
Incorporating Template-Based Contrastive Learning into Cognitively Inspired, Low-Resource Relation Extraction
A Novel Cognitive Rough Approach for Severity Analysis of Autistic Children Using Spherical Fuzzy Bipolar Soft Sets
Cognitively Inspired Three-Way Decision Making and Bi-Level Evolutionary Optimization for Mobile Cybersecurity Threats Detection: A Case Study on Android Malware
Probing Fundamental Visual Comprehend Capabilities on Vision Language Models via Visual Phrases from Structural Data