Accelerating Deep Reinforcement Learning Under the Guidance of Adaptive Fuzzy Logic Rules

Min Wang, Xingzhong Wang, Wei Luo, Yixue Huang, Yuanqiang Yu
{"title":"Accelerating Deep Reinforcement Learning Under the Guidance of Adaptive Fuzzy Logic Rules","authors":"Min Wang, Xingzhong Wang, Wei Luo, Yixue Huang, Yuanqiang Yu","doi":"10.1109/phm2022-london52454.2022.00068","DOIUrl":null,"url":null,"abstract":"While Deep Reinforcement Learning (DRL) has emerged as a prospective method to many tough tasks, it remains laborious to train DRL agents with a handful of data collection and high sample efficiency. In this paper, we present an Adaptive Fuzzy Reinforcement Learning framework (AFuRL) for accelerating the learning process by incorporating adaptive fuzzy logic rules, enabling DRL agents to improve the efficiency of exploring the state space. In AFuRL, the DRL agent first leverages prior fuzzy logic rules designed especially for the actor-critic framework to learn some near-optimal policies, then further improves these policies by automatically generating adaptive fuzzy rules from state-action pairs. Ultimately, the RL algorithm is applied to refine the rough policy obtained by a fuzzy controller. We demonstrate the validity of AFuRL in both discrete and continuous control tasks, where our method surpasses DRL algorithms by a substantial margin. The experiment results show that AFuRL can find superior policies in comparison with imitation-based and some prior knowledge-based approaches.","PeriodicalId":269605,"journal":{"name":"2022 Prognostics and Health Management Conference (PHM-2022 London)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Prognostics and Health Management Conference (PHM-2022 London)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/phm2022-london52454.2022.00068","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

While Deep Reinforcement Learning (DRL) has emerged as a promising approach to many challenging tasks, it remains laborious to train DRL agents with limited data collection while maintaining high sample efficiency. In this paper, we present an Adaptive Fuzzy Reinforcement Learning framework (AFuRL) that accelerates the learning process by incorporating adaptive fuzzy logic rules, enabling DRL agents to explore the state space more efficiently. In AFuRL, the DRL agent first leverages prior fuzzy logic rules designed specifically for the actor-critic framework to learn near-optimal policies, then further improves these policies by automatically generating adaptive fuzzy rules from state-action pairs. Finally, the RL algorithm is applied to refine the rough policy obtained from the fuzzy controller. We demonstrate the validity of AFuRL in both discrete and continuous control tasks, where our method surpasses DRL algorithms by a substantial margin. The experimental results show that AFuRL can find superior policies in comparison with imitation-based and some prior knowledge-based approaches.
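The abstract describes a staged pipeline: hand-designed fuzzy rules first guide the actor-critic agent toward near-optimal behaviour, adaptive rules are then generated from observed state-action pairs, and plain RL finally refines the resulting rough policy. As a rough illustration of the first stage only, the sketch below shows how a toy Mamdani-style fuzzy controller over a single state feature could be blended with a learned policy's action under a decaying guidance weight. The rule shapes, the one-dimensional feature, the blending schedule, and all names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function peaking at b with support [a, c]."""
    return max(min((x - a) / (b - a + 1e-8), (c - x) / (c - b + 1e-8)), 0.0)

class FuzzyController:
    """Three hand-designed rules mapping a normalised state feature
    (negative / near-zero / positive) to corrective actions -1, 0, +1;
    the output is defuzzified as a membership-weighted average."""

    def __init__(self):
        # (membership-function parameters, consequent action)
        self.rules = [((-1.0, -0.5, 0.0), -1.0),
                      ((-0.5,  0.0, 0.5),  0.0),
                      (( 0.0,  0.5, 1.0),  1.0)]

    def act(self, feature):
        weights = np.array([triangular(feature, *mf) for mf, _ in self.rules])
        actions = np.array([a for _, a in self.rules])
        if weights.sum() < 1e-8:   # no rule fires: fall back to a neutral action
            return 0.0
        return float(weights @ actions / weights.sum())

def guided_action(policy_action, fuzzy_action, step, guidance_steps=10_000):
    """Blend the fuzzy suggestion with the policy's action; the fuzzy weight
    decays linearly to zero so the RL algorithm eventually takes over and
    refines the rough policy on its own."""
    w = max(0.0, 1.0 - step / guidance_steps)
    return w * fuzzy_action + (1.0 - w) * policy_action

# Example use inside a (hypothetical) actor-critic training loop:
#   fuzzy = FuzzyController()
#   action = guided_action(actor(state), fuzzy.act(state[0]), global_step)
```

Each piece above could be swapped out; the sketch only conveys the general idea that rule-based guidance and the learned policy both contribute to the executed action, with the rules' influence annealed away over training.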