Title: Historical Decision-Making Regularized Maximum Entropy Reinforcement Learning
Authors: Botao Dong; Longyang Huang; Ning Pang; Hongtian Chen; Weidong Zhang
Journal: IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 7, pp. 13446-13459
DOI: 10.1109/TNNLS.2024.3481887
Publication date: 2024-10-29
URL: https://ieeexplore.ieee.org/document/10737895/
Citations: 0
Abstract
The exploration-exploitation dilemma persists in off-policy reinforcement learning (RL) algorithms, impeding improvements in policy performance and sample efficiency. To tackle this challenge, a novel historical decision-making regularized maximum entropy (HDMRME) RL algorithm is developed to strike a balance between exploration and exploitation. Built upon the maximum entropy RL framework, a historical decision-making regularization method is proposed to enhance the exploitation capability of RL policies. The theoretical analysis proves the convergence of HDMRME, investigates its tradeoff between exploration and exploitation, examines the disparity between the Q-function learned through HDMRME and the classic one, and analyzes the suboptimality of the trained policy. The performance of HDMRME is evaluated across various continuous-action control tasks from the MuJoCo and OpenAI Gym platforms. Comparative experiments demonstrate that HDMRME exhibits superior sample efficiency and achieves more competitive performance than other state-of-the-art RL algorithms.
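The abstract does not give the paper's exact formulation, but the general idea of augmenting a maximum-entropy objective with a regularizer toward past decisions can be sketched as follows. This is a minimal illustrative toy, not the authors' method: it assumes a discrete action space, takes the entropy bonus from the standard maximum-entropy RL objective, and models "historical decision-making regularization" as a hypothetical KL penalty pulling the current policy toward an empirical distribution of past actions (the function names, the `beta` weight, and the KL form are all assumptions for illustration).

```python
import numpy as np

def softmax(x, temperature=1.0):
    # Numerically stable softmax: subtract the max before exponentiating.
    z = (x - x.max()) / temperature
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    # Shannon entropy of a discrete distribution (small epsilon avoids log(0)).
    return -np.sum(p * np.log(p + 1e-12))

def kl_divergence(p, q):
    # KL(p || q) for discrete distributions.
    return np.sum(p * np.log((p + 1e-12) / (q + 1e-12)))

def regularized_objective(q_values, policy, hist_policy, alpha=0.2, beta=0.1):
    """Toy maximum-entropy objective with a historical-decision regularizer.

    J(pi) = E_pi[Q(s, .)] + alpha * H(pi) - beta * KL(pi || pi_hist)

    alpha scales the exploration (entropy) bonus; beta (hypothetical here)
    scales the pull toward the empirical distribution of past decisions.
    """
    expected_q = np.dot(policy, q_values)
    return expected_q + alpha * entropy(policy) - beta * kl_divergence(policy, hist_policy)

# Toy state with 3 actions.
q = np.array([1.0, 2.0, 0.5])
pi = softmax(q, temperature=0.2)          # current soft (Boltzmann) policy
pi_hist = np.array([0.2, 0.7, 0.1])       # empirical distribution of past actions

print(regularized_objective(q, pi, pi_hist))
```

Because KL divergence is nonnegative, switching the regularizer on (`beta > 0`) can only lower the objective for a policy that deviates from its historical behavior, which is the exploitation-favoring pressure the abstract alludes to.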
About the Journal
IEEE Transactions on Neural Networks and Learning Systems presents scholarly articles on the theory, design, and applications of neural networks and other learning systems, with an emphasis on technical and scientific research in this domain.