On the Effectiveness of Regularization Methods for Soft Actor-Critic in Discrete-Action Domains

IEEE Transactions on Systems, Man, and Cybernetics: Systems. Published 2024-12-04. DOI: 10.1109/TSMC.2024.3505613
Impact factor 8.7; CAS Zone 1 (Computer Science); JCR Q1 (Automation & Control Systems)
Bang Giang Le, Viet Cuong Ta
Vol. 55, No. 2, pp. 1425-1438

Abstract

Soft actor-critic (SAC) is a reinforcement learning algorithm that uses the maximum entropy framework to train a stochastic policy. This work examines a specific failure case of SAC in which the stochastic policy is trained to maximize the expected entropy in a sparse-reward environment. We demonstrate that over-exploration in SAC can cause the entropy temperature to collapse, which is followed by unstable updates to the actor. Based on these analyses, we introduce Reg-SAC, an improved version of SAC that mitigates the detrimental effects of the entropy temperature on the learning stability of the stochastic policy. Reg-SAC incorporates a clipping value to prevent entropy temperature collapse and regularizes the policy's gradient updates via the Kullback-Leibler divergence. In experiments on discrete-action benchmarks, Reg-SAC outperforms standard SAC in sparse-reward grid-world environments while maintaining competitive performance on the dense-reward Atari benchmark. The results highlight that our regularized version makes the stochastic policy of SAC more stable in discrete-action domains.
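The abstract names two mechanisms, a clip on the entropy temperature and a Kullback-Leibler penalty on policy updates, without giving their exact form. The sketch below is a minimal, hypothetical illustration in plain Python of how such mechanisms could attach to the usual discrete-action SAC losses, where the expectation over actions is computed exactly from the policy probabilities. It is not the authors' Reg-SAC implementation: the lower bound `alpha_min`, the KL direction KL(π_old ‖ π), the coefficient `beta`, and the target entropy 0.98·log|A| (a common heuristic for discrete SAC) are all assumptions, and the functions return scalar loss values only; a real agent would differentiate them with an autodiff framework.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a discrete action set."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy H(pi) = -sum_a pi(a) log pi(a)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; assumes q(a) > 0 wherever p(a) > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def actor_loss(
    logits: Sequence[float],
    q_values: Sequence[float],
    old_probs: Sequence[float],
    alpha: float,
    beta: float,
) -> float:
    """Discrete SAC actor loss plus a hypothetical KL penalty.

    SAC term: sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s, a)), computed exactly
    over all actions instead of by sampling.
    KL term (assumption): beta * KL(pi_old || pi), which penalizes large moves
    away from the previous policy.
    """
    probs = softmax(logits)
    sac_term = sum(p * (alpha * math.log(p) - q) for p, q in zip(probs, q_values))
    return sac_term + beta * kl_divergence(old_probs, probs)


def update_log_alpha(
    log_alpha: float,
    probs: Sequence[float],
    target_entropy: float,
    lr: float,
    alpha_min: float,
) -> float:
    """One gradient step on the temperature, followed by a hypothetical clip.

    Temperature loss: alpha * (H(pi) - target_entropy). Its derivative with
    respect to log_alpha is alpha * (H(pi) - target_entropy), so alpha shrinks
    whenever the policy is more random than the target.
    """
    alpha = math.exp(log_alpha)
    grad = alpha * (entropy(probs) - target_entropy)
    new_log_alpha = log_alpha - lr * grad
    # Clip from below so alpha never drops under alpha_min (no collapse to 0).
    return max(new_log_alpha, math.log(alpha_min))


def default_target_entropy(num_actions: int) -> float:
    """Common discrete-SAC heuristic 0.98 * log|A| (assumed, not from the paper)."""
    return 0.98 * math.log(num_actions)
```

With a uniform policy over four actions, the entropy log 4 exceeds the assumed target, so every `update_log_alpha` step lowers the temperature; without the final `max(..., math.log(alpha_min))` this drift would continue toward zero, which is one way to picture the over-exploration-driven collapse described above. Keeping the clip and the KL term as separate, tunable pieces makes it easy to ablate each one.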
Source journal
IEEE Transactions on Systems, Man, and Cybernetics: Systems (Automation & Control Systems; Computer Science, Cybernetics)
CiteScore: 18.50
Self-citation rate: 11.50%
Articles published: 812
Review time: 6 months
About the journal: The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.