Information-Theoretic State Variable Selection for Reinforcement Learning

Charles Westphal, Stephen Hailes, Mirco Musolesi
{"title":"强化学习的信息论状态变量选择","authors":"Charles Westphal, Stephen Hailes, Mirco Musolesi","doi":"arxiv-2401.11512","DOIUrl":null,"url":null,"abstract":"Identifying the most suitable variables to represent the state is a\nfundamental challenge in Reinforcement Learning (RL). These variables must\nefficiently capture the information necessary for making optimal decisions. In\norder to address this problem, in this paper, we introduce the Transfer Entropy\nRedundancy Criterion (TERC), an information-theoretic criterion, which\ndetermines if there is \\textit{entropy transferred} from state variables to\nactions during training. We define an algorithm based on TERC that provably\nexcludes variables from the state that have no effect on the final performance\nof the agent, resulting in more sample efficient learning. Experimental results\nshow that this speed-up is present across three different algorithm classes\n(represented by tabular Q-learning, Actor-Critic, and Proximal Policy\nOptimization (PPO)) in a variety of environments. Furthermore, to highlight the\ndifferences between the proposed methodology and the current state-of-the-art\nfeature selection approaches, we present a series of controlled experiments on\nsynthetic data, before generalizing to real-world decision-making tasks. We\nalso introduce a representation of the problem that compactly captures the\ntransfer of information from state variables to actions as Bayesian networks.","PeriodicalId":501433,"journal":{"name":"arXiv - CS - Information Theory","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Information-Theoretic State Variable Selection for Reinforcement Learning\",\"authors\":\"Charles Westphal, Stephen Hailes, Mirco Musolesi\",\"doi\":\"arxiv-2401.11512\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Identifying the most suitable variables to represent the state is a\\nfundamental challenge in Reinforcement Learning (RL). These variables must\\nefficiently capture the information necessary for making optimal decisions. In\\norder to address this problem, in this paper, we introduce the Transfer Entropy\\nRedundancy Criterion (TERC), an information-theoretic criterion, which\\ndetermines if there is \\\\textit{entropy transferred} from state variables to\\nactions during training. We define an algorithm based on TERC that provably\\nexcludes variables from the state that have no effect on the final performance\\nof the agent, resulting in more sample efficient learning. Experimental results\\nshow that this speed-up is present across three different algorithm classes\\n(represented by tabular Q-learning, Actor-Critic, and Proximal Policy\\nOptimization (PPO)) in a variety of environments. Furthermore, to highlight the\\ndifferences between the proposed methodology and the current state-of-the-art\\nfeature selection approaches, we present a series of controlled experiments on\\nsynthetic data, before generalizing to real-world decision-making tasks. 
We\\nalso introduce a representation of the problem that compactly captures the\\ntransfer of information from state variables to actions as Bayesian networks.\",\"PeriodicalId\":501433,\"journal\":{\"name\":\"arXiv - CS - Information Theory\",\"volume\":\"8 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Theory\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2401.11512\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2401.11512","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Identifying the most suitable variables to represent the state is a fundamental challenge in Reinforcement Learning (RL). These variables must efficiently capture the information necessary for making optimal decisions. In order to address this problem, in this paper, we introduce the Transfer Entropy Redundancy Criterion (TERC), an information-theoretic criterion, which determines if there is entropy transferred from state variables to actions during training. We define an algorithm based on TERC that provably excludes variables from the state that have no effect on the final performance of the agent, resulting in more sample-efficient learning. Experimental results show that this speed-up is present across three different algorithm classes (represented by tabular Q-learning, Actor-Critic, and Proximal Policy Optimization (PPO)) in a variety of environments. Furthermore, to highlight the differences between the proposed methodology and the current state-of-the-art feature selection approaches, we present a series of controlled experiments on synthetic data, before generalizing to real-world decision-making tasks. We also introduce a representation of the problem that compactly captures the transfer of information from state variables to actions as Bayesian networks.
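
To make the idea concrete, the sketch below illustrates the kind of quantity TERC is built on: the transfer entropy from a state variable X to the action sequence A, which for discrete sequences can be written as the conditional mutual information I(A_{t+1}; X_t | A_t). This is a minimal Python illustration under that standard definition, not the paper's actual TERC test or exclusion algorithm; the plug-in estimator, the function names, the ad hoc screening threshold, and the toy data are all assumptions made purely for demonstration.

# Illustrative sketch only (an assumption, not the paper's TERC algorithm):
# screen discrete state variables with a plug-in estimate of the transfer
# entropy T_{X -> A} = I(A_{t+1}; X_t | A_t) and drop variables whose
# estimate falls below an ad hoc threshold.
from collections import Counter
from math import log2
import random

def transfer_entropy(x, a):
    """Plug-in estimate of I(A_{t+1}; X_t | A_t) in bits for discrete sequences x, a."""
    triples = list(zip(a[1:], a[:-1], x[:-1]))          # (a_{t+1}, a_t, x_t)
    n = len(triples)
    joint = Counter(triples)                             # counts of (a_{t+1}, a_t, x_t)
    cond = Counter((at, xt) for _, at, xt in triples)    # counts of (a_t, x_t)
    pair = Counter((an, at) for an, at, _ in triples)    # counts of (a_{t+1}, a_t)
    marg = Counter(at for _, at, _ in triples)           # counts of a_t
    te = 0.0
    for (an, at, xt), c in joint.items():
        # p(a_{t+1}, a_t, x_t) * [ log p(a_{t+1} | a_t, x_t) - log p(a_{t+1} | a_t) ]
        te += (c / n) * (log2(c / cond[(at, xt)]) - log2(pair[(an, at)] / marg[at]))
    return te

def select_state_variables(variables, actions, threshold=0.01):
    """Keep variables whose estimated entropy transfer to the actions exceeds the threshold."""
    return [name for name, xs in variables.items()
            if transfer_entropy(xs, actions) > threshold]

# Hypothetical toy data: 'relevant' fully determines the next action, 'noise' is independent.
random.seed(0)
relevant = [random.randint(0, 1) for _ in range(5000)]
noise = [random.randint(0, 1) for _ in range(5000)]
actions = [0] + relevant[:-1]                            # the agent copies the previous relevant bit
print(select_state_variables({"relevant": relevant, "noise": noise}, actions))
# Expected output: ['relevant']

In this toy setup the variable that drives the action transfers roughly one bit per step and is kept, while the independent noise variable falls below the threshold and is dropped; the paper's criterion and exclusion algorithm are the principled counterpart of this ad hoc thresholding.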