{"title":"强化学习的信息论状态变量选择","authors":"Charles Westphal, Stephen Hailes, Mirco Musolesi","doi":"arxiv-2401.11512","DOIUrl":null,"url":null,"abstract":"Identifying the most suitable variables to represent the state is a\nfundamental challenge in Reinforcement Learning (RL). These variables must\nefficiently capture the information necessary for making optimal decisions. In\norder to address this problem, in this paper, we introduce the Transfer Entropy\nRedundancy Criterion (TERC), an information-theoretic criterion, which\ndetermines if there is \\textit{entropy transferred} from state variables to\nactions during training. We define an algorithm based on TERC that provably\nexcludes variables from the state that have no effect on the final performance\nof the agent, resulting in more sample efficient learning. Experimental results\nshow that this speed-up is present across three different algorithm classes\n(represented by tabular Q-learning, Actor-Critic, and Proximal Policy\nOptimization (PPO)) in a variety of environments. Furthermore, to highlight the\ndifferences between the proposed methodology and the current state-of-the-art\nfeature selection approaches, we present a series of controlled experiments on\nsynthetic data, before generalizing to real-world decision-making tasks. We\nalso introduce a representation of the problem that compactly captures the\ntransfer of information from state variables to actions as Bayesian networks.","PeriodicalId":501433,"journal":{"name":"arXiv - CS - Information Theory","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Information-Theoretic State Variable Selection for Reinforcement Learning\",\"authors\":\"Charles Westphal, Stephen Hailes, Mirco Musolesi\",\"doi\":\"arxiv-2401.11512\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Identifying the most suitable variables to represent the state is a\\nfundamental challenge in Reinforcement Learning (RL). These variables must\\nefficiently capture the information necessary for making optimal decisions. In\\norder to address this problem, in this paper, we introduce the Transfer Entropy\\nRedundancy Criterion (TERC), an information-theoretic criterion, which\\ndetermines if there is \\\\textit{entropy transferred} from state variables to\\nactions during training. We define an algorithm based on TERC that provably\\nexcludes variables from the state that have no effect on the final performance\\nof the agent, resulting in more sample efficient learning. Experimental results\\nshow that this speed-up is present across three different algorithm classes\\n(represented by tabular Q-learning, Actor-Critic, and Proximal Policy\\nOptimization (PPO)) in a variety of environments. Furthermore, to highlight the\\ndifferences between the proposed methodology and the current state-of-the-art\\nfeature selection approaches, we present a series of controlled experiments on\\nsynthetic data, before generalizing to real-world decision-making tasks. 
We\\nalso introduce a representation of the problem that compactly captures the\\ntransfer of information from state variables to actions as Bayesian networks.\",\"PeriodicalId\":501433,\"journal\":{\"name\":\"arXiv - CS - Information Theory\",\"volume\":\"8 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Theory\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2401.11512\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2401.11512","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Information-Theoretic State Variable Selection for Reinforcement Learning
Identifying the most suitable variables to represent the state is a
fundamental challenge in Reinforcement Learning (RL). These variables must
efficiently capture the information necessary for making optimal decisions. To address this problem, we introduce the Transfer Entropy Redundancy Criterion (TERC), an information-theoretic criterion, which
determines if there is entropy transferred from state variables to actions during training. We define an algorithm based on TERC that provably excludes from the state those variables that have no effect on the final performance of the agent, resulting in more sample-efficient learning. Experimental results
show that this speed-up is present across three different algorithm classes
(represented by tabular Q-learning, Actor-Critic, and Proximal Policy
Optimization (PPO)) in a variety of environments. Furthermore, to highlight the
differences between the proposed methodology and the current state-of-the-art
feature selection approaches, we present a series of controlled experiments on
synthetic data, before generalizing to real-world decision-making tasks. We also introduce a representation of the problem, in the form of Bayesian networks, that compactly captures the transfer of information from state variables to actions.
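
As a point of reference for the criterion described in the abstract: for two processes X and Y, transfer entropy is conventionally defined as the conditional mutual information I(Y_{t+1}; X_t | Y_t). The minimal sketch below is not the paper's TERC algorithm; it only illustrates the underlying idea by computing a plug-in conditional mutual information estimate I(A; X_i | X_rest) from logged state-action pairs. The function name, the toy trajectory, and the choice of estimator are assumptions made purely for illustration.

# Illustrative sketch only: estimates, from logged (state, action) data, how much
# information a single candidate state variable carries about the agent's actions
# beyond the remaining variables. This is NOT the paper's TERC algorithm.
from collections import Counter
from math import log2


def conditional_mutual_information(a, x, z):
    """Plug-in estimate of I(A; X | Z) for discrete sequences of equal length."""
    n = len(a)
    p_axz = Counter(zip(a, x, z))   # joint counts over (action, candidate, rest)
    p_az = Counter(zip(a, z))
    p_xz = Counter(zip(x, z))
    p_z = Counter(z)
    cmi = 0.0
    for (ai, xi, zi), c in p_axz.items():
        joint = c / n
        cmi += joint * log2((joint * (p_z[zi] / n)) /
                            ((p_az[(ai, zi)] / n) * (p_xz[(xi, zi)] / n)))
    return cmi


# Toy trajectory: the action copies variable x1, while x2 is unrelated noise,
# so x1 should show information transfer to the action and x2 should not.
x1 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
x2 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1]
actions = list(x1)

print("I(A; x1 | x2) =", conditional_mutual_information(actions, x1, x2))
print("I(A; x2 | x1) =", conditional_mutual_information(actions, x2, x1))

On this toy trajectory, the variable that drives the action yields roughly one bit of conditional information while the noise variable yields approximately zero; this is the kind of signal a criterion such as TERC could use to justify excluding a variable from the state.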