{"title":"强化学习的信息论状态变量选择","authors":"Charles Westphal, Stephen Hailes, Mirco Musolesi","doi":"arxiv-2401.11512","DOIUrl":null,"url":null,"abstract":"Identifying the most suitable variables to represent the state is a\nfundamental challenge in Reinforcement Learning (RL). These variables must\nefficiently capture the information necessary for making optimal decisions. In\norder to address this problem, in this paper, we introduce the Transfer Entropy\nRedundancy Criterion (TERC), an information-theoretic criterion, which\ndetermines if there is \\textit{entropy transferred} from state variables to\nactions during training. We define an algorithm based on TERC that provably\nexcludes variables from the state that have no effect on the final performance\nof the agent, resulting in more sample efficient learning. Experimental results\nshow that this speed-up is present across three different algorithm classes\n(represented by tabular Q-learning, Actor-Critic, and Proximal Policy\nOptimization (PPO)) in a variety of environments. Furthermore, to highlight the\ndifferences between the proposed methodology and the current state-of-the-art\nfeature selection approaches, we present a series of controlled experiments on\nsynthetic data, before generalizing to real-world decision-making tasks. We\nalso introduce a representation of the problem that compactly captures the\ntransfer of information from state variables to actions as Bayesian networks.","PeriodicalId":501433,"journal":{"name":"arXiv - CS - Information Theory","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Information-Theoretic State Variable Selection for Reinforcement Learning\",\"authors\":\"Charles Westphal, Stephen Hailes, Mirco Musolesi\",\"doi\":\"arxiv-2401.11512\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Identifying the most suitable variables to represent the state is a\\nfundamental challenge in Reinforcement Learning (RL). These variables must\\nefficiently capture the information necessary for making optimal decisions. In\\norder to address this problem, in this paper, we introduce the Transfer Entropy\\nRedundancy Criterion (TERC), an information-theoretic criterion, which\\ndetermines if there is \\\\textit{entropy transferred} from state variables to\\nactions during training. We define an algorithm based on TERC that provably\\nexcludes variables from the state that have no effect on the final performance\\nof the agent, resulting in more sample efficient learning. Experimental results\\nshow that this speed-up is present across three different algorithm classes\\n(represented by tabular Q-learning, Actor-Critic, and Proximal Policy\\nOptimization (PPO)) in a variety of environments. Furthermore, to highlight the\\ndifferences between the proposed methodology and the current state-of-the-art\\nfeature selection approaches, we present a series of controlled experiments on\\nsynthetic data, before generalizing to real-world decision-making tasks. 
We\\nalso introduce a representation of the problem that compactly captures the\\ntransfer of information from state variables to actions as Bayesian networks.\",\"PeriodicalId\":501433,\"journal\":{\"name\":\"arXiv - CS - Information Theory\",\"volume\":\"8 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Theory\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2401.11512\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2401.11512","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Information-Theoretic State Variable Selection for Reinforcement Learning
Identifying the most suitable variables to represent the state is a
fundamental challenge in Reinforcement Learning (RL). These variables must
efficiently capture the information necessary for making optimal decisions. To address this problem, we introduce the Transfer Entropy Redundancy Criterion (TERC), an information-theoretic criterion, which
determines if there is entropy transferred from state variables to actions during training. We define an algorithm based on TERC that provably excludes from the state those variables that have no effect on the final performance of the agent, resulting in more sample-efficient learning. Experimental results
show that this speed-up is present across three different algorithm classes
(represented by tabular Q-learning, Actor-Critic, and Proximal Policy
Optimization (PPO)) in a variety of environments. Furthermore, to highlight the
differences between the proposed methodology and the current state-of-the-art
feature selection approaches, we present a series of controlled experiments on
synthetic data, before generalizing to real-world decision-making tasks. We also introduce a representation of the problem, in the form of Bayesian networks, that compactly captures the transfer of information from state variables to actions.
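
As a point of reference for the criterion described in the abstract: for two processes X and Y, transfer entropy is conventionally defined as the conditional mutual information I(Y_{t+1}; X_t | Y_t). The minimal sketch below is not the paper's TERC algorithm; it only illustrates the underlying idea by computing a plug-in conditional mutual information estimate I(A; X_i | X_rest) from logged state-action pairs. The function name, the toy trajectory, and the choice of estimator are assumptions made purely for illustration.

# Illustrative sketch only: estimates, from logged (state, action) data, how much
# information a single candidate state variable carries about the agent's actions
# beyond the remaining variables. This is NOT the paper's TERC algorithm.
from collections import Counter
from math import log2


def conditional_mutual_information(a, x, z):
    """Plug-in estimate of I(A; X | Z) for discrete sequences of equal length."""
    n = len(a)
    p_axz = Counter(zip(a, x, z))   # joint counts over (action, candidate, rest)
    p_az = Counter(zip(a, z))
    p_xz = Counter(zip(x, z))
    p_z = Counter(z)
    cmi = 0.0
    for (ai, xi, zi), c in p_axz.items():
        joint = c / n
        cmi += joint * log2((joint * (p_z[zi] / n)) /
                            ((p_az[(ai, zi)] / n) * (p_xz[(xi, zi)] / n)))
    return cmi


# Toy trajectory: the action copies variable x1, while x2 is unrelated noise,
# so x1 should show information transfer to the action and x2 should not.
x1 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
x2 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1]
actions = list(x1)

print("I(A; x1 | x2) =", conditional_mutual_information(actions, x1, x2))
print("I(A; x2 | x1) =", conditional_mutual_information(actions, x2, x1))

On this toy trajectory, the variable that drives the action yields roughly one bit of conditional information while the noise variable yields approximately zero; this is the kind of signal a criterion such as TERC could use to justify excluding a variable from the state.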