Jesse P Geerts, Samuel J Gershman, Neil Burgess, Kimberly L Stachenfeld
A probabilistic successor representation for context-dependent learning.
Psychological Review, pp. 578-597. Published 2024-03-01 (Epub 2023-05-11). DOI: 10.1037/rev0000414 (https://doi.org/10.1037/rev0000414)
Abstract
Two of the main impediments to learning complex tasks are that relationships between different stimuli, including rewards, can be uncertain and context-dependent. Reinforcement learning (RL) provides a framework for learning, by predicting total future reward directly (model-free RL), or via predictions of future states (model-based RL). Within this framework, "successor representation" (SR) predicts total future occupancy of all states. A recent theoretical proposal suggests that the hippocampus encodes the SR in order to facilitate prediction of future reward. However, this proposal does not take into account how learning should adapt under uncertainty and switches of context. Here, we introduce a theory of learning SRs using prediction errors which includes optimally balancing uncertainty in new observations versus existing knowledge. We then generalize that approach to a multicontext setting, allowing the model to learn and maintain multiple task-specific SRs and infer which one to use at any moment based on the accuracy of its predictions. Thus, the context used for predictions can be determined by both the contents of the states themselves and the distribution of transitions between them. This probabilistic SR model captures animal behavior in tasks which require contextual memory and generalization, and unifies previous SR theory with hippocampal-dependent contextual decision-making. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
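To make "predicts total future occupancy of all states" and "learning SRs using prediction errors" concrete, here is a minimal sketch of the standard tabular temporal-difference update for the SR. It is not the probabilistic model the abstract proposes: the fixed learning rate `alpha` stands in for the uncertainty-based weighting described there, and there is no context inference. The function names, the 3-state ring environment, the reward vector, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def td_update_sr(M, s, s_next, alpha=0.1, gamma=0.9):
    """Apply one TD update to the tabular SR matrix M after observing s -> s_next.

    M[s, j] estimates the expected discounted number of future visits to
    state j when starting from state s (counting the current visit).
    """
    onehot = np.zeros(M.shape[0])
    onehot[s] = 1.0
    # SR prediction error: current occupancy plus the discounted prediction
    # from the successor state, minus the current prediction for s.
    delta = onehot + gamma * M[s_next] - M[s]
    M[s] += alpha * delta  # fixed learning rate (illustrative assumption)
    return M


def learn_sr(transitions, n_states, alpha=0.1, gamma=0.9, n_sweeps=2000):
    """Learn an SR matrix by repeatedly sweeping over observed transitions."""
    M = np.zeros((n_states, n_states))
    for _ in range(n_sweeps):
        for s, s_next in transitions:
            td_update_sr(M, s, s_next, alpha, gamma)
    return M


# Hypothetical environment: a deterministic 3-state ring, 0 -> 1 -> 2 -> 0.
ring = [(0, 1), (1, 2), (2, 0)]
M = learn_sr(ring, n_states=3)

# For a fixed transition matrix T, the SR has the closed form (I - gamma*T)^-1,
# which the learned matrix should approach.
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
M_exact = np.linalg.inv(np.eye(3) - 0.9 * T)

# Given a reward vector r over states, value is a linear readout of the SR.
r = np.array([0.0, 0.0, 1.0])  # reward only in state 2 (illustrative)
V = M @ r

print(np.round(M, 3))
print(np.round(V, 3))
```

The last two lines of computation show why the SR is useful for reward prediction: once `M` is learned, changing `r` yields new values without relearning the transition structure.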
About the journal:
Psychological Review publishes articles that make important theoretical contributions to any area of scientific psychology, including systematic evaluation of alternative theories.