Reinforcement learning approaches to hippocampus-dependent flexible spatial navigation.

Brain and Neuroscience Advances. Pub Date: 2021-04-09. eCollection Date: 2021-01-01. DOI: 10.1177/2398212820975634
Charline Tessereau, Reuben O'Dea, Stephen Coombes, Tobias Bast

Abstract

Humans and non-human animals show great flexibility in spatial navigation, including the ability to return to specific locations after as little as a single experience. Watermaze tasks, in which rats have to find a hidden platform in a pool of cloudy water surrounded by spatial cues, have long been used to study spatial navigation in the laboratory. Analogous tasks have been developed for human participants using virtual environments. Spatial learning in the watermaze is facilitated by the hippocampus. In particular, rapid, one-trial, allocentric place learning, as measured in the delayed-matching-to-place variant of the watermaze task, which requires rodents to repeatedly learn new locations in a familiar environment, is hippocampus-dependent. In this article, we review some computational principles, embedded within a reinforcement learning framework, that utilise hippocampal spatial representations for navigation in watermaze tasks. We consider which key elements underlie their efficacy, and discuss their limitations in accounting for hippocampus-dependent navigation, both in terms of behavioural performance (i.e. how well they reproduce behavioural measures of rapid place learning) and neurobiological realism (i.e. how well they map to neurobiological substrates involved in rapid place learning). We discuss how an actor-critic architecture, enabling simultaneous assessment of the value of the current location and of the optimal direction to follow, can reproduce one-trial place learning performance as shown on watermaze and virtual delayed-matching-to-place tasks by rats and humans, respectively, if complemented with map-like place representations. The contribution of actor-critic mechanisms to delayed-matching-to-place performance is consistent with neurobiological findings implicating the striatum and hippocampo-striatal interaction in delayed-matching-to-place performance, given that the striatum has been associated with actor-critic mechanisms.
Moreover, we illustrate that hierarchical computations embedded within an actor-critic architecture may help to account for aspects of flexible spatial navigation. The hierarchical reinforcement learning approach separates trajectory control via a temporal-difference error from goal selection via a goal prediction error and may account for flexible, trial-specific navigation to familiar goal locations, as required in some arm-maze place memory tasks, although it does not capture one-trial learning of new goal locations, as observed in open-field delayed-matching-to-place tasks, including watermaze and virtual variants. Future models of one-shot learning of new goal locations, as observed on delayed-matching-to-place tasks, should incorporate hippocampal plasticity mechanisms that integrate new goal information with allocentric place representation, as such mechanisms are supported by substantial empirical evidence.
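
To make the actor-critic idea in the abstract concrete, the following is a minimal sketch of a temporal-difference actor-critic agent that reads Gaussian "place cell" activity in a square arena. It is not the model reviewed in the article: the arena shape, the grid of place-field centres, the eight heading directions, the simplified actor update and all parameter values (N_SIDE, SIGMA, GAMMA, ALPHA, STEP, GOAL_RADIUS) are assumptions chosen only for illustration. The critic estimates the value of the current location, the actor holds a preference for each heading, and both are updated from the same temporal-difference error.

```python
import math
import random

# Hypothetical parameters, chosen for illustration only.
N_SIDE = 5          # place-field centres form an N_SIDE x N_SIDE grid
SIGMA = 0.2         # width of each Gaussian place field
GAMMA = 0.95        # discount factor
ALPHA = 0.1         # learning rate shared by actor and critic
STEP = 0.1          # distance moved per time step
GOAL_RADIUS = 0.1   # the hidden "platform" is a disc of this radius

CENTRES = [((i + 0.5) / N_SIDE, (j + 0.5) / N_SIDE)
           for i in range(N_SIDE) for j in range(N_SIDE)]
ACTIONS = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4))
           for k in range(8)]  # eight heading directions


def place_cells(pos):
    """Gaussian place-cell activity for a position in the unit square."""
    x, y = pos
    return [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * SIGMA ** 2))
            for cx, cy in CENTRES]


def dot(weights, features):
    """Weighted sum of place-cell activities."""
    return sum(w * f for w, f in zip(weights, features))


def softmax(prefs):
    """Turn action preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def td_error(reward, value, next_value, terminal):
    """delta = r + gamma * V(s') - V(s); V(s') is taken as 0 at the goal."""
    return reward + (0.0 if terminal else GAMMA * next_value) - value


def run_trial(critic, actor, goal, rng, max_steps=300):
    """Run one trial from a random start; return the number of steps taken."""
    pos = (rng.random(), rng.random())
    for step in range(1, max_steps + 1):
        feats = place_cells(pos)
        probs = softmax([dot(a, feats) for a in actor])
        k = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        dx, dy = ACTIONS[k]
        # Clip the move to the arena walls.
        new_pos = (min(1.0, max(0.0, pos[0] + STEP * dx)),
                   min(1.0, max(0.0, pos[1] + STEP * dy)))
        at_goal = math.dist(new_pos, goal) <= GOAL_RADIUS
        delta = td_error(1.0 if at_goal else 0.0,
                         dot(critic, feats),
                         dot(critic, place_cells(new_pos)),
                         at_goal)
        for i, f in enumerate(feats):
            critic[i] += ALPHA * delta * f    # critic: update location value
            actor[k][i] += ALPHA * delta * f  # actor: adjust the chosen heading
        pos = new_pos
        if at_goal:
            return step
    return max_steps


if __name__ == "__main__":
    rng = random.Random(0)
    critic = [0.0] * len(CENTRES)
    actor = [[0.0] * len(CENTRES) for _ in ACTIONS]
    latencies = [run_trial(critic, actor, (0.7, 0.3), rng) for _ in range(30)]
    print(latencies)
```

In this sketch the goal stays fixed across trials, so any improvement in latency is incremental. Nothing here implements one-trial learning of a new goal location, which the abstract argues requires map-like place representations or hippocampal plasticity mechanisms beyond a basic actor-critic.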
