Hierarchical clustering optimizes the tradeoff between compositionality and expressivity of task structures for flexible reinforcement learning

IF 5.1 | CAS Tier 2 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Artificial Intelligence | Pub Date: 2022-11-01 | DOI: 10.1016/j.artint.2022.103770
Rex G. Liu, Michael J. Frank
{"title":"Hierarchical clustering optimizes the tradeoff between compositionality and expressivity of task structures for flexible reinforcement learning","authors":"Rex G. Liu,&nbsp;Michael J. Frank","doi":"10.1016/j.artint.2022.103770","DOIUrl":null,"url":null,"abstract":"<div><p>A hallmark of human intelligence, but challenging for reinforcement learning<span> (RL) agents, is the ability to compositionally generalise, that is, to recompose familiar knowledge components in novel ways to solve new problems. For instance, when navigating in a city, one needs to know the location of the destination and how to operate a vehicle to get there, whether it be pedalling a bike or operating a car. In RL, these correspond to the reward function and transition function, respectively. To compositionally generalize, these two components need to be transferable independently of each other: multiple modes of transport can reach the same goal, and any given mode can be used to reach multiple destinations. Yet there are also instances where it can be helpful to learn and transfer entire structures, jointly representing goals and transitions, particularly whenever these recur in natural tasks (e.g., given a suggestion to get ice cream, one might prefer to bike, even in new towns). Prior theoretical work has explored how, in model-based RL, agents can learn and generalize task components (transition and reward functions). But a satisfactory account for how a single agent can simultaneously satisfy the two competing demands is still lacking. Here, we propose a hierarchical RL agent that learns and transfers individual task components as well as entire structures (particular compositions of components) by inferring both through a non-parametric Bayesian model<span><span> of the task. It maintains a factorised representation of task components through a hierarchical Dirichlet<span> process, but it also represents different possible covariances between these components through a standard Dirichlet process. We validate our approach on a variety of navigation tasks covering a wide range of statistical correlations between task components and show that it can also improve generalisation and transfer in more complex, hierarchical tasks with goal/subgoal structures. Finally, we end with a discussion of our work including how this </span></span>clustering algorithm could conceivably be implemented by cortico-striatal gating circuits in the brain.</span></span></p></div>","PeriodicalId":8434,"journal":{"name":"Artificial Intelligence","volume":"312 ","pages":"Article 103770"},"PeriodicalIF":5.1000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0004370222001102","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

A hallmark of human intelligence, but challenging for reinforcement learning (RL) agents, is the ability to compositionally generalise, that is, to recompose familiar knowledge components in novel ways to solve new problems. For instance, when navigating in a city, one needs to know the location of the destination and how to operate a vehicle to get there, whether it be pedalling a bike or operating a car. In RL, these correspond to the reward function and transition function, respectively. To compositionally generalise, these two components need to be transferable independently of each other: multiple modes of transport can reach the same goal, and any given mode can be used to reach multiple destinations. Yet there are also instances where it can be helpful to learn and transfer entire structures, jointly representing goals and transitions, particularly whenever these recur in natural tasks (e.g., given a suggestion to get ice cream, one might prefer to bike, even in new towns). Prior theoretical work has explored how, in model-based RL, agents can learn and generalise task components (transition and reward functions). But a satisfactory account for how a single agent can simultaneously satisfy the two competing demands is still lacking. Here, we propose a hierarchical RL agent that learns and transfers individual task components as well as entire structures (particular compositions of components) by inferring both through a non-parametric Bayesian model of the task. It maintains a factorised representation of task components through a hierarchical Dirichlet process, but it also represents different possible covariances between these components through a standard Dirichlet process. We validate our approach on a variety of navigation tasks covering a wide range of statistical correlations between task components and show that it can also improve generalisation and transfer in more complex, hierarchical tasks with goal/subgoal structures. Finally, we end with a discussion of our work including how this clustering algorithm could conceivably be implemented by cortico-striatal gating circuits in the brain.
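The Dirichlet-process machinery described in the abstract can be made concrete through a Chinese Restaurant Process (CRP), the standard sequential view of a Dirichlet process. The sketch below is illustrative only and is not the authors' implementation: the function names (`crp_prior`, `cluster_posterior`) and the toy likelihood values are assumptions introduced here for exposition. It shows how an agent could weigh reusing an existing reward- or transition-function cluster against opening a new one, with the concentration parameter alpha controlling the bias toward new clusters.

```python
import numpy as np

def crp_prior(cluster_counts, alpha):
    """Chinese Restaurant Process prior over cluster assignments.

    cluster_counts: number of contexts already assigned to each existing cluster.
    alpha: concentration parameter; larger alpha favours opening new clusters.
    Returns probabilities over [existing clusters..., new cluster].
    """
    counts = np.asarray(cluster_counts, dtype=float)
    weights = np.append(counts, alpha)      # unnormalised CRP weights
    return weights / weights.sum()

def cluster_posterior(cluster_counts, alpha, log_likelihoods):
    """Posterior over which cluster generated a new context's observations.

    log_likelihoods: log p(observations | cluster k) for each existing cluster,
    plus a final entry for a fresh cluster (likelihood under the prior).
    """
    log_prior = np.log(crp_prior(cluster_counts, alpha))
    log_post = log_prior + np.asarray(log_likelihoods, dtype=float)
    log_post -= log_post.max()              # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

if __name__ == "__main__":
    # Two existing reward clusters previously seen in 5 and 2 contexts;
    # the new context's reward observations fit cluster 0 best.
    counts = [5, 2]
    alpha = 1.0
    loglik = [-1.0, -4.0, -3.0]   # [cluster 0, cluster 1, new cluster]
    print(cluster_posterior(counts, alpha, loglik))
```

In the paper's hierarchical variant, an analogous prior would also operate at the level of whole structures (joint reward-transition compositions), with component clusters shared across structures through a common base distribution, which is what lets the agent transfer either individual components or entire structures as the task statistics demand.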

Source journal
Artificial Intelligence (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 11.20
Self-citation rate: 1.40%
Articles published: 118
Review time: 8 months
About the journal: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.
Latest articles in this journal
Lifted action models learning from partial traces
Human-AI coevolution
Editorial Board
Separate but equal: Equality in belief propagation for single-cycle graphs
Generative models for grid-based and image-based pathfinding