An algorithmic account for how humans efficiently learn, transfer, and compose hierarchically structured decision policies

Cognition · IF 2.8 · Zone 1 (Psychology) · JCR Q1 (Psychology, Experimental) · Pub Date: 2024-10-04 · DOI: 10.1016/j.cognition.2024.105967

Jing-Jing Li, Anne G.E. Collins

Cognition, Volume 254, Article 105967. https://www.sciencedirect.com/science/article/pii/S0010027724002531


Abstract

Learning structures that effectively abstract decision policies is key to the flexibility of human intelligence. Previous work has shown that humans use hierarchically structured policies to efficiently navigate complex and dynamic environments. However, the computational processes that support the learning and construction of such policies remain insufficiently understood. To address this question, we tested 1026 human participants, who made over 1 million choices combined, in a decision-making task where they could learn, transfer, and recompose multiple sets of hierarchical policies. We propose a novel algorithmic account for the learning processes underlying observed human behavior. We show that humans rely on compressed policies over states in early learning, which gradually unfold into hierarchical representations via meta-learning and Bayesian inference. Our modeling evidence suggests that these hierarchical policies are structured in a temporally backward, rather than forward, fashion. Taken together, these algorithmic architectures characterize how the interplay between reinforcement learning, policy compression, meta-learning, and working memory supports structured decision-making and compositionality in a resource-rational way.
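As a rough illustration of what "compressed policies over states" can mean, the sketch below uses a generic policy-compression formulation in which a policy trades expected value against the mutual information between states and actions. This is not the authors' model, which the abstract does not specify; the function names, the toy value table `Q`, and the trade-off parameter `beta` are all assumptions made for illustration.

```python
import numpy as np

def compressed_policy(Q, p_state, beta, n_iter=200):
    """Blahut-Arimoto-style iteration for a capacity-limited policy (illustrative).

    Q:       (n_states, n_actions) array of action values (hypothetical).
    p_state: (n_states,) probability of encountering each state.
    beta:    value/complexity trade-off; smaller beta means heavier compression.
    """
    n_states, n_actions = Q.shape
    marginal = np.full(n_actions, 1.0 / n_actions)  # state-independent action prior
    for _ in range(n_iter):
        # pi(a|s) is proportional to P(a) * exp(beta * Q(s, a))
        logits = np.log(marginal)[None, :] + beta * Q
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        policy = np.exp(logits)
        policy /= policy.sum(axis=1, keepdims=True)
        # Update the marginal action distribution P(a) = sum_s p(s) pi(a|s)
        marginal = p_state @ policy
    return policy, marginal

def policy_complexity(policy, marginal, p_state):
    """Mutual information I(S; A) in nats between states and actions."""
    ratio = np.where(policy > 0, policy / marginal[None, :], 1.0)
    return float(np.sum(p_state[:, None] * policy * np.log(ratio)))

# Toy problem (assumption): two states, each with a different best action.
Q = np.eye(2)
p_state = np.array([0.5, 0.5])
for beta in (0.0, 1.0, 50.0):
    policy, marginal = compressed_policy(Q, p_state, beta)
    print(beta, policy_complexity(policy, marginal, p_state))
```

With `beta` at zero the policy ignores the state entirely (fully compressed), and as `beta` grows the policy becomes state-specific at the cost of higher complexity. This mirrors, only loosely, the abstract's description of compressed policies in early learning that later unfold into richer representations.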




Source journal: Cognition (Psychology, Experimental)
CiteScore: 6.40
Self-citation rate: 5.90%
Articles published: 283

Journal description: Cognition is an international journal that publishes theoretical and experimental papers on the study of the mind. It covers a wide variety of subjects concerning all the different aspects of cognition, ranging from biological and experimental studies to formal analysis. Contributions from the fields of psychology, neuroscience, linguistics, computer science, mathematics, ethology and philosophy are welcome in this journal provided that they have some bearing on the functioning of the mind. In addition, the journal serves as a forum for discussion of social and political aspects of cognitive science.
