State-transition-free reinforcement learning in chimpanzees (Pan troglodytes).

IF 1.9; Zone 4, Psychology; JCR Q3, BEHAVIORAL SCIENCES; Learning & Behavior; Pub Date: 2023-12-01; Epub Date: 2023-06-27; DOI: 10.3758/s13420-023-00591-3
Yutaro Sato, Yutaka Sakai, Satoshi Hirata
Citations: 0

Abstract

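The outcome of an action often occurs after a delay. One solution for learning appropriate actions from delayed outcomes is to rely on a chain of state transitions. Another solution, which does not rest on state transitions, is to use an eligibility trace (ET) that directly bridges a current outcome and multiple past actions via transient memories. Previous studies revealed that humans (Homo sapiens) learned appropriate actions in a behavioral task in which solutions based on the ET were effective but transition-based solutions were ineffective. This suggests that ET may be used in human learning systems. However, no studies have examined nonhuman animals with an equivalent behavioral task. We designed a task for nonhuman animals following a previous human study. In each trial, participants chose one of two stimuli that were randomly selected from three stimulus types: a stimulus associated with a food reward delivered immediately, a stimulus associated with a reward delivered after a few trials, and a stimulus associated with no reward. The presented stimuli did not vary according to the participants' choices. To maximize the total reward, participants had to learn the value of the stimulus associated with a delayed reward. Five chimpanzees (Pan troglodytes) performed the task using a touchscreen. Two chimpanzees were able to learn successfully, indicating that learning mechanisms that do not depend on state transitions were involved in the learning processes. The current study extends previous ET research by proposing a behavioral task and providing empirical data from chimpanzees.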
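The task structure and the eligibility-trace idea described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or analysis code. It assumes a delay of 2 trials (the abstract only says "a few trials"), a softmax choice rule, and a simple replacing-trace update; the names `et_update`, `simulate`, `alpha`, `lam`, and `beta` are hypothetical.

```python
import math
import random

STIMULI = ("immediate", "delayed", "none")


def et_update(values, traces, chosen, reward, alpha=0.1, lam=0.7):
    """One trial of a stateless eligibility-trace update (illustrative rule only)."""
    for s in traces:
        traces[s] *= lam              # memories of earlier choices fade
    traces[chosen] = 1.0              # replacing trace for the current choice
    for s in values:
        # Every recently chosen stimulus shares credit for the current outcome.
        values[s] += alpha * traces[s] * (reward - values[s])


def simulate(n_trials=2000, delay=2, alpha=0.05, lam=0.7, beta=5.0, seed=0):
    """Simulate the two-choice task; `delay` in trials is an assumption."""
    rng = random.Random(seed)
    values = {s: 0.0 for s in STIMULI}
    traces = {s: 0.0 for s in STIMULI}
    pending = {}                      # trial index -> delayed rewards due then
    for t in range(n_trials):
        # The presented pair is random and does not depend on past choices,
        # so there is no state transition to exploit.
        a, b = rng.sample(STIMULI, 2)
        p_a = 1.0 / (1.0 + math.exp(-beta * (values[a] - values[b])))
        chosen = a if rng.random() < p_a else b
        if chosen == "delayed":
            pending[t + delay] = pending.get(t + delay, 0) + 1
        reward = pending.pop(t, 0) + (1 if chosen == "immediate" else 0)
        et_update(values, traces, chosen, reward, alpha, lam)
    return values


if __name__ == "__main__":
    print(simulate())
```

With `lam=0` the trace vanishes after one trial, so a delayed reward is credited only to whatever was chosen on the trial it arrives; with `lam>0` part of the credit reaches the earlier choice of the delayed stimulus. Other stimuli chosen in between also receive some credit, which is one reason learning in such a task is hard.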


Source journal
Learning & Behavior (Medicine - Zoology)
CiteScore: 2.90
Self-citation rate: 5.60%
Articles published: 50
Review time: >12 weeks
About the journal: Learning & Behavior publishes experimental and theoretical contributions and critical reviews concerning fundamental processes of learning and behavior in nonhuman and human animals. Topics covered include sensation, perception, conditioning, learning, attention, memory, motivation, emotion, development, social behavior, and comparative investigations.
Latest articles from this journal
Variation in animal architecture: Genes, environment, and culture.
Implicit knowledge of words in dogs.
Are crows smart? Let them count the ways.
Do elephants really never forget? What we know about elephant memory and a call for further investigation.
Route learning and transport of resources during colony relocation in Australian desert ants.