Learning and decision making in monkeys during a rock–paper–scissors game

Daeyeol Lee, Benjamin P. McGreevy, Dominic J. Barraclough
Journal: Cognitive Brain Research, vol. 25, no. 2, pp. 416-430
Publication date: 2005-10-01
Publication type: Journal Article
DOI: 10.1016/j.cogbrainres.2005.07.003
URL: https://www.sciencedirect.com/science/article/pii/S0926641005001953
Citations: 115

Learning and decision making in monkeys during a rock–paper–scissors game

Game theory provides a solution to the problem of finding a set of optimal decision-making strategies in a group. However, people seldom play such optimal strategies and adjust their strategies based on their experience. Accordingly, many theories postulate a set of variables related to the probabilities of choosing various strategies and describe how such variables are dynamically updated. In reinforcement learning, these value functions are updated based on the outcome of the player's choice, whereas belief learning allows the value functions of all available choices to be updated according to the choices of other players. We investigated the nature of learning process in monkeys playing a competitive game with ternary choices, using a rock–paper–scissors game. During the baseline condition in which the computer selected its targets randomly, each animal displayed biases towards some targets. When the computer exploited the pattern of animal's choice sequence but not its reward history, the animal's choice was still systematically biased by the previous choice of the computer. This bias was reduced when the computer exploited both the choice and reward histories of the animal. Compared to simple models of reinforcement learning or belief learning, these adaptive processes were better described by a model that incorporated the features of both models. These results suggest that stochastic decision-making strategies in primates during social interactions might be adjusted according to both actual and hypothetical payoffs.
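
The contrast the abstract draws between reinforcement learning, belief learning, and a model combining both can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the payoff values (win 1, tie 0, loss -1), the learning rate `alpha`, the mixing parameter `hypothetical_weight`, the softmax choice rule with inverse temperature `beta`, and all function names. With `hypothetical_weight = 0` only the chosen target is updated from its actual payoff (reinforcement learning); with `hypothetical_weight = 1` every target is updated from the payoff it would have earned against the opponent's choice (belief-learning-like); intermediate values give a hybrid.

```python
import math

ACTIONS = ("rock", "paper", "scissors")
# BEATS[a] is the action that a defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(own, other):
    """Assumed payoffs: 1 for a win, 0 for a tie, -1 for a loss."""
    if own == other:
        return 0.0
    return 1.0 if BEATS[own] == other else -1.0


def update_values(values, own, other, alpha=0.2, hypothetical_weight=0.5):
    """Return new value functions after one trial.

    The chosen action moves toward its actual payoff with rate alpha.
    Each unchosen action moves toward the hypothetical payoff it would
    have earned against `other`, scaled by hypothetical_weight.
    """
    new_values = {}
    for action in ACTIONS:
        weight = 1.0 if action == own else hypothetical_weight
        error = payoff(action, other) - values[action]
        new_values[action] = values[action] + alpha * weight * error
    return new_values


def choice_probabilities(values, beta=3.0):
    """Softmax rule: higher-valued actions are chosen more often."""
    exps = {a: math.exp(beta * v) for a, v in values.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


if __name__ == "__main__":
    start = {a: 0.0 for a in ACTIONS}
    # The player chose rock, the computer chose paper.
    for w in (0.0, 0.5, 1.0):
        updated = update_values(start, "rock", "paper", hypothetical_weight=w)
        print(w, updated, choice_probabilities(updated))
```

In this sketch, a player who loses with rock against paper lowers the value of rock under every setting, but raises the value of scissors (which would have won) only when `hypothetical_weight` is above zero. That dependence on the computer's previous choice is the kind of effect hypothetical payoffs would produce.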
