How a genetic algorithm learns to play Traveler's Dilemma by choosing dominated strategies to achieve greater payoffs

M. Pace
Published in: 2009 IEEE Symposium on Computational Intelligence and Games
Publication date: 2009-09-07
DOI: 10.1109/CIG.2009.5286474 (https://doi.org/10.1109/CIG.2009.5286474)
Citations: 11

Abstract

In game theory, the Traveler's Dilemma (abbreviated TD) is a non-zero-sum game in which two players each try to maximize their own payoff without deliberately seeking to harm the opponent. In the classical formulation of the problem, game theory predicts that two purely rational players will always choose the strategy corresponding to the Nash equilibrium of the game. In experiments, however, most human players select much higher values (usually close to $100), deviating strongly from the Nash equilibrium and obtaining, on average, much higher rewards. In this paper we analyze the behaviour of a genetic algorithm that, by repeatedly playing the game, evolves its strategy in order to maximize the payoffs. The population has no a priori knowledge of the game, and the fitness function rewards the individuals who obtain high payoffs at the end of each game session. We demonstrate that, when a probability measure can be assigned to each strategy, the search for good strategies can be effectively translated into a search problem in a measure space, which can be tackled with, for example, genetic algorithms. Furthermore, encoding the genome as a probability distribution allows common crossover and mutation operators to be analyzed in the uncommon case where the genome is a probability measure.
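The abstract gives neither the payoff rule nor the genetic operators, so the following is only a minimal sketch under stated assumptions. It uses the standard Traveler's Dilemma formulation (integer claims from 2 to 100, bonus and penalty of 2), which is assumed rather than taken from the paper. The genome is a probability distribution over claims, as the abstract describes. The operators (`crossover` as a convex mixture, `mutate` as a shift of mass onto one claim), the truncation selection, and all parameter values are hypothetical illustrations, not the author's method.

```python
import random

# Standard TD parameters (assumed; the abstract does not state them).
LOW, HIGH, BONUS = 2, 100, 2
CLAIMS = list(range(LOW, HIGH + 1))


def td_payoff(mine, theirs, bonus=BONUS):
    """Payoff to the player claiming `mine` against a claim of `theirs`."""
    if mine == theirs:
        return mine
    low = min(mine, theirs)
    # The lower claimant earns the bonus; the higher claimant pays the penalty.
    return low + bonus if mine < theirs else low - bonus


def random_genome(rng):
    """A genome is a probability distribution over all admissible claims."""
    weights = [rng.random() for _ in CLAIMS]
    total = sum(weights)
    return [w / total for w in weights]


def crossover(p, q, rng):
    """Hypothetical operator: a convex mixture of two distributions,
    which is itself a probability distribution."""
    a = rng.random()
    return [a * x + (1 - a) * y for x, y in zip(p, q)]


def mutate(p, rng, eps=0.05):
    """Hypothetical operator: move a fraction `eps` of the total mass
    onto one randomly chosen claim; the result still sums to 1."""
    k = rng.randrange(len(p))
    return [(1 - eps) * x + (eps if i == k else 0.0) for i, x in enumerate(p)]


def fitness(genome, population, rng, rounds=20):
    """Average payoff over a session against randomly drawn opponents."""
    total = 0
    for _ in range(rounds):
        rival = rng.choice(population)
        mine = rng.choices(CLAIMS, weights=genome)[0]
        theirs = rng.choices(CLAIMS, weights=rival)[0]
        total += td_payoff(mine, theirs)
    return total / rounds


def mean_claim(genome):
    """Expected claim under the genome's distribution."""
    return sum(c * p for c, p in zip(CLAIMS, genome))


def evolve(pop_size=30, generations=50, seed=0):
    """Truncation selection (an assumption): keep the fitter half,
    refill the population with mutated offspring of the survivors."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda g: fitness(g, population, rng),
                        reverse=True)
        parents = ranked[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return population
```

Under these assumed parameters, `td_payoff(99, 100)` is 101 while `td_payoff(100, 100)` is 100, so a claim of 100 is weakly dominated by 99; repeating that argument downward leads to the Nash equilibrium at the minimum claim of 2. Calling `mean_claim` on the genomes returned by `evolve()` is one way to check whether a population drifts toward high or low claims in this toy setup.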