Rationality of Learning Algorithms in Repeated Normal-Form Games

IEEE Control Systems Letters (impact factor 2.4; JCR Q2, Automation & Control Systems). Published 2024-10-25. DOI: 10.1109/LCSYS.2024.3486631
Shivam Bajaj, Pranoy Das, Yevgeniy Vorobeychik, Vijay Gupta
Citations: 0

Abstract

Many learning algorithms are known to converge to an equilibrium for specific classes of games if the same learning algorithm is adopted by all agents. However, when the agents are self-interested, a natural question is whether the agents have an incentive to unilaterally shift to an alternative learning algorithm. We capture such incentives as an algorithm's rationality ratio, which is the ratio of the highest payoff an agent can obtain by unilaterally deviating from a learning algorithm to its payoff from following it. We define a learning algorithm to be $c$-rational if its rationality ratio is at most $c$ irrespective of the game. We show that popular learning algorithms such as fictitious play and regret-matching are not $c$-rational for any constant $c \geq 1$. We also show that if an agent can only observe the actions of the other agents but not their payoffs, then there are games for which $c$-rational algorithms do not exist. We then propose a framework that can build upon any existing learning algorithm and establish, under mild assumptions, that our proposed algorithm is (i) $c$-rational for a given $c \geq 1$ and (ii) the strategies of the agents converge to an equilibrium, with high probability, if all agents follow it.
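To make the rationality ratio concrete, the following is a minimal sketch under our own assumptions, not the authors' framework or proof construction. We assume payoff means average per-round payoff over a finite horizon `T`, player 2 runs fictitious play, and the deviating player 1 may only switch to a fixed action. The game family `make_game(M)` and the helper names `fp_action`, `average_payoff` and `empirical_ratio` are hypothetical illustrations. Because the deviation set is restricted, the computed value is only a lower bound on the ratio for that one game.

```python
import numpy as np


def make_game(M):
    """Hypothetical 2x2 game family (our choice, not taken from the paper).

    Rows are player 1's actions (U, D); columns are player 2's actions (L, R).
    U strictly dominates D for player 1, but if player 1 committed to D,
    player 2 would answer R and player 1 would earn M per round.
    All payoffs are positive so the payoff ratio is well defined.
    """
    A = np.array([[1.0, M + 1.0],   # player 1's payoffs
                  [0.5, M]])
    B = np.array([[1.0, 0.0],       # player 2's payoffs
                  [0.0, 2.0]])
    return A, B


def fp_action(own_payoff, opp_counts):
    """Fictitious play: best response to the opponent's empirical action frequencies.

    own_payoff[i, j] is this player's payoff for own action i vs opponent action j.
    Ties are broken toward the lowest index (np.argmax returns the first maximum).
    """
    belief = opp_counts / opp_counts.sum()
    return int(np.argmax(own_payoff @ belief))


def average_payoff(A, B, p1_policy, T):
    """Player 2 always runs fictitious play; player 1 uses p1_policy.

    Returns player 1's average per-round payoff over T rounds.
    """
    counts_of_p1 = np.ones(2)  # player 2's counts of player 1's actions (uniform pseudo-count prior, an assumption)
    counts_of_p2 = np.ones(2)  # player 1's counts of player 2's actions
    total = 0.0
    for _ in range(T):
        a1 = p1_policy(counts_of_p2)
        a2 = fp_action(B.T, counts_of_p1)  # transpose so player 2's own actions index the rows
        total += A[a1, a2]
        counts_of_p1[a1] += 1
        counts_of_p2[a2] += 1
    return total / T


def empirical_ratio(M, T=1000):
    """Lower bound on player 1's rationality ratio for fictitious play in one game.

    Deviations are restricted to fixed-action strategies; the definition in the
    abstract uses the best of all alternative algorithms, so the true ratio is at least this large.
    """
    A, B = make_game(M)
    follow = average_payoff(A, B, lambda c: fp_action(A, c), T)
    deviations = [average_payoff(A, B, lambda c, a=a: a, T) for a in range(2)]
    return max(deviations + [follow]) / follow
```

In this family, following fictitious play drives both players to (U, L) with payoff 1, while committing to D steers the fictitious-play opponent to R and earns M, so `empirical_ratio(10.0)` should come out close to 10 and the value approaches M for long horizons. This growth with M is in line with, though much weaker than, the abstract's claim that fictitious play is not $c$-rational for any constant $c$.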
Source journal
IEEE Control Systems Letters (Mathematics: Control and Optimization)
CiteScore: 4.40
Self-citation rate: 13.30%
Articles published: 471
Latest articles in this journal
Rationality of Learning Algorithms in Repeated Normal-Form Games
Impact of Opinion on Disease Transmission With Waterborne Pathogen and Stubborn Community
Numerical and Lyapunov-Based Investigation of the Effect of Stenosis on Blood Transport Stability Using a Control-Theoretic PDE Model of Cardiovascular Flow
Almost Sure Convergence and Non-Asymptotic Concentration Bounds for Stochastic Mirror Descent Algorithm
Opinion Dynamics With Set-Based Confidence: Convergence Criteria and Periodic Solutions