Response-based approachability with applications to generalized no-regret problems

A. Bernstein, N. Shimkin
{"title":"Response-based approachability with applications to generalized no-regret problems","authors":"A. Bernstein, N. Shimkin","doi":"10.5555/2789272.2831138","DOIUrl":null,"url":null,"abstract":"Blackwell's theory of approachability provides fundamental results for repeated games with vector-valued payoffs, which have been usefully applied in the theory of learning in games, and in devising online learning algorithms in the adversarial setup. A target set S is approachable by a player (the agent) in such a game if he can ensure that the average payoff vector converges to S, no matter what the opponent does. Blackwell provided two equivalent conditions for a convex set to be approachable. Standard approachability algorithms rely on the primal condition, which is a geometric separation condition, and essentially require to compute at each stage a projection direction from a certain point to S. Here we introduce an approachability algorithm that relies on Blackwell's dual condition, which requires the agent to have a feasible response to each mixed action of the opponent, namely a mixed action such that the expected payoff vector belongs to S. Thus, rather than projections, the proposed algorithm relies on computing the response to a certain action of the opponent at each stage. We demonstrate the utility of the proposed approach by applying it to certain generalizations of the classical regret minimization problem, which incorporate side constraints, reward-to-cost criteria, and so-called global cost functions. In these extensions, computation of the projection is generally complex while the response is readily obtainable.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"198 1","pages":"747-773"},"PeriodicalIF":0.0000,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Mach. Learn. Res.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5555/2789272.2831138","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Blackwell's theory of approachability provides fundamental results for repeated games with vector-valued payoffs, which have been usefully applied in the theory of learning in games, and in devising online learning algorithms in the adversarial setup. A target set S is approachable by a player (the agent) in such a game if he can ensure that the average payoff vector converges to S, no matter what the opponent does. Blackwell provided two equivalent conditions for a convex set to be approachable. Standard approachability algorithms rely on the primal condition, which is a geometric separation condition, and essentially require to compute at each stage a projection direction from a certain point to S. Here we introduce an approachability algorithm that relies on Blackwell's dual condition, which requires the agent to have a feasible response to each mixed action of the opponent, namely a mixed action such that the expected payoff vector belongs to S. Thus, rather than projections, the proposed algorithm relies on computing the response to a certain action of the opponent at each stage. We demonstrate the utility of the proposed approach by applying it to certain generalizations of the classical regret minimization problem, which incorporate side constraints, reward-to-cost criteria, and so-called global cost functions. In these extensions, computation of the projection is generally complex while the response is readily obtainable.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
应用程序的基于响应的可接近性,以解决一般化的无遗憾问题
Blackwell的可接近性理论为具有向量值回报的重复博弈提供了基本结果,这些结果已被有效地应用于博弈学习理论,以及在对抗性设置中设计在线学习算法。在这样的博弈中,如果玩家(代理)能够确保平均收益向量收敛于S,那么无论对手做什么,他都可以接近目标集S。Blackwell给出了凸集可逼近的两个等价条件。标准的可接近性算法依赖于原始条件,即几何分离条件,本质上要求在每个阶段计算从某一点到s的投影方向。在这里,我们引入一种基于Blackwell对偶条件的可接近性算法,该算法要求agent对对手的每个混合动作都有可行的响应,即期望收益向量属于s的混合动作。该算法依赖于计算每个阶段对对手某一动作的响应。我们通过将所提出的方法应用于经典后悔最小化问题的某些推广来证明其实用性,该问题包含了侧约束、奖励-成本标准和所谓的全局成本函数。在这些扩展中,投影的计算通常很复杂,而响应很容易得到。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Scalable Computation of Causal Bounds A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning Adaptive False Discovery Rate Control with Privacy Guarantee Fairlearn: Assessing and Improving Fairness of AI Systems Generalization Bounds for Adversarial Contrastive Learning
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1