Response-based approachability with applications to generalized no-regret problems

J. Mach. Learn. Res. Pub Date: 2015-01-01. DOI: 10.5555/2789272.2831138

A. Bernstein, N. Shimkin

Publication type: Journal Article. Pages: 747-773.

Citations: 4

Abstract

Blackwell's theory of approachability provides fundamental results for repeated games with vector-valued payoffs. These results have been usefully applied in the theory of learning in games and in devising online learning algorithms in the adversarial setup. A target set S is approachable by a player (the agent) in such a game if the agent can ensure that the average payoff vector converges to S, no matter what the opponent does. Blackwell provided two equivalent conditions for a convex set to be approachable. Standard approachability algorithms rely on the primal condition, which is a geometric separation condition, and essentially require computing at each stage a projection direction from a certain point to S. Here we introduce an approachability algorithm that relies on Blackwell's dual condition, which requires the agent to have a feasible response to each mixed action of the opponent, namely a mixed action such that the expected payoff vector belongs to S. Thus, rather than computing projections, the proposed algorithm computes at each stage the response to a certain action of the opponent. We demonstrate the utility of the proposed approach by applying it to certain generalizations of the classical regret minimization problem, which incorporate side constraints, reward-to-cost criteria, and so-called global cost functions. In these extensions, computing the projection is generally complex, while the response is readily obtainable.
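To make the dual condition concrete, here is a minimal sketch for the classical (external) regret problem only. It is not the algorithm proposed in the paper. It assumes the standard reduction in which the vector payoff is the regret vector, with component k equal to u(k, j) - u(i, j), and the target set S is the nonpositive orthant. Under that assumption, a feasible response to any mixed action q of the opponent is simply a best response to q. The reward matrix `u` and the function names are hypothetical and chosen for illustration.

```python
def action_values(u, q):
    """Expected reward of each pure agent action against the opponent's mixed action q."""
    return [sum(q[j] * row[j] for j in range(len(q))) for row in u]


def expected_regret_vector(u, p, q):
    """Expected vector payoff r(p, q): component k is u(k, q) - u(p, q)."""
    values = action_values(u, q)
    achieved = sum(p[i] * values[i] for i in range(len(u)))
    return [v - achieved for v in values]


def response(u, q):
    """A feasible response to q (assumption: S is the nonpositive orthant).

    A pure best response to q makes every regret component nonpositive.
    """
    values = action_values(u, q)
    best = max(range(len(u)), key=values.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(u))]


def in_target_set(v, tol=1e-12):
    """Membership test for S = {v : v_k <= 0 for all k}, up to a small tolerance."""
    return all(x <= tol for x in v)


# Hypothetical 2x2 reward matrix: rows are agent actions, columns are opponent actions.
u = [[1.0, 0.0],
     [0.0, 1.0]]
q = [0.3, 0.7]                      # a mixed action of the opponent

p_star = response(u, q)             # best response puts all mass on action 1
r_star = expected_regret_vector(u, p_star, q)
print(p_star, r_star, in_target_set(r_star))

# A uniform mixed action is not a feasible response to this q.
p_uniform = [0.5, 0.5]
print(in_target_set(expected_regret_vector(u, p_uniform, q)))
```

In this toy case the response is a one-line argmax, whereas a projection-based method would need the direction from the current average regret vector to S. In the generalized problems the abstract mentions, the target set is more complicated, which is where the authors report that the response remains easy to obtain while the projection does not.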

