Adversarial Group Linear Bandits and Its Application to Collaborative Edge Inference

IEEE INFOCOM 2023 - IEEE Conference on Computer Communications Pub Date : 2023-05-17 DOI:10.1109/INFOCOM53939.2023.10228900

Yin-Hae Huang, Letian Zhang, J. Xu


Citations: 1

Abstract

The multi-armed bandit problem is a classical problem of sequential decision-making under uncertainty. Most existing work studies bandit problems in either the stochastic reward regime or the adversarial reward regime; the intersection of the two regimes is much less investigated. In this paper, we study a new bandit problem, called adversarial group linear bandits (AGLB), in which the reward is generated jointly by a stochastic process and adversarial behavior. In particular, the reward that the learner receives is a noisy linear function of the arm that the learner selects within a group, and it also depends on the adversary's group-level attack decision. Such problems arise in many real-world applications, e.g., collaborative edge inference and multi-site online ad placement. To combat the uncertainty in the coupled stochastic and adversarial rewards, we develop a new bandit algorithm, called EXPUCB, which combines the classical LinUCB and EXP3 algorithms, and we prove that its regret is sublinear. We apply EXPUCB to the collaborative edge inference problem and evaluate its performance. Extensive simulation results verify the superior learning ability of EXPUCB under coupled stochastic and adversarial rewards.
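The abstract does not spell out how EXPUCB couples its two ingredients, so the following is only an illustrative sketch of one plausible two-level design, not the authors' algorithm: an EXP3-style sampler picks a group, and a LinUCB-style index picks an arm inside that group. Every name here (`LinUCBGroup`, `ExpUcbSketch`, `simulate`), the parameters `gamma` and `alpha`, the toy adversary that always attacks group 0, and the assumption that an attack zeroes the reward are hypothetical choices made for the example.

```python
import math
import random


class LinUCBGroup:
    """Ridge-regression state for one group; A^{-1} is kept via Sherman-Morrison."""

    def __init__(self, dim, alpha=1.0):
        self.dim = dim
        self.alpha = alpha  # exploration width (hypothetical tuning knob)
        self.A_inv = [[float(i == j) for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, x):
        return [sum(row[j] * x[j] for j in range(self.dim)) for row in self.A_inv]

    def ucb(self, x):
        theta = self._matvec(self.b)  # current estimate A^{-1} b
        a_x = self._matvec(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(a * xi for a, xi in zip(a_x, x)), 0.0))
        return mean + self.alpha * width

    def update(self, x, reward):
        # Rank-one update of A^{-1} for A <- A + x x^T (A_inv is symmetric).
        a_x = self._matvec(x)
        denom = 1.0 + sum(a * xi for a, xi in zip(a_x, x))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= a_x[i] * a_x[j] / denom
            self.b[i] += reward * x[i]


class ExpUcbSketch:
    """EXP3 over groups, LinUCB within the sampled group (assumed design)."""

    def __init__(self, arms_by_group, gamma=0.1, alpha=1.0, seed=0):
        self.arms = arms_by_group  # arms_by_group[g][a] is a feature vector
        self.gamma = gamma
        self.rng = random.Random(seed)
        self.weights = [1.0] * len(arms_by_group)
        dim = len(arms_by_group[0][0])
        self.models = [LinUCBGroup(dim, alpha) for _ in arms_by_group]

    def probs(self):
        total, k = sum(self.weights), len(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / k for w in self.weights]

    def select(self):
        p = self.probs()
        g = self.rng.choices(range(len(p)), weights=p)[0]
        model = self.models[g]
        a = max(range(len(self.arms[g])), key=lambda i: model.ucb(self.arms[g][i]))
        return g, a

    def update(self, g, a, reward):
        # reward is assumed to lie in [0, 1]
        p = self.probs()
        self.models[g].update(self.arms[g][a], reward)
        # Importance-weighted EXP3 update for the chosen group only.
        self.weights[g] *= math.exp(self.gamma * (reward / p[g]) / len(self.weights))
        top = max(self.weights)  # rescale to avoid overflow; probs() is unchanged
        self.weights = [w / top for w in self.weights]


def simulate(rounds=300, seed=1):
    rng = random.Random(seed)
    theta = [0.6, 0.3]  # hidden parameter, invented for this demo
    arms = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 0.2]]]
    learner = ExpUcbSketch(arms, seed=seed)
    for _ in range(rounds):
        g, a = learner.select()
        clean = sum(t * xi for t, xi in zip(theta, arms[g][a])) + rng.gauss(0.0, 0.05)
        attacked = g == 0  # toy adversary: group 0 is always attacked
        reward = 0.0 if attacked else min(max(clean, 0.0), 1.0)
        learner.update(g, a, reward)
    return learner.probs()
```

Calling `simulate()` returns the learner's final group-selection probabilities. In this toy setting the always-attacked group never earns reward, so its EXP3 weight cannot grow and the sampler shifts toward the unattacked group. One simplification to keep in mind: the sketch feeds attacked (zeroed) rewards into the linear estimate, which a more careful design would need to handle.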


Source journal

IEEE INFOCOM 2023 - IEEE Conference on Computer Communications

Self-citation rate

0.00%

Publication count