{"title":"无向图上的鲁棒多智能体强盗","authors":"Daniel Vial, S. Shakkottai, R. Srikant","doi":"10.48550/arXiv.2203.00076","DOIUrl":null,"url":null,"abstract":"We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) łog (T) / Δ ) regret in this setting, where K is the number of arms and Δ is the arm gap. For m łl K, this improves over the single-agent baseline regret of O(Kłog(T)/Δ). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in K and n. In light of this negative result, we propose a new algorithm for which the i-th agent has regret O(( dmal (i) + K/n) łog(T)/Δ) on any connected and undirected graph, where dmal(i) is the number of i's neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where dmal(i) = m), and show the effect of malicious agents is entirely local (in the sense that only the dmal (i) malicious agents directly connected to i affect its long-term regret).","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Robust Multi-Agent Bandits Over Undirected Graphs\",\"authors\":\"Daniel Vial, S. Shakkottai, R. Srikant\",\"doi\":\"10.48550/arXiv.2203.00076\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) łog (T) / Δ ) regret in this setting, where K is the number of arms and Δ is the arm gap. For m łl K, this improves over the single-agent baseline regret of O(Kłog(T)/Δ). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in K and n. In light of this negative result, we propose a new algorithm for which the i-th agent has regret O(( dmal (i) + K/n) łog(T)/Δ) on any connected and undirected graph, where dmal(i) is the number of i's neighbors who are malicious. 
Thus, we generalize existing regret bounds beyond the complete graph (where dmal(i) = m), and show the effect of malicious agents is entirely local (in the sense that only the dmal (i) malicious agents directly connected to i affect its long-term regret).\",\"PeriodicalId\":426760,\"journal\":{\"name\":\"Proceedings of the ACM on Measurement and Analysis of Computing Systems\",\"volume\":\"34 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-02-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM on Measurement and Analysis of Computing Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2203.00076\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2203.00076","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret, but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) log(T)/Δ) regret in this setting, where K is the number of arms and Δ is the arm gap. For m ≪ K, this improves over the single-agent baseline regret of O(K log(T)/Δ). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in K and n. In light of this negative result, we propose a new algorithm for which the i-th agent has regret O((d_mal(i) + K/n) log(T)/Δ) on any connected undirected graph, where d_mal(i) is the number of i's neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where d_mal(i) = m), and show that the effect of malicious agents is entirely local, in the sense that only the d_mal(i) malicious agents directly connected to agent i affect its long-term regret.
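For quick comparison, the regret bounds quoted in the abstract can be written side by side (here T is the horizon, K the number of arms, Δ the arm gap, n the number of honest agents, m the number of malicious agents, and d_mal(i) the number of agent i's malicious neighbors):

    Single-agent baseline:                        O( K log(T) / Δ )
    Complete graph, prior work (useful for m ≪ K): O( (m + K/n) log(T) / Δ )
    Any connected undirected graph (this work):    O( (d_mal(i) + K/n) log(T) / Δ )

As the abstract notes, the last bound recovers the complete-graph bound when every malicious agent is a neighbor of agent i (d_mal(i) = m), and otherwise depends only on the malicious agents in i's immediate neighborhood.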