{"title":"限制决斗强盗边缘智力","authors":"Shangshang Wang;Ziyu Shao;Yang Yang","doi":"10.1109/TNSE.2024.3524362","DOIUrl":null,"url":null,"abstract":"Bandit is acknowledged as a classical analytic tool for the online decision-making problem under uncertainty, e.g., task assignment for crowdsourcing systems given the unknown reliability of workers. In the conventional setup, an agent selects from a set of arms across rounds to balance the exploitation-exploration tradeoff using quantitive reward feedback. Despite bandits' popularity, their practical implementation may run into concerns like 1) obtaining the quantitive reward is a non-trivial problem, e.g., evaluating workers' completion quality (reward) requires domain experts to set up metrics; 2) mismatch between the budgeted agent and costs for selecting arms, e.g., the crowdsourcing platform (agent) should offer payments (cost) to workers to complete tasks. To address such concerns, 1) we employ dueling bandits to learn the uncertainties via qualitative pairwise comparisons rather than quantitive rewards, e.g., whether a worker performs better on the assigned task than the other; 2) we utilize online control to guarantee a within-budget cost while selecting arms. By integrating online learning and online control, we propose a <italic>Constrained Two-Dueling Bandit (CTDB)</i> algorithm. We prove that CTDB achieves a <inline-formula><tex-math>$O(1/V + \\sqrt{\\log T / T})$</tex-math></inline-formula> round-averaged regret over the horizon <inline-formula><tex-math>$T$</tex-math></inline-formula> while keeping a budgeted cost where <inline-formula><tex-math>$V$</tex-math></inline-formula> is a constant parameter balancing the tradeoff between regret minimization and constraint satisfaction. 
We conduct extensive simulations with synthetic and real-world datasets to demonstrate the outperformance of CTDB over baselines.","PeriodicalId":54229,"journal":{"name":"IEEE Transactions on Network Science and Engineering","volume":"12 2","pages":"1126-1136"},"PeriodicalIF":7.9000,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Constrained Dueling Bandits for Edge Intelligence\",\"authors\":\"Shangshang Wang;Ziyu Shao;Yang Yang\",\"doi\":\"10.1109/TNSE.2024.3524362\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Bandit is acknowledged as a classical analytic tool for the online decision-making problem under uncertainty, e.g., task assignment for crowdsourcing systems given the unknown reliability of workers. In the conventional setup, an agent selects from a set of arms across rounds to balance the exploitation-exploration tradeoff using quantitive reward feedback. Despite bandits' popularity, their practical implementation may run into concerns like 1) obtaining the quantitive reward is a non-trivial problem, e.g., evaluating workers' completion quality (reward) requires domain experts to set up metrics; 2) mismatch between the budgeted agent and costs for selecting arms, e.g., the crowdsourcing platform (agent) should offer payments (cost) to workers to complete tasks. To address such concerns, 1) we employ dueling bandits to learn the uncertainties via qualitative pairwise comparisons rather than quantitive rewards, e.g., whether a worker performs better on the assigned task than the other; 2) we utilize online control to guarantee a within-budget cost while selecting arms. By integrating online learning and online control, we propose a <italic>Constrained Two-Dueling Bandit (CTDB)</i> algorithm. 
We prove that CTDB achieves a <inline-formula><tex-math>$O(1/V + \\\\sqrt{\\\\log T / T})$</tex-math></inline-formula> round-averaged regret over the horizon <inline-formula><tex-math>$T$</tex-math></inline-formula> while keeping a budgeted cost where <inline-formula><tex-math>$V$</tex-math></inline-formula> is a constant parameter balancing the tradeoff between regret minimization and constraint satisfaction. We conduct extensive simulations with synthetic and real-world datasets to demonstrate the outperformance of CTDB over baselines.\",\"PeriodicalId\":54229,\"journal\":{\"name\":\"IEEE Transactions on Network Science and Engineering\",\"volume\":\"12 2\",\"pages\":\"1126-1136\"},\"PeriodicalIF\":7.9000,\"publicationDate\":\"2024-12-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Network Science and Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10818641/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Network Science and Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10818641/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0
Abstract
Bandits are acknowledged as a classical analytic tool for online decision-making under uncertainty, e.g., task assignment in crowdsourcing systems where worker reliability is unknown. In the conventional setup, an agent selects from a set of arms across rounds, using quantitative reward feedback to balance the exploitation-exploration tradeoff. Despite bandits' popularity, practical implementations may run into concerns such as: 1) obtaining a quantitative reward is non-trivial, e.g., evaluating workers' completion quality (the reward) requires domain experts to set up metrics; 2) there is a mismatch between the budgeted agent and the costs of selecting arms, e.g., a crowdsourcing platform (the agent) must offer payments (costs) to workers to complete tasks. To address these concerns, 1) we employ dueling bandits to learn the uncertainties via qualitative pairwise comparisons rather than quantitative rewards, e.g., whether one worker performs better than another on the assigned task; 2) we utilize online control to guarantee a within-budget cost while selecting arms. By integrating online learning and online control, we propose a Constrained Two-Dueling Bandit (CTDB) algorithm. We prove that CTDB achieves an $O(1/V + \sqrt{\log T / T})$ round-averaged regret over the horizon $T$ while keeping the cost within budget, where $V$ is a constant parameter balancing the tradeoff between regret minimization and constraint satisfaction. We conduct extensive simulations on synthetic and real-world datasets to demonstrate that CTDB outperforms the baselines.
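The abstract's two ingredients, learning from qualitative pairwise comparisons and enforcing a budget via online control, can be sketched in a few lines. This is an illustrative sketch only, not the paper's CTDB algorithm: the UCB-style win-rate score, the Lyapunov-style virtual queue `Q`, the drift-plus-penalty selection rule, and all function and parameter names are assumptions made for illustration.

```python
import math
import random

def constrained_dueling_bandit(duel, costs, budget, T, V=10.0):
    """Illustrative constrained dueling-bandit loop (not the paper's CTDB).

    duel(i, j) -> True if arm i beats arm j in one pairwise comparison.
    costs[i] is the per-selection cost of arm i; `budget` is the target
    average cost per round; V trades regret against constraint slack.
    """
    K = len(costs)
    wins = [[0] * K for _ in range(K)]   # wins[i][j]: times i beat j
    plays = [[0] * K for _ in range(K)]  # plays[i][j]: times (i, j) dueled
    Q = 0.0                              # virtual queue tracking budget violation

    def score(i, t):
        # Optimistic (UCB-style) average win rate of arm i over all rivals.
        s = 0.0
        for j in range(K):
            if j == i:
                continue
            n = plays[i][j]
            p = wins[i][j] / n if n else 0.5
            bonus = math.sqrt(math.log(t + 2) / n) if n else 1.0
            s += p + bonus
        return s / (K - 1)

    total_cost = 0.0
    for t in range(T):
        # Drift-plus-penalty: trade learning value against queue-weighted cost.
        obj = lambda a: V * score(a, t) - Q * costs[a]
        i = max(range(K), key=obj)
        j = max((k for k in range(K) if k != i), key=obj)
        if duel(i, j):
            wins[i][j] += 1
        else:
            wins[j][i] += 1
        plays[i][j] += 1
        plays[j][i] += 1
        step_cost = costs[i] + costs[j]
        total_cost += step_cost
        # Queue grows when the round's cost exceeds the per-round budget,
        # steering future selections toward cheaper arms.
        Q = max(Q + step_cost - budget, 0.0)
    return wins, total_cost / T
```

Because the queue update satisfies `Q_{t+1} >= Q_t + step_cost - budget`, the excess average cost is bounded by `Q_T / T`, which mirrors the within-budget guarantee the abstract describes, while larger `V` weights comparison-based learning more heavily against the constraint.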
Journal introduction:
The IEEE Transactions on Network Science and Engineering (TNSE) is committed to the timely publication of peer-reviewed technical articles on the theory and applications of network science and on the interconnections among the elements of a system that form a network. In particular, the journal publishes articles on understanding, predicting, and controlling the structures and behaviors of networks at a fundamental level. The types of networks covered include physical or engineered networks, information networks, biological networks, semantic networks, economic networks, social networks, and ecological networks. The journal aims to discover the common principles that govern network structures, functionalities, and behaviors. Another trans-disciplinary focus is the interactions between, and co-evolution of, different genres of networks.