Title: Constrained Dueling Bandits for Edge Intelligence
Authors: Shangshang Wang; Ziyu Shao; Yang Yang
Journal: IEEE Transactions on Network Science and Engineering, vol. 12, no. 2, pp. 1126-1136
Publication date: 2024-12-30
Publication type: Journal Article
DOI: 10.1109/TNSE.2024.3524362
URL: https://ieeexplore.ieee.org/document/10818641/
Impact factor: 6.7
JCR: Q1 (Engineering, Multidisciplinary)
Region: 2 (Computer Science)
Open access: No
Citation count: 0
Abstract
Bandits are a classical analytical tool for online decision-making under uncertainty, e.g., task assignment in crowdsourcing systems where the reliability of workers is unknown. In the conventional setup, an agent selects from a set of arms across rounds, balancing the exploration-exploitation tradeoff using quantitative reward feedback. Despite their popularity, bandits raise practical concerns: 1) obtaining a quantitative reward is non-trivial, e.g., evaluating workers' completion quality (reward) requires domain experts to set up metrics; 2) the agent has a limited budget while selecting arms incurs costs, e.g., the crowdsourcing platform (agent) must offer payments (cost) to workers to complete tasks. To address these concerns, 1) we employ dueling bandits to learn the uncertainties via qualitative pairwise comparisons rather than quantitative rewards, e.g., whether one worker performs better than another on the assigned task; 2) we utilize online control to keep the cost within budget while selecting arms. By integrating online learning and online control, we propose a Constrained Two-Dueling Bandit (CTDB) algorithm. We prove that CTDB achieves an $O(1/V + \sqrt{\log T / T})$ round-averaged regret over the horizon $T$ while keeping the cost within budget, where $V$ is a constant parameter balancing the tradeoff between regret minimization and constraint satisfaction. Extensive simulations with synthetic and real-world datasets show that CTDB outperforms the baselines.
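The abstract does not spell out how CTDB works, so the following is only a minimal sketch of the general idea it names: learning from pairwise comparisons while an online-control mechanism keeps spending near a budget. It is not the authors' algorithm. The sketch assumes a drift-plus-penalty rule with a virtual queue and an optimistic (UCB-style) Borda score; the names `update_queue`, `ucb`, and `run_sketch`, the preference matrix `pref`, and the per-round `budget` are all hypothetical.

```python
import math
import random


def update_queue(queue, cost, budget):
    """Virtual queue: accumulates spending above the per-round budget."""
    return max(queue + cost - budget, 0.0)


def ucb(wins, i, j, t):
    """Optimistic estimate of the probability that arm i beats arm j."""
    n = wins[i][j] + wins[j][i]
    if n == 0:
        return 1.0  # no comparison yet, so stay fully optimistic
    return min(1.0, wins[i][j] / n + math.sqrt(2 * math.log(t) / n))


def run_sketch(pref, costs, budget, V, T, seed=0):
    """Toy constrained dueling bandit (an assumption, not the paper's CTDB).

    pref[i][j] is the hidden probability that arm i beats arm j.
    """
    rng = random.Random(seed)
    K = len(costs)
    wins = [[0] * K for _ in range(K)]  # wins[i][j]: times i beat j
    queue, total_cost, picks = 0.0, 0.0, []

    for t in range(1, T + 1):
        # Borda-style score: mean optimistic win rate against the other arms.
        scores = [
            sum(ucb(wins, i, j, t) for j in range(K) if j != i) / (K - 1)
            for i in range(K)
        ]
        # Drift-plus-penalty: V weighs estimated quality, the queue weighs cost.
        pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
        i, j = max(
            pairs,
            key=lambda p: V * (scores[p[0]] + scores[p[1]])
            - queue * (costs[p[0]] + costs[p[1]]),
        )
        # Only qualitative feedback is observed: which arm won the duel.
        if rng.random() < pref[i][j]:
            wins[i][j] += 1
        else:
            wins[j][i] += 1
        pair_cost = costs[i] + costs[j]
        total_cost += pair_cost
        queue = update_queue(queue, pair_cost, budget)
        picks.append((i, j))

    return picks, total_cost, queue


if __name__ == "__main__":
    pref = [[0.5, 0.7, 0.9], [0.3, 0.5, 0.6], [0.1, 0.4, 0.5]]
    costs = [3.0, 2.0, 1.0]
    picks, total_cost, queue = run_sketch(pref, costs, budget=3.5, V=10.0, T=200)
    print(total_cost / 200, queue)
```

In this sketch a larger `V` makes the selection follow the estimated preferences more closely, while a smaller `V` lets the queue term dominate and pulls spending toward the budget, which mirrors the role the abstract assigns to $V$. Because the queue starts at zero and never drops below it, the total cost can exceed `T * budget` by at most the final queue length.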
About the journal:
The IEEE Transactions on Network Science and Engineering (TNSE) is committed to the timely publication of peer-reviewed technical articles that deal with the theory and applications of network science and the interconnections among the elements in a system that form a network. In particular, the journal publishes articles on understanding, prediction, and control of the structures and behaviors of networks at the fundamental level, with the aim of discovering common principles that govern network structures, functionalities, and behaviors. The types of networks covered include physical or engineered networks, information networks, biological networks, semantic networks, economic networks, social networks, and ecological networks. Another trans-disciplinary focus of the journal is the interactions between and co-evolution of different genres of networks.