Constrained Dueling Bandits for Edge Intelligence

IF 7.9 | Zone 2 (Computer Science) | JCR Q1, ENGINEERING, MULTIDISCIPLINARY | IEEE Transactions on Network Science and Engineering | Pub Date: 2024-12-30 | DOI: 10.1109/TNSE.2024.3524362
Shangshang Wang; Ziyu Shao; Yang Yang
Published in IEEE Transactions on Network Science and Engineering, vol. 12, no. 2, pp. 1126-1136. Journal article; not open access. Full text: https://ieeexplore.ieee.org/document/10818641/
Citations: 0

Abstract

Bandits are a classical analytic tool for online decision-making under uncertainty, e.g., task assignment in crowdsourcing systems where the reliability of workers is unknown. In the conventional setup, an agent selects from a set of arms across rounds, balancing the exploration-exploitation tradeoff using quantitative reward feedback. Despite their popularity, bandits raise two practical concerns: 1) obtaining a quantitative reward is non-trivial, e.g., evaluating workers' completion quality (reward) requires domain experts to set up metrics; 2) the agent's budget may not match the costs of selecting arms, e.g., the crowdsourcing platform (agent) should pay workers (cost) to complete tasks. To address these concerns, 1) we employ dueling bandits to learn the uncertainties via qualitative pairwise comparisons rather than quantitative rewards, e.g., whether one worker performs better on the assigned task than another; 2) we use online control to keep the cost within budget while selecting arms. By integrating online learning and online control, we propose a Constrained Two-Dueling Bandit (CTDB) algorithm. We prove that CTDB achieves an $O(1/V + \sqrt{\log T / T})$ round-averaged regret over the horizon $T$ while keeping the cost within budget, where $V$ is a constant parameter balancing the tradeoff between regret minimization and constraint satisfaction. Extensive simulations with synthetic and real-world datasets show that CTDB outperforms the baselines.
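To make the combination of online learning and online control more concrete, here is a minimal sketch of a budget-aware dueling bandit. It is not the CTDB algorithm from the paper, whose details the abstract does not give. It assumes a Bradley-Terry preference model to simulate duels, a UCB estimate of each arm's average win probability, and a virtual queue that accumulates budget overshoot; the function name `run_constrained_duel` and all parameter values are hypothetical. The weight `V` is only loosely analogous to the paper's parameter $V$: a larger value favors the learning objective, a smaller one favors the budget.

```python
import math
import random


def run_constrained_duel(utils, costs, budget, V, T, seed=0):
    """Toy budget-constrained dueling bandit (illustrative, not the paper's CTDB).

    utils  : hypothetical latent arm qualities, used only to simulate duels
    costs  : cost of selecting each arm in a round
    budget : target average cost per round for the selected pair
    V      : weight on the learning objective versus the budget queue
    """
    rng = random.Random(seed)
    K = len(utils)
    wins = [[0] * K for _ in range(K)]   # wins[i][j]: times arm i beat arm j
    queue = 0.0                          # virtual queue tracking budget overshoot
    total_cost = 0.0
    pair_counts = {}

    for t in range(1, T + 1):
        # Optimistic (UCB) estimate of each arm's average win probability.
        score = []
        for i in range(K):
            acc = 0.0
            for j in range(K):
                if i == j:
                    continue
                n = wins[i][j] + wins[j][i]
                if n == 0:
                    acc += 1.0           # unexplored pairs get the most optimistic value
                else:
                    bonus = math.sqrt(2.0 * math.log(t) / n)
                    acc += min(1.0, wins[i][j] / n + bonus)
            score.append(acc / (K - 1))

        # Drift-plus-penalty style choice: scores weighted by V, costs by the queue.
        best_pair, best_val = None, -math.inf
        for i in range(K):
            for j in range(i + 1, K):
                val = V * (score[i] + score[j]) - queue * (costs[i] + costs[j])
                if val > best_val:
                    best_pair, best_val = (i, j), val
        i, j = best_pair

        # Qualitative feedback only: which arm won the duel (Bradley-Terry assumption).
        p_i_wins = 1.0 / (1.0 + math.exp(utils[j] - utils[i]))
        if rng.random() < p_i_wins:
            wins[i][j] += 1
        else:
            wins[j][i] += 1

        # Queue grows when the round overspends and drains when it underspends.
        spent = costs[i] + costs[j]
        total_cost += spent
        queue = max(queue + spent - budget, 0.0)
        pair_counts[best_pair] = pair_counts.get(best_pair, 0) + 1

    return total_cost / T, pair_counts


if __name__ == "__main__":
    avg_cost, counts = run_constrained_duel(
        utils=[0.0, 0.5, 1.0, 1.5], costs=[0.2, 0.4, 0.6, 1.0],
        budget=1.2, V=20.0, T=5000,
    )
    print(f"average cost per round: {avg_cost:.3f}")
    print("most selected pair:", max(counts, key=counts.get))
```

In this sketch the queue length never exceeds a bound proportional to `V`, so the average cost exceeds the budget by at most that bound divided by `T`. This mirrors, in a simplified form, the regret-versus-constraint tradeoff that the abstract attributes to $V$.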
Source journal
IEEE Transactions on Network Science and Engineering (Engineering: Control and Systems Engineering)
CiteScore: 12.60
Self-citation rate: 9.10%
Articles published: 393
Journal description: The IEEE Transactions on Network Science and Engineering (TNSE) is committed to the timely publication of peer-reviewed technical articles on the theory and applications of network science and on the interconnections among the elements of a system that form a network. In particular, it publishes articles on understanding, predicting, and controlling the structures and behaviors of networks at the fundamental level. The networks covered include physical or engineered networks, information networks, biological networks, semantic networks, economic networks, social networks, and ecological networks. The journal aims to discover common principles that govern network structures, functionalities, and behaviors. Another trans-disciplinary focus is the interaction between, and co-evolution of, different genres of networks.