Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities

Xinzhe Fu, E. Modiano
{"title":"Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities","authors":"Xinzhe Fu, E. Modiano","doi":"10.1145/3570619","DOIUrl":null,"url":null,"abstract":"We study the optimal control problem in stochastic queueing networks with a set of job dispatchers connected to a set of parallel servers with queues. Jobs arrive at the dispatchers and get routed to the servers following some routing policy. The arrival processes of jobs and the service processes of servers are stochastic with unknown arrival rates and service rates. Upon the completion of each job from dispatcher un at server sm, a random utility whose mean is unknown is obtained. We seek to design a control policy that makes routing decisions at the dispatchers and scheduling decisions at the servers to maximize the total utility obtained by the end of a finite time horizon T. The performance of policies is measured by regret, which is defined as the difference in total expected utility with respect to the optimal dynamic policy that has access to arrival rates, service rates and underlying utilities. We first show that the expected utility of the optimal dynamic policy is upper bounded by T times the solution to a static linear program, where the optimization variables correspond to rates of jobs from dispatchers to servers and the feasibility region is parameterized by arrival rates and service rates. We next propose a policy for the optimal control problem that is an integration of a learning algorithm and a control policy. The learning algorithm seeks to learn the optimal extreme point solution to the static linear program based on the information available in the optimal control problem. The control policy, a mixture of priority-based and Joint-the-Shortest-Queue routing at the dispatchers and priority-based scheduling at the servers, makes decisions based on the graphical structure induced by the extreme point solutions provided by the learning algorithm. We prove that our policy achieves logarithmic regret whereas application of existing techniques to the optimal control problem would lead to Ω(√T)-regret. The theoretical analysis is further complemented with simulations to evaluate the empirical performance of our policy.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3570619","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

We study the optimal control problem in stochastic queueing networks with a set of job dispatchers connected to a set of parallel servers with queues. Jobs arrive at the dispatchers and get routed to the servers following some routing policy. The arrival processes of jobs and the service processes of servers are stochastic with unknown arrival rates and service rates. Upon the completion of each job from dispatcher un at server sm, a random utility whose mean is unknown is obtained. We seek to design a control policy that makes routing decisions at the dispatchers and scheduling decisions at the servers to maximize the total utility obtained by the end of a finite time horizon T. The performance of policies is measured by regret, which is defined as the difference in total expected utility with respect to the optimal dynamic policy that has access to arrival rates, service rates and underlying utilities. We first show that the expected utility of the optimal dynamic policy is upper bounded by T times the solution to a static linear program, where the optimization variables correspond to rates of jobs from dispatchers to servers and the feasibility region is parameterized by arrival rates and service rates. We next propose a policy for the optimal control problem that is an integration of a learning algorithm and a control policy. The learning algorithm seeks to learn the optimal extreme point solution to the static linear program based on the information available in the optimal control problem. The control policy, a mixture of priority-based and Joint-the-Shortest-Queue routing at the dispatchers and priority-based scheduling at the servers, makes decisions based on the graphical structure induced by the extreme point solutions provided by the learning algorithm. We prove that our policy achieves logarithmic regret whereas application of existing techniques to the optimal control problem would lead to Ω(√T)-regret. The theoretical analysis is further complemented with simulations to evaluate the empirical performance of our policy.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
未知效用随机排队网络的联合学习与控制
研究了一类随机排队网络的最优控制问题,其中一组作业调度程序与一组具有队列的并行服务器相连。作业到达调度程序,并按照某种路由策略路由到服务器。作业的到达过程和服务器的服务过程都是随机的,到达率和服务率都是未知的。在服务器sm上的调度程序完成每个作业后,获得一个均值未知的随机效用。我们试图设计一个控制策略,在调度器上做出路由决策,在服务器上做出调度决策,以最大限度地提高有限时间范围t结束时获得的总效用。策略的性能通过后悔来衡量,后悔被定义为总期望效用与可访问到达率、服务率和底层效用的最优动态策略的差异。我们首先证明了最优动态策略的期望效用的上限是T乘以静态线性规划的解,其中优化变量对应于从调度器到服务器的作业率,可行性区域由到达率和服务率参数化。接下来,我们提出了一个最优控制问题的策略,它是一个学习算法和控制策略的集成。学习算法基于最优控制问题中可用的信息来学习静态线性规划的最优极值点解。控制策略混合了调度端基于优先级和联合最短队列的路由,服务器端基于优先级的调度,根据学习算法提供的极值点解生成的图形结构进行决策。我们证明了我们的策略实现了对数后悔,而现有技术应用于最优控制问题将导致Ω(√T)-后悔。理论分析进一步与模拟相辅相成,以评估我们的政策的实证表现。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
3.20
自引率
0.00%
发文量
0
期刊最新文献
A Large Scale Study and Classification of VirusTotal Reports on Phishing and Malware URLs POMACS V7, N2, June 2023 Editorial SplitRPC: A {Control + Data} Path Splitting RPC Stack for ML Inference Serving Smash: Flexible, Fast, and Resource-efficient Placement and Lookup of Distributed Storage Towards Accelerating Data Intensive Application's Shuffle Process Using SmartNICs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1