Learning-Based Optimal Admission Control in a Single-Server Queuing System

Stochastic Systems (JCR Q1, Mathematics). Pub Date: 2024-01-05. DOI: 10.1287/stsy.2022.0042

Asaf Cohen, Vijay Subramanian, Yili Zhang



Abstract




We consider a long-term average profit–maximizing admission control problem in an M/M/1 queuing system with unknown service and arrival rates. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon arrivals whether to admit the arriving customer or not based on the full history of observations of the queue length of the system. Naor [Naor P (1969) The regulation of queue size by levying tolls. Econometrica 37(1):15–24] shows that, if all the parameters of the model are known, then it is optimal to use a static threshold policy: admit if the queue length is less than a predetermined threshold and otherwise not. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full-information model of Naor [Naor P (1969) The regulation of queue size by levying tolls. Econometrica 37(1):15–24]. We show that the algorithm achieves an O(1) regret when all optimal thresholds with full information are nonzero and achieves an [Formula: see text] regret for any specified [Formula: see text] in the case that an optimal threshold with full information is 0 (i.e., an optimal policy is to reject all arrivals), where N is the number of arrivals.

Funding: A. Cohen is partially supported by the National Science Foundation [Grant DMS-2006305]. V. Subramanian is supported in part by the NSF [Grants CCF-2008130, ECCS-2038416, CNS-1955777, and CMMI-2240981].
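The full-information threshold policy mentioned in the abstract can be illustrated with a small numerical sketch. This is not the paper's learning algorithm. It only evaluates static thresholds when the arrival rate and service rate are known, under assumptions that are ours rather than the source's: the holding cost is charged on every customer in the system (including the one in service), a threshold `K` means "admit while fewer than `K` customers are present", and the search is capped at a hypothetical `k_max`. The function names are illustrative.

```python
def stationary_distribution(lam, mu, K):
    """Stationary law of the number in system for an M/M/1 queue
    that admits arrivals only while fewer than K customers are present."""
    rho = lam / mu
    weights = [rho ** n for n in range(K + 1)]  # pi_n is proportional to rho^n
    total = sum(weights)
    return [w / total for w in weights]


def average_profit(lam, mu, reward, cost, K):
    """Long-run average profit per unit time under threshold K.
    Assumption: cost accrues per unit time for each customer in the system."""
    pi = stationary_distribution(lam, mu, K)
    throughput = mu * (1.0 - pi[0])  # service completion rate
    mean_in_system = sum(n * p for n, p in enumerate(pi))
    return reward * throughput - cost * mean_in_system


def best_thresholds(lam, mu, reward, cost, k_max=50):
    """Brute-force search over thresholds 0..k_max (k_max is an assumed cap).
    Returns every maximizing threshold, since the optimum may not be unique."""
    profits = [average_profit(lam, mu, reward, cost, K) for K in range(k_max + 1)]
    best = max(profits)
    return [K for K, g in enumerate(profits) if abs(g - best) <= 1e-12], best


if __name__ == "__main__":
    # Illustrative parameters only; they do not come from the paper.
    print(best_thresholds(lam=1.0, mu=2.0, reward=5.0, cost=1.0))
    print(best_thresholds(lam=1.0, mu=2.0, reward=0.4, cost=1.0))
```

With the second parameter set the reward is too small to cover the expected holding cost, so the search returns threshold 0, which corresponds to the reject-all case that the abstract treats separately in its regret bound.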

