Authors: Asaf Cohen, Vijay Subramanian, Yili Zhang
Journal: Stochastic Systems (published 2024-01-05)
DOI: https://doi.org/10.1287/stsy.2022.0042
Learning-Based Optimal Admission Control in a Single-Server Queuing System
We consider a long-term average profit–maximizing admission control problem in an M/M/1 queuing system with unknown service and arrival rates. A fixed reward is collected upon each service completion, and a cost per unit of time is charged for customers waiting in the queue. At each arrival, a dispatcher decides whether to admit the arriving customer based on the full history of observations of the queue length of the system. Naor [Naor P (1969) The regulation of queue size by levying tolls. Econometrica 37(1):15–24] shows that, if all the parameters of the model are known, then it is optimal to use a static threshold policy: admit if the queue length is less than a predetermined threshold and otherwise reject. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full-information model of Naor (1969). We show that the algorithm achieves an O(1) regret when all optimal thresholds with full information are nonzero and achieves an [Formula: see text] regret for any specified [Formula: see text] in the case that an optimal threshold with full information is 0 (i.e., an optimal policy is to reject all arrivals), where N is the number of arrivals.

Funding: A. Cohen is partially supported by the National Science Foundation [Grant DMS-2006305]. V. Subramanian is supported in part by the NSF [Grants CCF-2008130, ECCS-2038416, CNS-1955777, and CMMI-2240981].
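The static threshold policy in the full-information setting can be illustrated with a minimal sketch. It is not the learning algorithm of the paper; it only evaluates a fixed threshold K when the arrival rate and service rate are known, using the standard stationary distribution of an M/M/1 queue truncated at K. The function names, the parameter names (lam, mu, R, C), and the search cap K_max are all hypothetical. As a simplifying assumption, the holding cost is charged on every customer in the system, including the one in service.

```python
def stationary_distribution(lam, mu, K):
    """Stationary probabilities of 0..K customers when arrivals are admitted only below K."""
    rho = lam / mu
    weights = [rho ** n for n in range(K + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def average_profit(lam, mu, R, C, K):
    """Long-run average profit per unit time under threshold K.

    Assumption: the cost C is charged per unit time on every customer
    in the system (waiting or in service).
    """
    pi = stationary_distribution(lam, mu, K)
    throughput = lam * (1.0 - pi[K])          # rate of admitted (hence served) customers
    mean_in_system = sum(n * p for n, p in enumerate(pi))
    return R * throughput - C * mean_in_system


def best_threshold(lam, mu, R, C, K_max=200):
    """Smallest threshold in 0..K_max that maximizes the average profit (brute-force search)."""
    return max(range(K_max + 1), key=lambda K: average_profit(lam, mu, R, C, K))


def admit(queue_length, K):
    """Threshold rule: admit the arrival only if the current queue length is below K."""
    return queue_length < K


if __name__ == "__main__":
    K_star = best_threshold(lam=1.0, mu=1.0, R=5.0, C=1.0)
    print("threshold:", K_star, "admit at length 1:", admit(1, K_star))
```

With a small reward relative to the holding cost, the search returns a threshold of 0, which corresponds to the reject-all case that the abstract treats separately.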