Mathematics of Operations Research: Latest Articles

Corruption-Robust Exploration in Episodic Reinforcement Learning
IF 1.7 Zone 3 (Mathematics) Q2 MATHEMATICS, APPLIED Pub Date: 2024-05-23 DOI: 10.1287/moor.2021.0202
Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
We initiate the study of episodic reinforcement learning (RL) under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of multiarmed bandits. We provide a framework that modifies the aggressive exploration enjoyed by existing reinforcement learning approaches based on optimism in the face of uncertainty by complementing them with principles from action elimination. Importantly, our framework circumvents the major challenges posed by naively applying action elimination in the RL setting, as formalized by a lower bound we demonstrate. Our framework yields efficient algorithms that (a) attain near-optimal regret in the absence of corruptions and (b) adapt to unknown levels of corruption, enjoying regret guarantees that degrade gracefully in the total corruption encountered. To showcase the generality of our approach, we derive results for both tabular settings (where states and actions are finite) and linear Markov decision process settings (where the dynamics and rewards admit a linear underlying representation). Notably, our work provides the first sublinear regret guarantee that accommodates any deviation from purely independent and identically distributed transitions in the bandit-feedback model for episodic reinforcement learning. Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2021.0202 .
Citations: 0
Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme
IF 1.7 Zone 3 (Mathematics) Q2 MATHEMATICS, APPLIED Pub Date: 2024-05-23 DOI: 10.1287/moor.2023.0172
Christoph Reisinger, Jonathan Tam
We consider Markov decision processes where the state of the chain is only given at chosen observation times and at a cost. Optimal strategies involve the optimization of observation times as well as the subsequent action values. We consider the finite horizon and discounted infinite horizon problems as well as an extension with parameter uncertainty. By including the time elapsed from observations as part of the augmented Markov system, the value function satisfies a system of quasivariational inequalities (QVIs). Such a class of QVIs can be seen as an extension to the interconnected obstacle problem. We prove a comparison principle for this class of QVIs, which implies the uniqueness of solutions to our proposed problem. Penalty methods are then utilized to obtain arbitrarily accurate solutions. Finally, we perform numerical experiments on three applications that illustrate our framework. Funding: J. Tam is supported by the Engineering and Physical Sciences Research Council [Grant 2269738].
Citations: 0
Liquid Welfare Guarantees for No-Regret Learning in Sequential Budgeted Auctions
IF 1.7 Zone 3 (Mathematics) Q2 MATHEMATICS, APPLIED Pub Date: 2024-05-14 DOI: 10.1287/moor.2023.0274
Giannis Fikioris, Éva Tardos
We study the liquid welfare in sequential first-price auctions with budgeted buyers. We use a behavioral model for the buyers, assuming a learning style guarantee: the utility of each buyer is within a [Formula: see text] factor ([Formula: see text]) of the utility achievable by shading their value with the same factor at each iteration. We show a [Formula: see text] price of anarchy for liquid welfare when valuations are additive. This is in stark contrast to sequential second-price auctions, where the resulting liquid welfare can be arbitrarily smaller than the maximum liquid welfare, even when [Formula: see text]. We prove a lower bound of [Formula: see text] on the liquid welfare loss under the given assumption in first-price auctions. Our liquid welfare results extend when buyers have submodular valuations over the set of items they win across iterations with a slightly worse price of anarchy bound of [Formula: see text] compared with the guarantee for the additive case. Funding: G. Fikioris is supported in part by the Air Force Office of Scientific Research [Grants FA9550-19-1-0183 and FA9550-23-1-0068], the Department of Defense (DoD) through the National Defense Science & Engineering Graduate (NDSEG) Fellowship Program, and the Onassis Foundation [Scholarship ID F ZS 068-1/2022-2023]. É. Tardos is supported in part by the NSF [Grant CCF-1408673] and AFOSR [Grants FA9550-19-1-0183, FA9550-23-1-0410, and FA9550-23-1-0068].
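The abstract does not define liquid welfare. Under the standard definition from the budgeted-auction literature (assumed here, not quoted from the paper), each buyer contributes the smaller of their budget and the value of what they win. A minimal sketch for additive valuations follows; the function name `liquid_welfare` and the dictionary layout are hypothetical and chosen only for illustration.

```python
def liquid_welfare(budgets, values, allocation):
    """Sum over buyers of min(budget, total value of items won).

    budgets:    {buyer: budget}
    values:     {buyer: {item: value}}  (additive valuations, an assumption)
    allocation: {buyer: [items won]}    (buyers who win nothing may be absent)
    """
    total = 0.0
    for buyer, budget in budgets.items():
        won_value = sum(values[buyer][item] for item in allocation.get(buyer, []))
        # A buyer's contribution is capped by what they are able to pay.
        total += min(budget, won_value)
    return total


budgets = {"a": 5, "b": 10}
values = {"a": {1: 4, 2: 3}, "b": {1: 1, 2: 2}}
print(liquid_welfare(budgets, values, {"a": [1, 2]}))
```

Buyer "a" values the two items at 7 in total but has a budget of 5, so the cap binds and only 5 counts toward the welfare.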
Citations: 0
A Theory of Alternating Paths and Blossoms from the Perspective of Minimum Length
IF 1.7 Zone 3 (Mathematics) Q2 MATHEMATICS, APPLIED Pub Date: 2024-05-07 DOI: 10.1287/moor.2020.0388
Vijay V. Vazirani
The Micali–Vazirani (MV) algorithm for finding a maximum cardinality matching in general graphs, which was published in 1980, remains to this day the most efficient known algorithm for the problem. The current paper gives the first complete and correct proof of this algorithm. The MV algorithm resorts to finding minimum-length augmenting paths. However, such paths fail to satisfy an elementary property, called breadth first search honesty in this paper. In the absence of this property, an exponential time algorithm appears to be called for, just for finding one such path. On the other hand, the MV algorithm accomplishes this and additional tasks in linear time. The saving grace is the various “footholds” offered by the underlying structure, which the algorithm uses in order to perform its key tasks efficiently. The theory expounded in this paper elucidates this rich structure and yields a proof of correctness of the algorithm. It may also be of independent interest as a set of well-knit graph-theoretic facts. Funding: This work was supported in part by the National Science Foundation [Grant CCF-2230414].
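For readers unfamiliar with the term, the sketch below illustrates what an augmenting path is and how flipping its edges enlarges a matching by one. It is not the MV algorithm and does not search for minimum-length paths; the function names and the edge representation (`frozenset` pairs) are assumptions made for this illustration.

```python
def is_augmenting_path(matching, path):
    """Check that `path` (a list of vertices) is augmenting for `matching`.

    An augmenting path is a simple path whose endpoints are unmatched and
    whose edges alternate: unmatched, matched, unmatched, ...
    """
    matched_vertices = {v for edge in matching for v in edge}
    if len(path) % 2 != 0 or len(set(path)) != len(path):
        return False  # needs an odd number of edges and no repeated vertex
    if path[0] in matched_vertices or path[-1] in matched_vertices:
        return False  # both endpoints must be free
    edges = [frozenset(e) for e in zip(path, path[1:])]
    return all((e in matching) == (i % 2 == 1) for i, e in enumerate(edges))


def augment(matching, path):
    """Flip the path's edges: the symmetric difference of matching and path."""
    path_edges = {frozenset(e) for e in zip(path, path[1:])}
    return matching ^ path_edges


matching = {frozenset(("b", "c"))}
path = ["a", "b", "c", "d"]
if is_augmenting_path(matching, path):
    matching = augment(matching, path)
print(len(matching))
```

In the example, the path a-b-c-d replaces the single matched edge {b, c} with the two edges {a, b} and {c, d}.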
Citations: 0
Is There a Golden Parachute in Sannikov’s Principal–Agent Problem?
IF 1.7 Zone 3 (Mathematics) Q2 MATHEMATICS, APPLIED Pub Date: 2024-05-06 DOI: 10.1287/moor.2022.0305
Dylan Possamaï, Nizar Touzi
This paper provides a complete review of the continuous-time optimal contracting problem introduced by Sannikov in the extended context allowing for possibly different discount rates for both parties. The agent’s problem is to seek the optimal effort given the compensation scheme proposed by the principal over a random horizon. Then, given the optimal agent’s response, the principal determines the best compensation scheme in terms of running payment, retirement, and lump-sum payment at retirement. A golden parachute is a situation where the agent ceases any effort at some positive stopping time and receives a payment afterward, possibly in the form of a lump-sum payment or of a continuous stream of payments. We show that a golden parachute only exists in certain specific circumstances. This is in contrast with the results claimed by Sannikov, where the only requirement is that the agent’s marginal cost of effort at zero be positive. In the general case, we prove that an agent with positive reservation utility is either never retired by the principal or retired above some given threshold (as in Sannikov’s solution). We show that different discount factors induce a facelifted utility function, which allows us to reduce the analysis to a setting similar to the equal-discount rates one. Finally, we also confirm that an agent with small reservation utility does have an informational rent, meaning that the principal optimally offers him a contract with strictly higher utility than his participation value.
Citations: 0
Learning and Balancing Unknown Loads in Large-Scale Systems
IF 1.7 Zone 3 (Mathematics) Q2 MATHEMATICS, APPLIED Pub Date: 2024-05-03 DOI: 10.1287/moor.2021.0212
Diego Goldsztajn, Sem C. Borst, Johan S. H. van Leeuwaarden
Consider a system of identical server pools where tasks with exponentially distributed service times arrive as a time-inhomogeneous Poisson process. An admission threshold is used in an inner control loop to assign incoming tasks to server pools, while in an outer control loop, a learning scheme adjusts this threshold over time to keep it aligned with the unknown offered load of the system. In a many-server regime, we prove that the learning scheme reaches an equilibrium along intervals of time when the normalized offered load per server pool is suitably bounded and that this results in a balanced distribution of the load. Furthermore, we establish a similar result when tasks with Coxian distributed service times arrive at a constant rate and the threshold is adjusted using only the total number of tasks in the system. The novel proof technique developed in this paper, which differs from a traditional fluid limit analysis, allows us to handle rapid variations of the first learning scheme, triggered by excursions of the occupancy process that have vanishing size. Moreover, our approach allows us to characterize the asymptotic behavior of the system with Coxian distributed service times without relying on a fluid limit of a detailed state descriptor. Funding: The work in this paper was supported by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek [Gravitation Grant NETWORKS-024.002.003 and Vici Grant 202.068].
Citations: 0
Estimating a Function and Its Derivatives Under a Smoothness Condition
IF 1.7 Zone 3 (Mathematics) Q2 MATHEMATICS, APPLIED Pub Date: 2024-05-02 DOI: 10.1287/moor.2020.0161
Eunji Lim
We consider the problem of estimating an unknown function [Formula: see text] and its partial derivatives from a noisy data set of n observations, where we make no assumptions about [Formula: see text] except that it is smooth in the sense that it has square integrable partial derivatives of order m. A natural candidate for the estimator of [Formula: see text] in such a case is the best fit to the data set that satisfies a certain smoothness condition. This estimator can be seen as a least squares estimator subject to an upper bound on some measure of smoothness. Another useful estimator is the one that minimizes the degree of smoothness subject to an upper bound on the average of squared errors. We prove that these two estimators are computable as solutions to quadratic programs, establish the consistency of these estimators and their partial derivatives, and study the convergence rate as [Formula: see text]. The effectiveness of the estimators is illustrated numerically in a setting where the value of a stock option and its second derivative are estimated as functions of the underlying stock price.
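To give a feel for the trade-off between fit and smoothness described above, here is a small one-dimensional sketch. It is not the paper's estimator: it swaps the constrained problem (least squares subject to a smoothness bound) for the penalized form, uses squared second differences on a uniform grid as a stand-in for the smoothness measure, and solves the resulting linear system directly. All names and the parameter `lam` are hypothetical.

```python
def roughness(f):
    """Sum of squared second differences, a discrete smoothness measure."""
    return sum((f[i - 1] - 2.0 * f[i] + f[i + 1]) ** 2 for i in range(1, len(f) - 1))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def smooth_fit(y, lam):
    """Minimize sum (y_i - f_i)^2 + lam * roughness(f).

    Setting the gradient to zero gives (I + lam * D^T D) f = y, where D is
    the second-difference matrix.
    """
    n = len(y)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        stencil = {k: 1.0, k + 1: -2.0, k + 2: 1.0}  # row k of D
        for i, di in stencil.items():
            for j, dj in stencil.items():
                a[i][j] += lam * di * dj
    return solve_linear(a, list(y))


noisy = [0.0, 1.3, 1.8, 3.4, 3.9, 5.2]
fit = smooth_fit(noisy, 10.0)
print(roughness(noisy), roughness(fit))
```

Larger values of `lam` give a smoother fit at the expense of larger residuals; data that are exactly linear have zero roughness and are returned unchanged.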
Citations: 0
Correlated Equilibria for Mean Field Games with Progressive Strategies
IF 1.7 Zone 3 (Mathematics) Q2 MATHEMATICS, APPLIED Pub Date: 2024-04-29 DOI: 10.1287/moor.2022.0357
Ofelia Bonesini, Luciano Campi, Markus Fischer
In a discrete space and time framework, we study the mean field game limit for a class of symmetric N-player games based on the notion of correlated equilibrium. We give a definition of correlated solution that allows us to construct approximate N-player correlated equilibria that are robust with respect to progressive deviations. We illustrate our definition by way of an example with explicit solutions. Funding: O. Bonesini acknowledges financial support from Engineering and Physical Sciences Research Council [Grant EP/T032146/1]. M. Fischer acknowledges partial support through the University of Padua [Research Project BIRD229791 “Stochastic mean field control and the Schrödinger problem”].
Citations: 0
Convexification of Bilinear Terms over Network Polytopes
IF 1.7 Zone 3 (Mathematics) Q2 MATHEMATICS, APPLIED Pub Date: 2024-04-22 DOI: 10.1287/moor.2023.0001
Erfan Khademnia, Danial Davarnia
It is well-known that the McCormick relaxation for the bilinear constraint z = xy gives the convex hull over the box domains for x and y. In network applications where the domain of bilinear variables is described by a network polytope, the McCormick relaxation, also referred to as linearization, fails to provide the convex hull and often leads to poor dual bounds. We study the convex hull of the set containing bilinear constraints [Formula: see text] where x_i represents the arc-flow variable in a network polytope, and y_j is in a simplex. For the case where the simplex contains a single y variable, we introduce a systematic procedure to obtain the convex hull of the above set in the original space of variables, and show that all facet-defining inequalities of the convex hull can be obtained explicitly through identifying a special tree structure in the underlying network. For the generalization where the simplex contains multiple y variables, we design a constructive procedure to obtain an important class of facet-defining inequalities for the convex hull of the underlying bilinear set that is characterized by a special forest structure in the underlying network. Computational experiments conducted on different applications show the effectiveness of the proposed methods in improving the dual bounds obtained from alternative techniques. Funding: This work was supported by Air Force Office of Scientific Research [Grant FA9550-23-1-0183]; National Science Foundation, Division of Civil, Mechanical and Manufacturing Innovation [Grant 2338641]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2023.0001 .
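As background for the box-domain case mentioned in the first sentence, the sketch below evaluates the four standard McCormick inequalities for z = xy with x in [x_lo, x_hi] and y in [y_lo, y_hi]. It covers only this textbook relaxation, not the network-polytope results of the paper; the function name and argument names are assumptions for illustration.

```python
def mccormick_bounds(x, y, x_lo, x_hi, y_lo, y_hi):
    """Return (lower, upper) bounds on z = x * y implied by McCormick.

    Each inequality comes from expanding a product of two nonnegative
    terms, for example (x - x_lo) * (y - y_lo) >= 0.
    """
    lower = max(
        x_lo * y + x * y_lo - x_lo * y_lo,  # from (x - x_lo)(y - y_lo) >= 0
        x_hi * y + x * y_hi - x_hi * y_hi,  # from (x_hi - x)(y_hi - y) >= 0
    )
    upper = min(
        x_hi * y + x * y_lo - x_hi * y_lo,  # from (x_hi - x)(y - y_lo) >= 0
        x_lo * y + x * y_hi - x_lo * y_hi,  # from (x - x_lo)(y_hi - y) >= 0
    )
    return lower, upper


# Interior point of the box [0, 2] x [1, 3]: the product 2 lies between the bounds.
print(mccormick_bounds(1.0, 2.0, 0.0, 2.0, 1.0, 3.0))
```

At a corner of the box both bounds coincide with the product, while at interior points the relaxation leaves a gap around xy.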
Citations: 0
Finite-Time High-Probability Bounds for Polyak–Ruppert Averaged Iterates of Linear Stochastic Approximation
IF 1.7 Zone 3, Mathematics Q2 MATHEMATICS, APPLIED Pub Date: 2024-04-16 DOI: 10.1287/moor.2022.0179
Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov
This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a d-dimensional linear system [Formula: see text] for which [Formula: see text] can only be estimated by (asymptotically) unbiased observations [Formula: see text]. We consider here the case where [Formula: see text] is a sequence of independent and identically distributed random variables or a uniformly geometrically ergodic Markov chain. We derive pth moment and high-probability deviation bounds for the iterates defined by LSA and its Polyak–Ruppert-averaged version. Our finite-time instance-dependent bounds for the averaged LSA iterates are sharp in the sense that the leading term we obtain coincides with the local asymptotic minimax limit. Moreover, the remainder terms of our bounds admit a tight dependence on the mixing time [Formula: see text] of the underlying chain and the norm of the noise variables. We emphasize that our result requires the LSA step size to scale only with the logarithm of the problem dimension d. Funding: The work of A. Durmus and E. Moulines was partly supported by [Grant ANR-19-CHIA-0002]. This project received funding from the European Research Council [ERC-SyG OCEAN Grant 101071601]. The research of A. Naumov and S. Samsonov was prepared within the framework of the HSE University Basic Research Program.
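To make the setting concrete, here is a minimal sketch of fixed-step LSA with Polyak–Ruppert averaging in the i.i.d. noise case. The formulas are elided in the abstract, so the form of the system (A_bar theta = b_bar), the additive Gaussian noise model, the function name `lsa_polyak_ruppert`, and all numerical values are assumptions made for illustration only, not the paper's setup.

```python
import random


def lsa_polyak_ruppert(A_bar, b_bar, step_size, n_iters, noise_std, seed=0):
    """Run fixed-step LSA on noisy observations of (A_bar, b_bar) and
    return (last iterate, running average of the iterates)."""
    rng = random.Random(seed)
    d = len(b_bar)
    theta = [0.0] * d
    theta_avg = [0.0] * d
    for k in range(1, n_iters + 1):
        # Assumed noise model: unbiased i.i.d. Gaussian perturbations of A_bar and b_bar.
        A_k = [[A_bar[i][j] + rng.gauss(0.0, noise_std) for j in range(d)]
               for i in range(d)]
        b_k = [b_bar[i] + rng.gauss(0.0, noise_std) for i in range(d)]
        # LSA update: theta <- theta - step_size * (A_k theta - b_k).
        residual = [sum(A_k[i][j] * theta[j] for j in range(d)) - b_k[i]
                    for i in range(d)]
        theta = [theta[i] - step_size * residual[i] for i in range(d)]
        # Polyak-Ruppert average, updated incrementally.
        theta_avg = [theta_avg[i] + (theta[i] - theta_avg[i]) / k
                     for i in range(d)]
    return theta, theta_avg


# Hypothetical 2-dimensional instance; theta_star solves A_bar theta = b_bar exactly.
A_bar = [[2.0, 0.5], [0.0, 1.0]]
b_bar = [1.0, -1.0]
theta_star = [0.75, -1.0]

last_iterate, averaged = lsa_polyak_ruppert(
    A_bar, b_bar, step_size=0.05, n_iters=20000, noise_std=0.1)
error_avg = max(abs(averaged[i] - theta_star[i]) for i in range(2))
```

With a fixed step size the last iterate keeps fluctuating around the solution at a scale set by the step size, while the averaged iterate smooths those fluctuations out; comparing `last_iterate` and `averaged` against `theta_star` shows the effect.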
Citations: 0