Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
We initiate the study of episodic reinforcement learning (RL) under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of multiarmed bandits. We provide a framework that tempers the aggressive exploration of existing reinforcement learning approaches based on optimism in the face of uncertainty by complementing it with principles from action elimination. Importantly, our framework circumvents the major challenges posed by naively applying action elimination in the RL setting, as formalized by a lower bound we demonstrate. Our framework yields efficient algorithms that (a) attain near-optimal regret in the absence of corruptions and (b) adapt to unknown levels of corruption, enjoying regret guarantees that degrade gracefully in the total corruption encountered. To showcase the generality of our approach, we derive results for both tabular settings (where states and actions are finite) and linear Markov decision process settings (where the dynamics and rewards admit a linear underlying representation). Notably, our work provides the first sublinear regret guarantee that accommodates any deviation from purely independent and identically distributed transitions in the bandit-feedback model for episodic reinforcement learning. Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2021.0202.
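The action-elimination principle the abstract invokes is easiest to see in the multiarmed-bandit special case it extends. A minimal sketch (construction mine, not the paper's algorithm): arms whose upper confidence bound falls below another active arm's lower confidence bound are permanently discarded.

```python
import math
import random

def successive_elimination(means, rounds, seed=0):
    """Toy action elimination for multiarmed bandits with Gaussian noise."""
    rng = random.Random(seed)
    active = set(range(len(means)))
    total = [0.0] * len(means)
    pulls = [0] * len(means)
    for t in range(1, rounds + 1):
        for a in list(active):
            total[a] += means[a] + rng.gauss(0, 1)    # noisy reward draw
            pulls[a] += 1
        radius = math.sqrt(2 * math.log(rounds) / t)  # confidence radius
        best_lcb = max(total[a] / pulls[a] - radius for a in active)
        # Keep only arms whose upper bound still reaches the best lower bound.
        active = {a for a in active
                  if total[a] / pulls[a] + radius >= best_lcb}
    return active
```

With a clear gap between the best arm and the rest, only the best arm survives a long enough run.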
"Corruption-Robust Exploration in Episodic Reinforcement Learning" by Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun. Mathematics of Operations Research, published May 23, 2024. https://doi.org/10.1287/moor.2021.0202
We consider Markov decision processes where the state of the chain is only observed at chosen observation times, and at a cost. Optimal strategies involve the optimization of observation times as well as the subsequent action values. We consider the finite horizon and discounted infinite horizon problems as well as an extension with parameter uncertainty. By including the time elapsed from observations as part of the augmented Markov system, the value function satisfies a system of quasivariational inequalities (QVIs). Such a class of QVIs can be seen as an extension of the interconnected obstacle problem. We prove a comparison principle for this class of QVIs, which implies the uniqueness of solutions to our proposed problem. Penalty methods are then utilized to obtain arbitrarily accurate solutions. Finally, we perform numerical experiments on three applications that illustrate our framework. Funding: J. Tam is supported by the Engineering and Physical Sciences Research Council [Grant 2269738].
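The penalty idea can be illustrated on the simplest relative of such a variational inequality, an optimal-stopping problem min(V - g, V - βPV) = 0; all specifics below (the chain, payoffs, and penalty parameter) are my own toy choices, not the paper's QVI system. The penalized equation V = βPV + ρ·max(g - V, 0) is solved by an active-set iteration and compared with plain value iteration.

```python
import numpy as np

beta = 0.9
P = np.array([[0.0, 1.0, 0.0],    # deterministic cycle 0 -> 1 -> 2 -> 0
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
g = np.array([1.0, 0.0, 2.0])     # stopping payoff in each state

# Reference solution by value iteration on V = max(g, beta * P @ V).
V_ref = np.zeros(3)
for _ in range(500):
    V_ref = np.maximum(g, beta * P @ V_ref)

# Penalized equation V = beta * P @ V + rho * max(g - V, 0), solved by an
# active-set iteration: guess where the obstacle binds, solve the resulting
# linear system, and recompute the active set until it stabilizes.
rho = 1e6
V = np.zeros(3)
for _ in range(20):
    active = (g - V > 0).astype(float)           # states where penalty binds
    M = np.eye(3) - beta * P + rho * np.diag(active)
    V = np.linalg.solve(M, rho * active * g)
```

As ρ grows, the penalized solution approaches the variational-inequality solution, here to within 1/ρ.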
"Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme" by Christoph Reisinger, Jonathan Tam. Mathematics of Operations Research, published May 23, 2024. https://doi.org/10.1287/moor.2023.0172
We study the liquid welfare in sequential first-price auctions with budgeted buyers. We use a behavioral model for the buyers, assuming a learning-style guarantee: the utility of each buyer is within a [Formula: see text] factor ([Formula: see text]) of the utility achievable by shading their value with the same factor at each iteration. We show a [Formula: see text] price of anarchy for liquid welfare when valuations are additive. This is in stark contrast to sequential second-price auctions, where the resulting liquid welfare can be arbitrarily smaller than the maximum liquid welfare, even when [Formula: see text]. We prove a lower bound of [Formula: see text] on the liquid welfare loss under the given assumption in first-price auctions. Our liquid welfare results extend when buyers have submodular valuations over the set of items they win across iterations, with a slightly worse price of anarchy bound of [Formula: see text] compared with the guarantee for the additive case. Funding: G. Fikioris is supported in part by the Air Force Office of Scientific Research [Grants FA9550-19-1-0183 and FA9550-23-1-0068], the Department of Defense (DoD) through the National Defense Science & Engineering Graduate (NDSEG) Fellowship Program, and the Onassis Foundation [Scholarship ID F ZS 068-1/2022-2023]. É. Tardos is supported in part by the NSF [Grant CCF-1408673] and AFOSR [Grants FA9550-19-1-0183, FA9550-23-1-0410, and FA9550-23-1-0068].
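Value shading in a first-price auction can be seen in a toy single-auction example (setup entirely mine, much simpler than the sequential budgeted model above): against one opponent whose bid is uniform on [0, 1], bidding b wins with probability b, so a buyer with value v maximizes expected utility (v - b)·b by shading the value down to b = v/2.

```python
def expected_utility(v, b):
    """Expected utility of bidding b with value v against Uniform[0,1] bids."""
    return (v - b) * b                 # win probability b, surplus v - b

def best_bid(v, grid=1000):
    """Grid search for the utility-maximizing bid."""
    bids = [i / grid for i in range(grid + 1)]
    return max(bids, key=lambda b: expected_utility(v, b))
```

The grid search recovers the half-value shading rule exactly when v/2 lies on the grid.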
"Liquid Welfare Guarantees for No-Regret Learning in Sequential Budgeted Auctions" by Giannis Fikioris, Éva Tardos. Mathematics of Operations Research, published May 14, 2024. https://doi.org/10.1287/moor.2023.0274
The Micali–Vazirani (MV) algorithm for finding a maximum cardinality matching in general graphs, which was published in 1980, remains to this day the most efficient known algorithm for the problem. The current paper gives the first complete and correct proof of this algorithm. The MV algorithm resorts to finding minimum-length augmenting paths. However, such paths fail to satisfy an elementary property, called breadth first search honesty in this paper. In the absence of this property, an exponential time algorithm appears to be called for—just for finding one such path. On the other hand, the MV algorithm accomplishes this and additional tasks in linear time. The saving grace is the various “footholds” offered by the underlying structure, which the algorithm uses in order to perform its key tasks efficiently. The theory expounded in this paper elucidates this rich structure and yields a proof of correctness of the algorithm. It may also be of independent interest as a set of well-knit graph-theoretic facts.Funding: This work was supported in part by the National Science Foundation [Grant CCF-2230414].
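The augmenting-path primitive that the MV algorithm organizes by minimum length can be illustrated in the far simpler bipartite case, where no blossoms arise. The following is not the MV algorithm (which handles general graphs and achieves a better running time); it is a minimal depth-first augmenting-path matcher of my own construction, shown only to make the notion of an augmenting path concrete.

```python
def max_bipartite_matching(adj, n_right):
    """adj[u] lists the right-vertices adjacent to left-vertex u."""
    match_right = [-1] * n_right       # right vertex -> matched left vertex

    def try_augment(u, visited):
        # Depth-first search for an augmenting path from left vertex u.
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be rematched elsewhere.
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(len(adj)))
```

Each successful search flips one augmenting path and grows the matching by one edge.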
"A Theory of Alternating Paths and Blossoms from the Perspective of Minimum Length" by Vijay V. Vazirani. Mathematics of Operations Research, published May 7, 2024. https://doi.org/10.1287/moor.2020.0388
This paper provides a complete review of the continuous-time optimal contracting problem introduced by Sannikov in the extended context allowing for possibly different discount rates for the two parties. The agent’s problem is to seek optimal effort given the compensation scheme proposed by the principal over a random horizon. Then, given the optimal agent’s response, the principal determines the best compensation scheme in terms of running payment, retirement, and lump-sum payment at retirement. A golden parachute is a situation where the agent ceases any effort at some positive stopping time and receives a payment afterward, possibly in the form of a lump-sum payment or of a continuous stream of payments. We show that a golden parachute only exists in certain specific circumstances. This is in contrast with the results claimed by Sannikov, where the only requirement is a positive agent’s marginal cost of effort at zero. In the general case, we prove that an agent with positive reservation utility is either never retired by the principal or retired above some given threshold (as in Sannikov’s solution). We show that different discount factors induce a facelifted utility function, which allows us to reduce the analysis to a setting similar to the equal-discount-rates one. Finally, we also confirm that an agent with small reservation utility does have an informational rent, meaning that the principal optimally offers him a contract with strictly higher utility than his participation value.
"Is There a Golden Parachute in Sannikov’s Principal–Agent Problem?" by Dylan Possamaï, Nizar Touzi. Mathematics of Operations Research, published May 6, 2024. https://doi.org/10.1287/moor.2022.0305
Diego Goldsztajn, Sem C. Borst, Johan S. H. van Leeuwaarden
Consider a system of identical server pools where tasks with exponentially distributed service times arrive as a time-inhomogeneous Poisson process. An admission threshold is used in an inner control loop to assign incoming tasks to server pools, while in an outer control loop, a learning scheme adjusts this threshold over time to keep it aligned with the unknown offered load of the system. In a many-server regime, we prove that the learning scheme reaches an equilibrium along intervals of time when the normalized offered load per server pool is suitably bounded and that this results in a balanced distribution of the load. Furthermore, we establish a similar result when tasks with Coxian distributed service times arrive at a constant rate and the threshold is adjusted using only the total number of tasks in the system. The novel proof technique developed in this paper, which differs from a traditional fluid limit analysis, allows us to handle rapid variations of the first learning scheme, triggered by excursions of the occupancy process that have vanishing size. Moreover, our approach allows us to characterize the asymptotic behavior of the system with Coxian distributed service times without relying on a fluid limit of a detailed state descriptor.Funding: The work in this paper was supported by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek [Gravitation Grant NETWORKS-024.002.003 and Vici Grant 202.068].
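The interplay of the two control loops can be caricatured deterministically (construction mine: no departures, no stochastic arrivals, sequential assignment): the inner loop admits each task to a pool whose occupancy is below the threshold, and the outer loop raises the threshold only when no pool qualifies.

```python
def assign(num_pools, num_tasks):
    """Dispatch tasks below a threshold that is raised only when necessary."""
    occupancy = [0] * num_pools
    threshold = 1
    for _ in range(num_tasks):
        below = [i for i, o in enumerate(occupancy) if o < threshold]
        if not below:
            threshold += 1             # outer loop: learn a higher load level
            below = list(range(num_pools))
        occupancy[below[0]] += 1       # inner loop: admit below the threshold
    return occupancy, threshold
```

The threshold settles at the smallest level that accommodates the load, and pool occupancies differ by at most one, mirroring the balanced distribution established in the paper's stochastic setting.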
"Learning and Balancing Unknown Loads in Large-Scale Systems" by Diego Goldsztajn, Sem C. Borst, Johan S. H. van Leeuwaarden. Mathematics of Operations Research, published May 3, 2024. https://doi.org/10.1287/moor.2021.0212
We consider the problem of estimating an unknown function [Formula: see text] and its partial derivatives from a noisy data set of n observations, where we make no assumptions about [Formula: see text] except that it is smooth in the sense that it has square integrable partial derivatives of order m. A natural candidate for the estimator of [Formula: see text] in such a case is the best fit to the data set that satisfies a certain smoothness condition. This estimator can be seen as a least squares estimator subject to an upper bound on some measure of smoothness. Another useful estimator is the one that minimizes the degree of smoothness subject to an upper bound on the average of squared errors. We prove that these two estimators are computable as solutions to quadratic programs, establish the consistency of these estimators and their partial derivatives, and study the convergence rate as [Formula: see text]. The effectiveness of the estimators is illustrated numerically in a setting where the value of a stock option and its second derivative are estimated as functions of the underlying stock price.
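The two constrained estimators above have a penalized (Lagrangian) counterpart that is a plain linear solve; the following sketch is my own discrete one-dimensional construction, not the paper's exact quadratic program: minimize ||f - y||² + λ·||D₂f||², where D₂ takes second differences of the fitted values on a uniform grid.

```python
import numpy as np

def smooth_fit(y, lam):
    """Penalized least squares: minimize ||f - y||^2 + lam * ||D2 @ f||^2."""
    n = len(y)
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]   # discrete second derivative
    # Normal equations of the quadratic objective: (I + lam * D2.T @ D2) f = y
    return np.linalg.solve(np.eye(n) + lam * D2.T @ D2, y)
```

Because a straight line has zero second differences it is reproduced exactly, and by optimality the fit is never rougher than the data.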
"Estimating a Function and Its Derivatives Under a Smoothness Condition" by Eunji Lim. Mathematics of Operations Research, published May 2, 2024. https://doi.org/10.1287/moor.2020.0161
In a discrete space and time framework, we study the mean field game limit for a class of symmetric N-player games based on the notion of correlated equilibrium. We give a definition of correlated solution that allows us to construct approximate N-player correlated equilibria that are robust with respect to progressive deviations. We illustrate our definition by way of an example with explicit solutions.Funding: O. Bonesini acknowledges financial support from Engineering and Physical Sciences Research Council [Grant EP/T032146/1]. M. Fischer acknowledges partial support through the University of Padua [Research Project BIRD229791 “Stochastic mean field control and the Schrödinger problem”].
"Correlated Equilibria for Mean Field Games with Progressive Strategies" by Ofelia Bonesini, Luciano Campi, Markus Fischer. Mathematics of Operations Research, published April 29, 2024. https://doi.org/10.1287/moor.2022.0357
It is well-known that the McCormick relaxation for the bilinear constraint z = xy gives the convex hull over the box domains for x and y. In network applications where the domain of bilinear variables is described by a network polytope, the McCormick relaxation, also referred to as linearization, fails to provide the convex hull and often leads to poor dual bounds. We study the convex hull of the set containing bilinear constraints [Formula: see text] where x_i represents the arc-flow variable in a network polytope, and y_j is in a simplex. For the case where the simplex contains a single y variable, we introduce a systematic procedure to obtain the convex hull of the above set in the original space of variables, and show that all facet-defining inequalities of the convex hull can be obtained explicitly through identifying a special tree structure in the underlying network. For the generalization where the simplex contains multiple y variables, we design a constructive procedure to obtain an important class of facet-defining inequalities for the convex hull of the underlying bilinear set that is characterized by a special forest structure in the underlying network. Computational experiments conducted on different applications show the effectiveness of the proposed methods in improving the dual bounds obtained from alternative techniques. Funding: This work was supported by the Air Force Office of Scientific Research [Grant FA9550-23-1-0183] and the National Science Foundation, Division of Civil, Mechanical and Manufacturing Innovation [Grant 2338641]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2023.0001.
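The McCormick relaxation referred to above replaces z = x·y on the box [xL, xU] × [yL, yU] with two linear underestimators and two overestimators; the small numeric check below (function name mine) evaluates the resulting envelope at a point, showing both the gap in the interior of the box and tightness at the corners.

```python
def mccormick_bounds(x, y, xL, xU, yL, yU):
    """Lower/upper envelope of z = x*y implied by the McCormick inequalities."""
    z_lo = max(xL * y + x * yL - xL * yL,    # z >= xL*y + x*yL - xL*yL
               xU * y + x * yU - xU * yU)    # z >= xU*y + x*yU - xU*yU
    z_hi = min(xU * y + x * yL - xU * yL,    # z <= xU*y + x*yL - xU*yL
               xL * y + x * yU - xL * yU)    # z <= xL*y + x*yU - xL*yU
    return z_lo, z_hi
```

At the box center the envelope brackets the true product loosely, while at a corner the four inequalities pinch it exactly.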
"Convexification of Bilinear Terms over Network Polytopes" by Erfan Khademnia, Danial Davarnia. Mathematics of Operations Research, published April 22, 2024. https://doi.org/10.1287/moor.2023.0001
Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov
This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a d-dimensional linear system [Formula: see text] for which [Formula: see text] can only be estimated by (asymptotically) unbiased observations [Formula: see text]. We consider here the case where [Formula: see text] is either a sequence of independent and identically distributed random variables or a uniformly geometrically ergodic Markov chain. We derive pth moment and high-probability deviation bounds for the iterates defined by LSA and its Polyak–Ruppert-averaged version. Our finite-time instance-dependent bounds for the averaged LSA iterates are sharp in the sense that the leading term we obtain coincides with the local asymptotic minimax limit. Moreover, the remainder terms of our bounds admit a tight dependence on the mixing time [Formula: see text] of the underlying chain and the norm of the noise variables. We emphasize that our result requires the LSA step size to scale only with the logarithm of the problem dimension d.Funding: The work of A. Durmus and E. Moulines was partly supported by [Grant ANR-19-CHIA-0002]. This project received funding from the European Research Council [ERC-SyG OCEAN Grant 101071601]. The research of A. Naumov and S. Samsonov was prepared within the framework of the HSE University Basic Research Program.
{"title":"Finite-Time High-Probability Bounds for Polyak–Ruppert Averaged Iterates of Linear Stochastic Approximation","authors":"Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov","doi":"10.1287/moor.2022.0179","DOIUrl":"https://doi.org/10.1287/moor.2022.0179","url":null,"abstract":"This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a d-dimensional linear system [Formula: see text] for which [Formula: see text] can only be estimated by (asymptotically) unbiased observations [Formula: see text]. We consider here the case where [Formula: see text] is either a sequence of independent and identically distributed random variables or a uniformly geometrically ergodic Markov chain. We derive pth moment and high-probability deviation bounds for the iterates defined by LSA and its Polyak–Ruppert-averaged version. Our finite-time instance-dependent bounds for the averaged LSA iterates are sharp in the sense that the leading term we obtain coincides with the local asymptotic minimax limit. Moreover, the remainder terms of our bounds admit a tight dependence on the mixing time [Formula: see text] of the underlying chain and the norm of the noise variables. We emphasize that our result requires the LSA step size to scale only with the logarithm of the problem dimension d.Funding: The work of A. Durmus and E. Moulines was partly supported by [Grant ANR-19-CHIA-0002]. This project received funding from the European Research Council [ERC-SyG OCEAN Grant 101071601]. The research of A. Naumov and S. 
Samsonov was prepared within the framework of the HSE University Basic Research Program.","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"185 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140612440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
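The fixed-step LSA scheme the abstract analyzes can be sketched in a few lines: iterate on noisy observations of (A, b) and keep a running Polyak–Ruppert average alongside the raw iterate. This toy instance with i.i.d. Gaussian observation noise is an illustration only (the matrix, noise model, and step size are assumptions, not the paper's setting):

```python
import numpy as np

def lsa_polyak_ruppert(A_bar, b_bar, step, n_iters, noise=0.1, seed=0):
    """Fixed-step LSA for solving A_bar @ theta = b_bar from i.i.d.
    noisy observations (A_k, b_k); returns the last iterate and its
    Polyak-Ruppert (running) average."""
    rng = np.random.default_rng(seed)
    d = b_bar.shape[0]
    theta = np.zeros(d)
    theta_avg = np.zeros(d)
    for k in range(n_iters):
        # Unbiased observations of the system matrix and right-hand side.
        A_k = A_bar + noise * rng.standard_normal((d, d))
        b_k = b_bar + noise * rng.standard_normal(d)
        theta = theta - step * (A_k @ theta - b_k)
        # Incremental update of the running average of the iterates.
        theta_avg += (theta - theta_avg) / (k + 1)
    return theta, theta_avg

# Toy 2-d system with solution theta* = A_bar^{-1} b_bar = [0.5, 1.0];
# averaging smooths out the noise floor the raw iterate oscillates in.
A_bar = np.array([[2.0, 0.0], [0.0, 1.0]])
b_bar = np.array([1.0, 1.0])
last, avg = lsa_polyak_ruppert(A_bar, b_bar, step=0.05, n_iters=20000)
```

The paper's sharp instance-dependent bounds concern exactly this averaged iterate, including the Markov-chain (rather than i.i.d.) observation model.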