Pub Date: 2024-05-25; DOI: 10.1007/s10107-024-02096-x
Christoph Hertrich, Leon Sering
This paper studies the expressive power of artificial neural networks with rectified linear units. In order to study them as a model of real-valued computation, we introduce the concept of Max-Affine Arithmetic Programs and show equivalence between them and neural networks concerning natural complexity measures. We then use this result to show that two fundamental combinatorial optimization problems can be solved with polynomial-size neural networks. First, we show that for any undirected graph with n nodes, there is a neural network (with fixed weights and biases) of size \(\mathcal{O}(n^3)\) that takes the edge weights as input and computes the value of a minimum spanning tree of the graph. Second, we show that for any directed graph with n nodes and m arcs, there is a neural network of size \(\mathcal{O}(m^2n^2)\) that takes the arc capacities as input and computes a maximum flow. Our results imply that these two problems can be solved with strongly polynomial time algorithms that solely use affine transformations and maxima computations, but no comparison-based branchings.
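As a small illustration of computing with only affine maps and rectified linear units, the sketch below expresses a maximum through a single ReLU and uses it to evaluate the minimum spanning tree value of a triangle. The function names are hypothetical and the triangle shortcut (total weight minus the heaviest edge) is a toy case chosen for brevity, not the construction from the paper.

```python
def relu(x: float) -> float:
    """Rectified linear unit: the only nonlinearity used below."""
    return x if x > 0.0 else 0.0


def relu_max(a: float, b: float) -> float:
    # max(a, b) = a + relu(b - a): one affine map, one ReLU, one affine map.
    return a + relu(b - a)


def relu_min(a: float, b: float) -> float:
    # min(a, b) = a - relu(a - b), by the same identity applied to -max(-a, -b).
    return a - relu(a - b)


def mst_value_triangle(w_ab: float, w_bc: float, w_ca: float) -> float:
    # Toy case (assumption for illustration): in a triangle, a minimum
    # spanning tree keeps the two lightest edges, so its value is the
    # total weight minus the heaviest edge. No branching on comparisons
    # is needed, only sums and ReLU-based maxima.
    heaviest = relu_max(relu_max(w_ab, w_bc), w_ca)
    return w_ab + w_bc + w_ca - heaviest


print(mst_value_triangle(1.0, 2.0, 3.0))
```

The edge weights enter only through additions, subtractions, and ReLU evaluations, which is the kind of branching-free computation the abstract refers to.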
{"title":"ReLU neural networks of polynomial size for exact maximum flow computation","authors":"Christoph Hertrich, Leon Sering","doi":"10.1007/s10107-024-02096-x","DOIUrl":"https://doi.org/10.1007/s10107-024-02096-x","journal":{"name":"Mathematical Programming"},"publicationDate":"2024-05-25","publicationTypes":"Journal Article"}
Pub Date: 2024-05-25; DOI: 10.1007/s10107-024-02094-z
Claire Mathieu, Hang Zhou
In the unsplittable capacitated vehicle routing problem (UCVRP) on trees, we are given a rooted tree with edge weights and a subset of vertices of the tree called terminals. Each terminal is associated with a positive demand between 0 and 1. The goal is to find a minimum length collection of tours starting and ending at the root of the tree such that the demand of each terminal is covered by a single tour (i.e., the demand cannot be split), and the total demand of the terminals in each tour does not exceed the capacity of 1.
For the special case when all terminals have equal demands, a long line of research culminated in a quasi-polynomial time approximation scheme [Jayaprakash and Salavatipour, TALG 2023] and a polynomial time approximation scheme [Mathieu and Zhou, TALG 2023].
In this work, we study the general case when the terminals have arbitrary demands. Our main contribution is a polynomial time \((1.5+\epsilon)\)-approximation algorithm for the UCVRP on trees. This is the first improvement upon the 2-approximation algorithm from more than 30 years ago. Our approximation ratio is essentially best possible, since it is NP-hard to approximate the UCVRP on trees to better than a 1.5 factor.
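To make the problem statement concrete, here is a minimal sketch that checks a candidate solution and computes its length. It assumes a tree stored as a child-to-parent map with the edge weight stored at the child, and it uses the fact that a tour on a tree traverses every edge on the paths from the root to its terminals exactly twice. All names and the instance are hypothetical; this evaluates a given solution and is not the approximation algorithm.

```python
def tour_length(parent, weight, tour):
    """Length of one tour: twice the weight of the union of root-to-terminal paths."""
    used = set()  # edges are identified by their child endpoint
    for terminal in tour:
        v = terminal
        while parent[v] is not None:
            used.add(v)
            v = parent[v]
    return 2.0 * sum(weight[v] for v in used)


def solution_length(parent, weight, demand, tours, capacity=1.0):
    """Validate unsplittable coverage and capacity, then return the total length."""
    served = sorted(t for tour in tours for t in tour)
    if served != sorted(demand):
        raise ValueError("each terminal must be served by exactly one tour")
    for tour in tours:
        # Small tolerance so that demands summing to exactly 1 pass in floating point.
        if sum(demand[t] for t in tour) > capacity + 1e-12:
            raise ValueError("tour exceeds the vehicle capacity")
    return sum(tour_length(parent, weight, tour) for tour in tours)


# Hypothetical instance: root r with children a and c, and b below a.
parent = {"r": None, "a": "r", "b": "a", "c": "r"}
weight = {"a": 1.0, "b": 2.0, "c": 3.0}
demand = {"a": 0.6, "b": 0.5, "c": 0.4}
total = solution_length(parent, weight, demand, [["a", "c"], ["b"]])
print(total)
```

Grouping a with b instead would violate the capacity, since their demands sum to 1.1 and a demand cannot be split across tours.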
{"title":"A tight $(1.5+\\epsilon)$-approximation for unsplittable capacitated vehicle routing on trees","authors":"Claire Mathieu, Hang Zhou","doi":"10.1007/s10107-024-02094-z","DOIUrl":"https://doi.org/10.1007/s10107-024-02094-z","journal":{"name":"Mathematical Programming"},"publicationDate":"2024-05-25","publicationTypes":"Journal Article"}
Pub Date: 2024-05-25; DOI: 10.1007/s10107-024-02093-0
Yu Hin Au, Levent Tunçel
We study the lift-and-project rank of the stable set polytopes of graphs with respect to the Lovász–Schrijver SDP operator \(\textrm{LS}_+\). In particular, we focus on a search for relatively small graphs with high \(\textrm{LS}_+\)-rank (i.e., the least number of iterations of the \(\textrm{LS}_+\) operator on the fractional stable set polytope to compute the stable set polytope). We provide families of graphs whose \(\textrm{LS}_+\)-rank is asymptotically a linear function of the number of vertices, which is the best possible up to improvements in the constant factor. This improves upon the previous best result in this direction from 1999, which yielded graphs whose \(\textrm{LS}_+\)-rank only grew with the square root of the number of vertices.
{"title":"Stable set polytopes with high lift-and-project ranks for the Lovász–Schrijver SDP operator","authors":"Yu Hin Au, Levent Tunçel","doi":"10.1007/s10107-024-02093-0","DOIUrl":"https://doi.org/10.1007/s10107-024-02093-0","journal":{"name":"Mathematical Programming"},"publicationDate":"2024-05-25","publicationTypes":"Journal Article"}
Pub Date: 2024-05-23; DOI: 10.1007/s10107-024-02092-1
Gonzalo Muñoz, Joseph Paat, Felipe Serrano
The intersection cut framework was introduced by Balas in 1971 as a method for generating cutting planes in integer optimization. In this framework, one uses a full-dimensional convex S-free set, where S is the feasible region of the integer program, to derive a cut separating S from a non-integral vertex of a linear relaxation of S. Among all S-free sets, it is the inclusion-wise maximal ones that yield the strongest cuts. Recently, this framework has been extended beyond the integer case in order to obtain cutting planes in non-linear settings. In this work, we consider the specific setting when S is defined by a homogeneous quadratic inequality. In this ‘quadratic-free’ setting, every function \(\Gamma : D^m \rightarrow D^n\), where \(D^k\) is the unit sphere in \(\mathbb{R}^k\), generates a representation of a quadratic-free set. While not every \(\Gamma\) generates a maximal quadratic-free set, it is the case that every full-dimensional maximal quadratic-free set is generated by some \(\Gamma\). Our main result shows that the corresponding quadratic-free set is full-dimensional and maximal if and only if \(\Gamma\) is non-expansive and satisfies a technical condition. This result yields a broader class of maximal S-free sets than previously known. Our result stems from a new characterization of maximal S-free sets (for general S beyond the quadratic setting) based on sequences that ‘expose’ inequalities defining the S-free set.
{"title":"A characterization of maximal homogeneous-quadratic-free sets","authors":"Gonzalo Muñoz, Joseph Paat, Felipe Serrano","doi":"10.1007/s10107-024-02092-1","DOIUrl":"https://doi.org/10.1007/s10107-024-02092-1","journal":{"name":"Mathematical Programming"},"publicationDate":"2024-05-23","publicationTypes":"Journal Article"}
Pub Date: 2024-05-23; DOI: 10.1007/s10107-024-02098-9
Yusuke Kobayashi
In the optimal general factor problem, given a graph \(G=(V, E)\) and a set \(B(v) \subseteq \mathbb{Z}\) of integers for each \(v \in V\), we seek an edge subset F of maximum cardinality subject to \(d_F(v) \in B(v)\) for \(v \in V\), where \(d_F(v)\) denotes the number of edges in F incident to v. A recent crucial work by Dudycz and Paluch shows that this problem can be solved in polynomial time if each B(v) has no gap of length more than one. While their algorithm is very simple, its correctness proof is quite complicated. In this paper, we formulate the optimal general factor problem as the jump system intersection, and reveal when the algorithm by Dudycz and Paluch can be applied to this abstract form of the problem. By using this abstraction, we give another correctness proof of the algorithm, which is simpler than the original one. We also extend our result to the valuated case.
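The definitions above can be made concrete with a short sketch: a degree-constraint check, a helper for the gap condition, and an exponential-time brute force for tiny instances. The names are hypothetical, the brute force is not the algorithm of Dudycz and Paluch, and the gap convention (a gap of length k means k consecutive integers missing between two elements of B(v)) is stated here as an assumption.

```python
from itertools import combinations


def degrees(vertices, F):
    """Number of edges of F incident to each vertex."""
    d = {v: 0 for v in vertices}
    for u, v in F:
        d[u] += 1
        d[v] += 1
    return d


def is_feasible(vertices, B, F):
    """True if d_F(v) lies in B(v) for every vertex v."""
    d = degrees(vertices, F)
    return all(d[v] in B[v] for v in vertices)


def max_gap(allowed):
    """Longest run of missing integers between consecutive elements (assumed convention)."""
    s = sorted(allowed)
    return max((b - a - 1 for a, b in zip(s, s[1:])), default=0)


def brute_force_general_factor(vertices, edges, B):
    """Try edge subsets from largest to smallest; only sensible for tiny graphs."""
    for k in range(len(edges), -1, -1):
        for F in combinations(edges, k):
            if is_feasible(vertices, B, F):
                return list(F)
    return None  # no subset satisfies the degree constraints


vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "a")]
B = {v: {0, 2} for v in vertices}  # every gap has length one
best = brute_force_general_factor(vertices, edges, B)
print(best, max(max_gap(B[v]) for v in vertices))
```

On this triangle every vertex may have degree 0 or 2, so the whole edge set is an optimal general factor; with B(v) = {1} for all v no feasible subset exists and the function returns None.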
{"title":"Optimal general factor problem and jump system intersection","authors":"Yusuke Kobayashi","doi":"10.1007/s10107-024-02098-9","DOIUrl":"https://doi.org/10.1007/s10107-024-02098-9","journal":{"name":"Mathematical Programming"},"publicationDate":"2024-05-23","publicationTypes":"Journal Article"}
Pub Date: 2024-05-09; DOI: 10.1007/s10107-024-02086-z
Eranda Çela, Bettina Klinz, Stefan Lendl, Gerhard J. Woeginger, Lasse Wulf
An instance of the NP-hard Quadratic Shortest Path Problem (QSPP) is called linearizable iff it is equivalent to an instance of the classic Shortest Path Problem (SPP) on the same input digraph. The linearization problem for the QSPP (LinQSPP) decides whether a given QSPP instance is linearizable and determines the corresponding SPP instance in the positive case. We provide a novel linear time algorithm for the LinQSPP on acyclic digraphs which runs considerably faster than the previously best algorithm. The algorithm is based on a new insight revealing that the linearizability of the QSPP for acyclic digraphs can be seen as a local property. Our approach extends to the more general higher-order shortest path problem.
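The notion of linearizability can be illustrated by brute force on a tiny acyclic digraph: enumerate all s-t paths and compare the quadratic cost of each with its cost under a candidate linear cost vector. The sketch assumes one common convention for the quadratic cost (the sum of q over all ordered pairs of arcs on the path, with missing entries treated as 0); the names and the instance are hypothetical, and this exponential check is not the linear time algorithm of the paper.

```python
def st_paths(arcs, s, t):
    """All s-t paths of an acyclic digraph, each as a list of arcs."""
    out = {}
    for u, v in arcs:
        out.setdefault(u, []).append((u, v))

    def extend(node, prefix):
        if node == t:
            yield list(prefix)
            return
        for arc in out.get(node, []):
            prefix.append(arc)
            yield from extend(arc[1], prefix)
            prefix.pop()

    return list(extend(s, []))


def quadratic_cost(path, q):
    # Assumed convention: sum q over all ordered pairs of arcs on the path.
    return sum(q.get((a, b), 0) for a in path for b in path)


def is_linearization(arcs, s, t, q, c):
    """True if the linear costs c reproduce the quadratic cost on every s-t path."""
    return all(
        quadratic_cost(p, q) == sum(c[a] for a in p) for p in st_paths(arcs, s, t)
    )


sa, at, st = ("s", "a"), ("a", "t"), ("s", "t")
arcs = [sa, at, st]
q = {(sa, sa): 1, (sa, at): 2, (at, sa): 2, (st, st): 5}
print(is_linearization(arcs, "s", "t", q, {sa: 1, at: 4, st: 5}))
print(is_linearization(arcs, "s", "t", q, {sa: 1, at: 0, st: 5}))
```

Here both s-t paths have quadratic cost 5, so the first cost vector is a valid linearization while the second is not.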
{"title":"A linear time algorithm for linearizing quadratic and higher-order shortest path problems","authors":"Eranda Çela, Bettina Klinz, Stefan Lendl, Gerhard J. Woeginger, Lasse Wulf","doi":"10.1007/s10107-024-02086-z","DOIUrl":"https://doi.org/10.1007/s10107-024-02086-z","journal":{"name":"Mathematical Programming"},"publicationDate":"2024-05-09","publicationTypes":"Journal Article"}
Pub Date: 2024-05-08; DOI: 10.1007/s10107-024-02087-y
Luze Xu, Jon Lee
Mixed-integer nonlinear optimization formulations of the disjunction between the origin and a polytope via a binary indicator variable are broadly used in nonlinear combinatorial optimization for modeling a fixed cost associated with carrying out a group of activities and a convex cost function associated with the levels of the activities. The perspective relaxation of such models is often used to solve to global optimality in a branch-and-bound context, but it typically requires suitable conic solvers and is not compatible with general-purpose NLP software in the presence of other classes of constraints. This motivates the investigation of when simpler but weaker relaxations may be adequate. Comparing the volume (i.e., Lebesgue measure) of the relaxations as a measure of tightness, we lift some of the results related to the simplex case to the box case. In order to compare the volumes of different relaxations in the box case, it is necessary to find an appropriate concave upper bound that preserves the convexity and is minimal, which is more difficult than in the simplex case. To address the challenge beyond the simplex case, the triangulation approach is used.
{"title":"Gaining or losing perspective for convex multivariate functions on box domains","authors":"Luze Xu, Jon Lee","doi":"10.1007/s10107-024-02087-y","DOIUrl":"https://doi.org/10.1007/s10107-024-02087-y","journal":{"name":"Mathematical Programming"},"publicationDate":"2024-05-08","publicationTypes":"Journal Article"}
Pub Date: 2024-04-29; DOI: 10.1007/s10107-024-02078-z
Billy Jin, Katya Scheinberg, Miaolan Xie
Several classical adaptive optimization algorithms, such as line search and trust-region methods, have been recently extended to stochastic settings where function values, gradients, and Hessians in some cases, are estimated via stochastic oracles. Unlike the majority of stochastic methods, these methods do not use a pre-specified sequence of step size parameters, but adapt the step size parameter according to the estimated progress of the algorithm and use it to dictate the accuracy required from the stochastic oracles. The requirements on the stochastic oracles are, thus, also adaptive and the oracle costs can vary from iteration to iteration. The step size parameters in these methods can increase and decrease based on the perceived progress, but unlike the deterministic case they are not bounded away from zero due to possible oracle failures, and bounds on the step size parameter have not been previously derived. This creates obstacles in the total complexity analysis of such methods, because the oracle costs are typically decreasing in the step size parameter, and could be arbitrarily large as the step size parameter goes to 0. Thus, until now only the total iteration complexity of these methods has been analyzed. In this paper, we derive a lower bound on the step size parameter that holds with high probability for a large class of adaptive stochastic methods. We then use this lower bound to derive a framework for analyzing the expected and high probability total oracle complexity of any method in this class. Finally, we apply this framework to analyze the total sample complexity of two particular algorithms, STORM (Blanchet et al. in INFORMS J Optim 1(2):92–119, 2019) and SASS (Jin et al. in High probability complexity bounds for adaptive step search based on stochastic oracles, 2021. https://doi.org/10.48550/ARXIV.2106.06454), in the expected risk minimization problem.
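The adaptive mechanism described above can be sketched in a few lines for a one-dimensional problem: the step size parameter grows after a step that passes a sufficient decrease test on the estimated function values and shrinks otherwise, and the current step size is handed to the oracles so that their accuracy can depend on it. This is a generic illustration under stated assumptions, neither STORM nor SASS; the oracle signatures, the parameters `gamma`, `theta`, and `alpha_max`, and the test function are all hypothetical.

```python
def adaptive_step_search(f_est, g_est, x, alpha=1.0, gamma=2.0, theta=0.1,
                         alpha_max=10.0, iters=20):
    """One-dimensional step search with an adaptively updated step size.

    f_est(x, alpha) and g_est(x, alpha) are (possibly noisy) oracles; they
    receive the current step size so that their accuracy may depend on it.
    """
    history = []
    for _ in range(iters):
        g = g_est(x, alpha)
        trial = x - alpha * g
        # Sufficient decrease test on the *estimated* function values.
        if f_est(trial, alpha) <= f_est(x, alpha) - theta * alpha * g * g:
            x = trial
            alpha = min(gamma * alpha, alpha_max)  # successful step: be bolder
        else:
            alpha = alpha / gamma                  # failed step: be more careful
        history.append(alpha)
    return x, history


# Exact oracles for f(x) = x^2, used here only to keep the demo deterministic.
x_final, alphas = adaptive_step_search(
    lambda x, a: x * x, lambda x, a: 2.0 * x, x=3.0
)
print(x_final, min(alphas))
```

With noisy oracles the test can fail spuriously and drive the step size down, which is why a lower bound on the step size parameter, as derived in the paper, is needed to control the total oracle cost.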
{"title":"Sample complexity analysis for adaptive optimization algorithms with stochastic oracles","authors":"Billy Jin, Katya Scheinberg, Miaolan Xie","doi":"10.1007/s10107-024-02078-z","DOIUrl":"https://doi.org/10.1007/s10107-024-02078-z","journal":{"name":"Mathematical Programming"},"publicationDate":"2024-04-29","publicationTypes":"Journal Article"}
Pub Date: 2024-04-24; DOI: 10.1007/s10107-024-02084-1
Daniel Dadush, Friedrich Eisenbrand, Thomas Rothvoss
Approximate integer programming is the following: For a given convex body \(K \subseteq \mathbb{R}^n\), either determine whether \(K \cap \mathbb{Z}^n\) is empty, or find an integer point in the convex body \(2\cdot (K - c) + c\), which is K scaled by 2 from its center of gravity c. Approximate integer programming can be solved in time \(2^{O(n)}\) while the fastest known methods for exact integer programming run in time \(2^{O(n)} \cdot n^n\). So far, there are no efficient methods for integer programming known that are based on approximate integer programming. Our main contributions are two such methods, each yielding novel complexity results. First, we show that an integer point \(x^* \in (K \cap \mathbb{Z}^n)\) can be found in time \(2^{O(n)}\), provided that the remainders \(x_i^* \bmod \ell\) of each component of \(x^*\) are given, for some arbitrarily fixed \(\ell \ge 5(n+1)\). The algorithm is based on a cutting-plane technique, iteratively halving the volume of the feasible set. The cutting planes are determined via approximate integer programming. Enumeration of the possible remainders gives a \(2^{O(n)} n^n\) algorithm for general integer programming. This matches the current best bound of an algorithm by Dadush (Integer programming, lattice algorithms, and deterministic volume estimation. Georgia Institute of Technology, Atlanta, 2012) that is considerably more involved. Our algorithm also relies on a new asymmetric approximate Carathéodory theorem that might be of interest on its own. Our second method concerns integer programming problems in equation-standard form \(Ax = b,\ 0 \le x \le u,\ x \in \mathbb{Z}^n\). Such a problem can be reduced to the solution of \(\prod_i O(\log u_i + 1)\) approximate integer programming problems. This implies, for example, that knapsack or subset-sum problems with polynomial variable range \(0 \le x_i \le p(n)\) can be solved in time \((\log n)^{O(n)}\). For these problems, the best running time so far was \(n^n \cdot 2^{O(n)}\).
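The relaxation in the definition can be seen on a toy example. The sketch below restricts K to an axis-aligned box, whose center of gravity is simply its midpoint, and compares the integer points of K with those of the scaled body 2(K - c) + c. The box, the helper names, and the enumeration are assumptions for illustration; they say nothing about how the algorithms in the paper work.

```python
import math
from itertools import product


def scale_box(box, factor=2.0):
    """Scale an axis-aligned box about its midpoint (its center of gravity)."""
    scaled = []
    for lo, hi in box:
        c = (lo + hi) / 2.0
        scaled.append((factor * (lo - c) + c, factor * (hi - c) + c))
    return scaled


def integer_points(box):
    """All integer points in an axis-aligned box, by direct enumeration."""
    ranges = [range(math.ceil(lo), math.floor(hi) + 1) for lo, hi in box]
    return list(product(*ranges))


K = [(0.2, 0.8), (0.6, 1.3)]           # contains no integer point
print(integer_points(K))
print(integer_points(scale_box(K)))    # the scaled body may contain some
```

For this box K contains no integer point while the scaled body contains (0, 1) and (1, 1), so an approximate integer programming routine may legitimately return a point even though the exact problem is infeasible. Closing this gap is what the two methods in the abstract address.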
{"title":"From approximate to exact integer programming","authors":"Daniel Dadush, Friedrich Eisenbrand, Thomas Rothvoss","doi":"10.1007/s10107-024-02084-1","DOIUrl":"https://doi.org/10.1007/s10107-024-02084-1","journal":{"name":"Mathematical Programming"},"publicationDate":"2024-04-24","publicationTypes":"Journal Article"}
Pub Date: 2024-04-18 DOI: 10.1007/s10107-024-02077-0
Satoru Fujishige, Tomonari Kitahara, László A. Végh
We consider the minimum-norm-point (MNP) problem over polyhedra, a well-studied problem that encompasses linear programming. We present a general algorithmic framework that combines two fundamental approaches for this problem: active set methods and first-order methods. Our algorithm performs first-order update steps, followed by iterations that aim to ‘stabilize’ the current iterate with additional projections, i.e., find a locally optimal solution whilst keeping the current tight inequalities. Such steps have been previously used in active set methods for the nonnegative least squares (NNLS) problem. We bound the number of iterations polynomially in the dimension and in the associated circuit imbalance measure. In particular, the algorithm is strongly polynomial for network flow instances. Classical NNLS algorithms such as the Lawson–Hanson algorithm are special instantiations of our framework; as a consequence, we obtain convergence bounds for these algorithms. Our preliminary computational experiments show promising practical performance.
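To make the NNLS problem and the notion of a first-order update step concrete, here is a minimal sketch of plain projected gradient descent for min ||Ax - b||^2 subject to x >= 0. This is a standard textbook method swapped in for illustration. It is not the update-and-stabilize framework of the paper and it contains no stabilizing projection steps. The function name `nnls_projected_gradient`, the fixed iteration count, and the step-size rule are assumptions of this sketch.

```python
def nnls_projected_gradient(A, b, iters=500):
    """Approximately solve min ||Ax - b||^2 subject to x >= 0.

    Illustrative first-order method only (hypothetical helper, not the
    authors' algorithm). A is a list of rows, b a list of floats.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    # The squared Frobenius norm of A upper-bounds the largest eigenvalue
    # of A^T A, so 1/L is a safe step size for the gradient of 0.5*||Ax-b||^2.
    L = sum(a * a for row in A for a in row)
    if L == 0.0:
        return x  # A is the zero matrix; every x >= 0 is optimal
    step = 1.0 / L
    for _ in range(iters):
        # Residual r = Ax - b
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # Gradient of 0.5*||Ax - b||^2 is A^T r
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # First-order update, then projection onto the nonnegative orthant
        x = [max(0.0, x[j] - step * g[j]) for j in range(n)]
    return x


# Tiny instance: the unconstrained optimum (1, -1) violates x >= 0,
# so the constrained optimum clips the second coordinate to 0.
x = nnls_projected_gradient([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0])
print(x)  # approximately [1.0, 0.0]
```

On this instance the second inequality x_2 >= 0 is tight at the optimum. Active set methods such as Lawson–Hanson track such tight inequalities explicitly, whereas the sketch above only reaches them through repeated projection.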
{"title":"An update-and-stabilize framework for the minimum-norm-point problem","authors":"Satoru Fujishige, Tomonari Kitahara, László A. Végh","doi":"10.1007/s10107-024-02077-0","DOIUrl":"https://doi.org/10.1007/s10107-024-02077-0","url":null,"abstract":"<p>We consider the minimum-norm-point (MNP) problem over polyhedra, a well-studied problem that encompasses linear programming. We present a general algorithmic framework that combines two fundamental approaches for this problem: active set methods and first-order methods. Our algorithm performs first-order update steps, followed by iterations that aim to ‘stabilize’ the current iterate with additional projections, i.e., find a locally optimal solution whilst keeping the current tight inequalities. Such steps have been previously used in active set methods for the nonnegative least squares (NNLS) problem. We bound the number of iterations polynomially in the dimension and in the associated circuit imbalance measure. In particular, the algorithm is strongly polynomial for network flow instances. Classical NNLS algorithms such as the Lawson–Hanson algorithm are special instantiations of our framework; as a consequence, we obtain convergence bounds for these algorithms. Our preliminary computational experiments show promising practical performance.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"27 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140885829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Mathematics","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}