Gaining or losing perspective for convex multivariate functions on box domains
Pub Date: 2024-05-08 | DOI: 10.1007/s10107-024-02087-y
Luze Xu, Jon Lee
Mixed-integer nonlinear optimization formulations of the disjunction between the origin and a polytope, via a binary indicator variable, are broadly used in nonlinear combinatorial optimization to model a fixed cost associated with carrying out a group of activities together with a convex cost function associated with the levels of the activities. The perspective relaxation of such models is often used to solve to global optimality in a branch-and-bound context, but it typically requires suitable conic solvers and is not compatible with general-purpose NLP software in the presence of other classes of constraints. This motivates the investigation of when simpler but weaker relaxations may be adequate. Comparing the volume (i.e., Lebesgue measure) of the relaxations as a measure of tightness, we lift some of the results for the simplex case to the box case. In order to compare the volumes of different relaxations in the box case, it is necessary to find an appropriate concave upper bound that preserves convexity and is minimal, which is more difficult than in the simplex case. To address this challenge beyond the simplex case, a triangulation approach is used.
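For orientation, the two relaxations whose volumes are compared can be written schematically as follows. This is the standard construction from the perspective-reformulation literature in our notation (indicator $z$, box upper bounds $u$, convex cost $f$ with $f(0)=0$), not a statement taken from the paper.

```latex
% Disjunction: either (z, x) = (0, 0), or z = 1 and 0 <= x <= u.
\begin{align*}
\text{naive relaxation:} \quad
  & t \ge f(x), && 0 \le x \le u z, \; z \in [0,1], \\
\text{perspective relaxation:} \quad
  & t \ge z \, f(x/z), && 0 \le x \le u z, \; z \in [0,1],
\end{align*}
% with the closure convention z f(x/z) := 0 at z = 0. The perspective set is
% contained in the naive one; comparing their Lebesgue measures quantifies how
% much tightness the simpler relaxation gives up.
```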
{"title":"Gaining or losing perspective for convex multivariate functions on box domains","authors":"Luze Xu, Jon Lee","doi":"10.1007/s10107-024-02087-y","DOIUrl":"https://doi.org/10.1007/s10107-024-02087-y","url":null,"abstract":"<p>Mixed-integer nonlinear optimization formulations of the disjunction between the origin and a polytope via a binary indicator variable is broadly used in nonlinear combinatorial optimization for modeling a fixed cost associated with carrying out a group of activities and a convex cost function associated with the levels of the activities. The perspective relaxation of such models is often used to solve to global optimality in a branch-and-bound context, but it typically requires suitable conic solvers and is not compatible with general-purpose NLP software in the presence of other classes of constraints. This motivates the investigation of when simpler but weaker relaxations may be adequate. Comparing the volume (i.e., Lebesgue measure) of the relaxations as a measure of tightness, we lift some of the results related to the simplex case to the box case. In order to compare the volumes of different relaxations in the box case, it is necessary to find an appropriate concave upper bound that preserves the convexity and is minimal, which is more difficult than in the simplex case. To address the challenge beyond the simplex case, the triangulation approach is used.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"28 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140941064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sample complexity analysis for adaptive optimization algorithms with stochastic oracles
Pub Date: 2024-04-29 | DOI: 10.1007/s10107-024-02078-z
Billy Jin, Katya Scheinberg, Miaolan Xie
Several classical adaptive optimization algorithms, such as line search and trust-region methods, have recently been extended to stochastic settings where function values, gradients and, in some cases, Hessians are estimated via stochastic oracles. Unlike the majority of stochastic methods, these methods do not use a pre-specified sequence of step size parameters, but adapt the step size parameter according to the estimated progress of the algorithm and use it to dictate the accuracy required of the stochastic oracles. The requirements on the stochastic oracles are thus also adaptive, and the oracle costs can vary from iteration to iteration. The step size parameters in these methods can increase and decrease based on the perceived progress, but, unlike in the deterministic case, they are not bounded away from zero due to possible oracle failures, and bounds on the step size parameter had not previously been derived. This creates obstacles in the total complexity analysis of such methods, because the oracle costs are typically decreasing in the step size parameter and can be arbitrarily large as the step size parameter goes to 0. Thus, until now, only the total iteration complexity of these methods had been analyzed. In this paper, we derive a lower bound on the step size parameter that holds with high probability for a large class of adaptive stochastic methods. We then use this lower bound to derive a framework for analyzing the expected and high-probability total oracle complexity of any method in this class. Finally, we apply this framework to analyze the total sample complexity of two particular algorithms, STORM (Blanchet et al. in INFORMS J Optim 1(2):92–119, 2019) and SASS (Jin et al. in High probability complexity bounds for adaptive step search based on stochastic oracles, 2021. https://doi.org/10.48550/ARXIV.2106.06454), on the expected risk minimization problem.
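To illustrate the mechanism that the analysis must control (per-iteration oracle cost that blows up as the step size parameter shrinks), here is a minimal sketch of a generic stochastic step-search loop. It is our illustration under simplifying assumptions (exact gradients, Gaussian value noise, sample size tied to the step size), not the STORM or SASS pseudocode.

```python
import numpy as np

def noisy_value(f, x, n_samples, rng, noise=0.1):
    """Stochastic zeroth-order oracle: average of n_samples noisy evaluations."""
    return np.mean([f(x) + noise * rng.standard_normal() for _ in range(n_samples)])

def stochastic_step_search(f, grad, x, alpha=1.0, gamma=2.0, theta=0.1,
                           alpha_min=1e-8, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        # Oracle accuracy must scale with the step size, so the per-iteration
        # sample count (hence the oracle cost) grows like O(1/alpha^2) here;
        # a high-probability lower bound on alpha caps this cost.
        n = min(10_000, max(1, int(1.0 / alpha**2)))  # capped for the demo
        g = grad(x)
        cand = x - alpha * g
        f_x = noisy_value(f, x, n, rng)
        f_cand = noisy_value(f, cand, n, rng)
        if f_cand <= f_x - theta * alpha * (g @ g):  # estimated sufficient decrease
            x, alpha = cand, gamma * alpha           # success: accept, enlarge step
        else:
            alpha = max(alpha / gamma, alpha_min)    # failure: shrink step
    return x

# Example: minimize f(x) = ||x||^2 from a fixed start.
f = lambda x: float(x @ x)
grad = lambda x: 2.0 * x
print(stochastic_step_search(f, grad, np.ones(5)))
```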
From approximate to exact integer programming
Pub Date: 2024-04-24 | DOI: 10.1007/s10107-024-02084-1
Daniel Dadush, Friedrich Eisenbrand, Thomas Rothvoss
Approximate integer programming is the following: For a given convex body $K \subseteq \mathbb{R}^n$, either determine whether $K \cap \mathbb{Z}^n$ is empty, or find an integer point in the convex body $2 \cdot (K - c) + c$, which is $K$ scaled by a factor of two from its center of gravity $c$. Approximate integer programming can be solved in time $2^{O(n)}$, while the fastest known methods for exact integer programming run in time $2^{O(n)} \cdot n^n$. So far, no efficient methods for exact integer programming based on approximate integer programming are known. Our main contributions are two such methods, each yielding novel complexity results. First, we show that an integer point $x^* \in K \cap \mathbb{Z}^n$ can be found in time $2^{O(n)}$, provided that the remainders $x_i^* \bmod \ell$ of each component of $x^*$ are given, for some arbitrarily fixed $\ell \ge 5(n+1)$. The algorithm is based on a cutting-plane technique, iteratively halving the volume of the feasible set. The cutting planes are determined via approximate integer programming. Enumeration of the possible remainders gives a $2^{O(n)} n^n$ algorithm for general integer programming. This matches the current best bound of an algorithm by Dadush (Integer programming, lattice algorithms, and deterministic volume estimation. Georgia Institute of Technology, Atlanta, 2012) that is considerably more involved. Our algorithm also relies on a new asymmetric approximate Carathéodory theorem that might be of interest on its own. Our second method concerns integer programming problems in equation standard form $Ax = b,\; 0 \le x \le u,\; x \in \mathbb{Z}^n$. Such a problem can be reduced to the solution of $\prod_i O(\log u_i + 1)$ approximate integer programming problems. This implies, for example, that knapsack or subset-sum problems with polynomial variable range $0 \le x_i \le p(n)$ can be solved in time $(\log n)^{O(n)}$. For these problems, the best running time so far was $n^n \cdot 2^{O(n)}$.
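To make the remainder-based reduction concrete, here is the substitution in our own rendering; it is a sketch consistent with the constants above, not the paper's argument.

```latex
% Given remainders r_i \equiv x_i^* \pmod{\ell}, substitute
%   x = r + \ell y, \qquad y \in \mathbb{Z}^n .
% An integer point of K with these remainders corresponds to an integer point of
%   K' = \tfrac{1}{\ell}\,(K - r),
% a copy of K shrunk by the factor \ell. Enumerating all \ell^n remainder
% vectors with \ell = \Theta(n) costs (O(n))^n = 2^{O(n)} \, n^n in total,
% matching the stated bound for general integer programming.
```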
An update-and-stabilize framework for the minimum-norm-point problem
Pub Date: 2024-04-18 | DOI: 10.1007/s10107-024-02077-0
Satoru Fujishige, Tomonari Kitahara, László A. Végh
We consider the minimum-norm-point (MNP) problem over polyhedra, a well-studied problem that encompasses linear programming. We present a general algorithmic framework that combines two fundamental approaches for this problem: active set methods and first order methods. Our algorithm performs first order update steps, followed by iterations that aim to 'stabilize' the current iterate with additional projections, i.e., find a locally optimal solution whilst keeping the current tight inequalities. Such steps have previously been used in active set methods for the nonnegative least squares (NNLS) problem. We bound the number of iterations polynomially in the dimension and in the associated circuit imbalance measure. In particular, the algorithm is strongly polynomial for network flow instances. Classical NNLS algorithms such as the Lawson–Hanson algorithm are special instantiations of our framework; as a consequence, we obtain convergence bounds for these algorithms. Our preliminary computational experiments show promising practical performance.
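Since the Lawson–Hanson algorithm is named as a special instantiation of the framework, a minimal usage sketch of the NNLS problem it solves may help anchor the connection. SciPy's nnls implements this classical active-set method; the random data is purely illustrative.

```python
import numpy as np
from scipy.optimize import nnls

# Nonnegative least squares:  min ||Ax - b||_2  subject to  x >= 0.
# SciPy solves this with the classical Lawson-Hanson active-set method,
# which alternates unconstrained least-squares solves on the current
# passive set with updates to the set of tight (x_i = 0) constraints.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

x, residual_norm = nnls(A, b)
print("solution:", x)                  # all entries are >= 0 by construction
print("residual norm:", residual_norm)
```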
{"title":"An update-and-stabilize framework for the minimum-norm-point problem","authors":"Satoru Fujishige, Tomonari Kitahara, László A. Végh","doi":"10.1007/s10107-024-02077-0","DOIUrl":"https://doi.org/10.1007/s10107-024-02077-0","url":null,"abstract":"<p>We consider the minimum-norm-point (MNP) problem over polyhedra, a well-studied problem that encompasses linear programming. We present a general algorithmic framework that combines two fundamental approaches for this problem: active set methods and first order methods. Our algorithm performs first order update steps, followed by iterations that aim to ‘stabilize’ the current iterate with additional projections, i.e., find a locally optimal solution whilst keeping the current tight inequalities. Such steps have been previously used in active set methods for the nonnegative least squares (NNLS) problem. We bound on the number of iterations polynomially in the dimension and in the associated circuit imbalance measure. In particular, the algorithm is strongly polynomial for network flow instances. Classical NNLS algorithms such as the Lawson–Hanson algorithm are special instantiations of our framework; as a consequence, we obtain convergence bounds for these algorithms. Our preliminary computational experiments show promising practical performance.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"27 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140885829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extended convergence analysis of the Scholtes-type regularization for cardinality-constrained optimization problems
Pub Date: 2024-04-09 | DOI: 10.1007/s10107-024-02082-3
Sebastian Lämmel, Vladimir Shikhman
We extend the convergence analysis of the Scholtes-type regularization method for cardinality-constrained optimization problems (CCOP). Its behavior is clarified in the vicinity of saddle points, and not just of minimizers as has been done in the literature before. This becomes possible by using, as an intermediate step, the recently introduced regularized continuous reformulation of a cardinality-constrained optimization problem. We show that the Scholtes-type regularization method is well-defined locally around a nondegenerate T-stationary point of this regularized continuous reformulation. Moreover, the nondegenerate Karush–Kuhn–Tucker points of the corresponding Scholtes-type regularization converge to a T-stationary point having the same index, i.e. its topological type persists. As a consequence, we conclude that the global structure of the Scholtes-type regularization essentially coincides with that of CCOP.
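For context, the objects in this abstract can be written schematically as follows; this is the standard continuous reformulation and Scholtes-type regularization from the cardinality-constrained literature, in our transcription rather than the paper's exact statement.

```latex
% CCOP:  \min_x f(x)  \ \text{s.t.}\  \|x\|_0 \le s .
% Continuous reformulation with auxiliary y \in [0,1]^n:
%   \min_{x, y} f(x) \quad \text{s.t.} \quad e^\top y \ge n - s, \qquad
%   x_i y_i = 0 \quad (i = 1, \dots, n).
% The Scholtes-type regularization with parameter t > 0 relaxes the
% complementarity-type constraints to
%   -t \le x_i y_i \le t \quad (i = 1, \dots, n),
% and the convergence analysis tracks stationary points as t \downarrow 0.
```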
{"title":"Extended convergence analysis of the Scholtes-type regularization for cardinality-constrained optimization problems","authors":"Sebastian Lämmel, Vladimir Shikhman","doi":"10.1007/s10107-024-02082-3","DOIUrl":"https://doi.org/10.1007/s10107-024-02082-3","url":null,"abstract":"<p>We extend the convergence analysis of the Scholtes-type regularization method for cardinality-constrained optimization problems. Its behavior is clarified in the vicinity of saddle points, and not just of minimizers as it has been done in the literature before. This becomes possible by using as an intermediate step the recently introduced regularized continuous reformulation of a cardinality-constrained optimization problem. We show that the Scholtes-type regularization method is well-defined locally around a nondegenerate T-stationary point of this regularized continuous reformulation. Moreover, the nondegenerate Karush–Kuhn–Tucker points of the corresponding Scholtes-type regularization converge to a T-stationary point having the same index, i.e. its topological type persists. As consequence, we conclude that the global structure of the Scholtes-type regularization essentially coincides with that of CCOP.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"48 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140885824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compressing branch-and-bound trees
Pub Date: 2024-04-06 | DOI: 10.1007/s10107-024-02080-5
Gonzalo Muñoz, Joseph Paat, Álinson S. Xavier
A branch-and-bound (BB) tree certifies a dual bound on the value of an integer program. In this work, we introduce the tree compression problem (TCP): Given a BB tree T that certifies a dual bound, can we obtain a smaller tree with the same (or stronger) bound by either (1) applying a different disjunction at some node in T or (2) removing leaves from T? We believe such post-hoc analysis of BB trees may assist in identifying helpful general disjunctions in BB algorithms. We initiate our study by considering computational complexity and limitations of TCP. We then conduct experiments to evaluate the compressibility of realistic branch-and-bound trees generated by commonly-used branching strategies, using both an exact and a heuristic compression algorithm.
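As a toy illustration of the object being compressed (our sketch, for a minimization problem): the dual bound certified by a BB tree is the minimum of its leaves' relaxation bounds, so a compressed tree is acceptable only if this minimum does not decrease.

```python
# Toy model (ours): a BB node carries the relaxation bound of its subproblem;
# for a minimization IP, the tree certifies the lower bound
#   min over leaves of the leaf relaxation bounds.

def tree_dual_bound(node):
    """node = {'bound': float, 'children': [child nodes]}; leaves have no children."""
    if not node.get("children"):
        return node["bound"]
    return min(tree_dual_bound(c) for c in node["children"])

tree = {"bound": 1.0, "children": [
    {"bound": 3.0, "children": []},
    {"bound": 2.5, "children": [
        {"bound": 4.0, "children": []},
        {"bound": 2.5, "children": []},
    ]},
]}
print(tree_dual_bound(tree))  # 2.5, the certified lower bound on the IP value
```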
Finding global minima via kernel approximations
Pub Date: 2024-04-04 | DOI: 10.1007/s10107-024-02081-4
Alessandro Rudi, Ulysse Marteau-Ferey, Francis Bach
We consider the global minimization of smooth functions based solely on function evaluations. Algorithms that achieve the optimal number of function evaluations for a given precision level typically rely on explicitly constructing an approximation of the function, which is then minimized with algorithms that have exponential running-time complexity. In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum. This is done by using infinite sums of square smooth functions and has strong links with polynomial sum-of-squares hierarchies. Leveraging recent representation properties of reproducing kernel Hilbert spaces, the infinite-dimensional optimization problem can be solved by subsampling in time polynomial in the number of function evaluations, and with theoretical guarantees on the obtained minimum. Given $n$ samples, the computational cost is $O(n^{3.5})$ in time and $O(n^2)$ in space, and we achieve a convergence rate to the global optimum that is $O(n^{-m/d + 1/2 + 3/d})$, where $m$ is the degree of differentiability of the function and $d$ the number of dimensions. The rate is nearly optimal in the case of Sobolev functions and, more generally, makes the proposed method particularly suitable for functions with many derivatives. Indeed, when $m$ is in the order of $d$, the convergence rate to the global optimum does not suffer from the curse of dimensionality, which affects only the worst-case constants (that we track explicitly through the paper).
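Schematically, the sampled problem has the following shape in our transcription (normalization and regularization omitted; see the paper for the precise statement):

```latex
% Given samples x_1, ..., x_n and the feature map \phi of a reproducing
% kernel Hilbert space, find the largest c such that f - c is represented
% as a (possibly infinite) sum of squares at the samples:
%   \max_{c \in \mathbb{R}, \; A \succeq 0} \; c
%   \quad \text{s.t.} \quad f(x_i) - c = \langle \phi(x_i), A \, \phi(x_i) \rangle,
%   \qquad i = 1, \dots, n .
% The PSD operator A plays the role of the Gram matrix in polynomial
% sum-of-squares hierarchies, and subsampling keeps the problem finite.
```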
Stackelberg risk preference design
Pub Date: 2024-04-02 | DOI: 10.1007/s10107-024-02083-2
Risk measures are commonly used to capture the risk preferences of decision-makers (DMs). The decisions of DMs can be nudged or manipulated when their risk preferences are influenced by factors such as the availability of information about the uncertainties. This work proposes a Stackelberg risk preference design (STRIPE) problem to capture a designer's incentive to influence DMs' risk preferences. STRIPE consists of two levels. In the lower level, individual DMs in a population, known as the followers, respond to uncertainties according to their risk preference types. In the upper level, the leader influences the distribution of the types in order to induce targeted decisions and steers the followers' preferences toward them. Our analysis centers on the solution concept of approximate Stackelberg equilibrium, which yields suboptimal behaviors of the players, and we show that such an equilibrium exists. The primitive risk perception gap, defined as the Wasserstein distance between the original and the target type distributions, is important in estimating the optimal design cost. We connect the leader's optimality compromise on the cost with her ambiguity tolerance on the followers' approximate solutions, leveraging Lipschitzian properties of the lower-level solution mapping. To obtain the Stackelberg equilibrium, we reformulate STRIPE into a single-level optimization problem using the spectral representations of law-invariant coherent risk measures. We create a data-driven approach for computation and study its performance guarantees. We apply STRIPE to contract design problems under approximate incentive compatibility. Moreover, we connect STRIPE with meta-learning problems and derive adaptation performance estimates for the meta-parameters.
{"title":"Stackelberg risk preference design","authors":"","doi":"10.1007/s10107-024-02083-2","DOIUrl":"https://doi.org/10.1007/s10107-024-02083-2","url":null,"abstract":"<h3>Abstract</h3> <p>Risk measures are commonly used to capture the risk preferences of decision-makers (DMs). The decisions of DMs can be nudged or manipulated when their risk preferences are influenced by factors such as the availability of information about the uncertainties. This work proposes a Stackelberg risk preference design (STRIPE) problem to capture a designer’s incentive to influence DMs’ risk preferences. STRIPE consists of two levels. In the lower level, individual DMs in a population, known as the followers, respond to uncertainties according to their risk preference types. In the upper level, the leader influences the distribution of the types to induce targeted decisions and steers the follower’s preferences to it. Our analysis centers around the solution concept of approximate Stackelberg equilibrium that yields suboptimal behaviors of the players. We show the existence of the approximate Stackelberg equilibrium. The primitive risk perception gap, defined as the Wasserstein distance between the original and the target type distributions, is important in estimating the optimal design cost. We connect the leader’s optimality compromise on the cost with her ambiguity tolerance on the follower’s approximate solutions leveraging Lipschitzian properties of the lower level solution mapping. To obtain the Stackelberg equilibrium, we reformulate STRIPE into a single-level optimization problem using the spectral representations of law-invariant coherent risk measures. We create a data-driven approach for computation and study its performance guarantees. We apply STRIPE to contract design problems under approximate incentive compatibility. Moreover, we connect STRIPE with meta-learning problems and derive adaptation performance estimates of the meta-parameters. </p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"48 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140596179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mental health literacy in children and adolescents in low- and middle-income countries: a mixed studies systematic review and narrative synthesis
Pub Date: 2024-04-01 | DOI: 10.1007/s00787-022-01997-6
Laoise Renwick, Rebecca Pedley, Isobel Johnson, Vicky Bell, Karina Lovell, Penny Bee, Helen Brooks
Mental illnesses are the leading cause of disease burden among children and young people (CYP) globally. Low- and middle-income countries (LMIC) are disproportionately affected. Enhancing mental health literacy (MHL) is one way to combat low levels of help-seeking and effective treatment receipt. We aimed to synthesize evidence about the knowledge, beliefs and attitudes of CYP in LMICs about mental illnesses, their treatments and outcomes, evaluating factors that can enhance or impede help-seeking, to inform context-specific and developmentally appropriate understandings of MHL. Eight bibliographic databases were searched from inception to July 2020: PsycInfo, EMBASE, Medline (OVID), Scopus, ASSIA (ProQuest), SSCI, SCI (Web of Science), CINAHL PLUS, and Social Sciences Full Text (EBSCO). 58 papers (41 quantitative, 13 qualitative, 4 mixed methods) representing 52 separate studies comprising 36,429 participants, with a mean age of 15.3 [10.4-17.4], were appraised and synthesized using narrative synthesis methods. Low levels of recognition and knowledge about mental health problems and illnesses, pervasive levels of stigma, and low confidence in professional healthcare services, even when considered a valid treatment option, were dominant themes. CYP cited the value of traditional healers and social networks for seeking help. Several important areas were under-researched, including the link between specific stigma types and active help-seeking, and research is needed to understand more fully the interplay between knowledge, beliefs and attitudes across varied cultural settings. Greater exploration of social networks and the value of collaboration with traditional healers is consistent with promising, yet understudied, areas of community-based MHL interventions combining education and social contact.
{"title":"Mental health literacy in children and adolescents in low- and middle-income countries: a mixed studies systematic review and narrative synthesis.","authors":"Laoise Renwick, Rebecca Pedley, Isobel Johnson, Vicky Bell, Karina Lovell, Penny Bee, Helen Brooks","doi":"10.1007/s00787-022-01997-6","DOIUrl":"10.1007/s00787-022-01997-6","url":null,"abstract":"<p><p>Mental illnesses are the leading cause of disease burden among children and young people (CYP) globally. Low- and middle-income countries (LMIC) are disproportionately affected. Enhancing mental health literacy (MHL) is one way to combat low levels of help-seeking and effective treatment receipt. We aimed to synthesis evidence about knowledge, beliefs and attitudes of CYP in LMICs about mental illnesses, their treatments and outcomes, evaluating factors that can enhance or impede help-seeking to inform context-specific and developmentally appropriate understandings of MHL. Eight bibliographic databases were searched from inception to July 2020: PsycInfo, EMBASE, Medline (OVID), Scopus, ASSIA (ProQuest), SSCI, SCI (Web of Science) CINAHL PLUS, Social Sciences full text (EBSCO). 58 papers (41 quantitative, 13 qualitative, 4 mixed methods) representing 52 separate studies comprising 36,429 participants with a mean age of 15.3 [10.4-17.4], were appraised and synthesized using narrative synthesis methods. Low levels of recognition and knowledge about mental health problems and illnesses, pervasive levels of stigma and low confidence in professional healthcare services, even when considered a valid treatment option were dominant themes. CYP cited the value of traditional healers and social networks for seeking help. Several important areas were under-researched including the link between specific stigma types and active help-seeking and research is needed to understand more fully the interplay between knowledge, beliefs and attitudes across varied cultural settings. Greater exploration of social networks and the value of collaboration with traditional healers is consistent with promising, yet understudied, areas of community-based MHL interventions combining education and social contact.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"13 1","pages":"961-985"},"PeriodicalIF":6.0,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11032284/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73281082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convergence rates for sums-of-squares hierarchies with correlative sparsity
Pub Date: 2024-03-25 | DOI: 10.1007/s10107-024-02071-6
This work derives upper bounds on the convergence rate of the moment-sum-of-squares hierarchy with correlative sparsity for global minimization of polynomials on compact basic semialgebraic sets. The main conclusion is that both sparse hierarchies, based on the Schmüdgen and Putinar Positivstellensätze respectively, enjoy a polynomial rate of convergence that depends on the size of the largest clique in the sparsity graph but not on the ambient dimension. Interestingly, the sparse bounds outperform the best currently available bounds for the dense hierarchy when the maximum clique size is sufficiently small compared to the ambient dimension, where performance is measured by the running time of an interior point method required to obtain a bound of a given accuracy on the global minimum.
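Schematically, correlative sparsity lets the positivity certificate split along the cliques of the sparsity graph; here is our transcription of a sparse Putinar-type representation (notation ours):

```latex
% If f = \sum_{k=1}^{p} f_k with each f_k depending only on the variables of a
% clique I_k, the sparse hierarchy seeks certificates of the form
%   f - c = \sum_{k=1}^{p} \Big( \sigma_{k,0} + \sum_{j \in J_k} \sigma_{k,j} \, g_j \Big),
% where each \sigma_{k,j} is a sum of squares in the variables of I_k only and
% the g_j are the constraints involving those variables. The bounds above
% control the gap to the global minimum f^* polynomially in the relaxation
% order, with constants depending on \max_k |I_k| rather than on n.
```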
{"title":"Convergence rates for sums-of-squares hierarchies with correlative sparsity","authors":"","doi":"10.1007/s10107-024-02071-6","DOIUrl":"https://doi.org/10.1007/s10107-024-02071-6","url":null,"abstract":"<h3>Abstract</h3> <p>This work derives upper bounds on the convergence rate of the moment-sum-of-squares hierarchy with correlative sparsity for global minimization of polynomials on compact basic semialgebraic sets. The main conclusion is that both sparse hierarchies based on the Schmüdgen and Putinar Positivstellensätze enjoy a polynomial rate of convergence that depends on the size of the largest clique in the sparsity graph but not on the ambient dimension. Interestingly, the sparse bounds outperform the best currently available bounds for the dense hierarchy when the maximum clique size is sufficiently small compared to the ambient dimension and the performance is measured by the running time of an interior point method required to obtain a bound on the global minimum of a given accuracy.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"90 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140298272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}