Pub Date: 2024-02-03, DOI: 10.1007/s10107-023-02055-y
Abstract
The Frank–Wolfe (FW) method, which relies on efficient linear oracles that minimize a linear approximation of the objective function over a fixed compact convex set, has recently received much attention in the optimization and machine learning literature. In this paper, we propose a new FW-type method for minimizing a smooth function over a compact set defined as the level set of a single difference-of-convex function, based on new generalized linear-optimization oracles (LOs). We show that these LOs can be computed efficiently, with closed-form solutions, in some important optimization models arising in compressed sensing and machine learning. In addition, under a mild strict feasibility condition, we establish subsequential convergence of our nonconvex FW-type method. Since the feasible region of our generalized LO typically changes from iteration to iteration, our convergence analysis differs completely from existing analyses of FW-type methods, which deal with a feasible region that is fixed across subproblems. Finally, motivated by the away steps used to accelerate FW-type methods for convex problems, we further design an away-step oracle to supplement our nonconvex FW-type method, and establish subsequential convergence of this variant. Numerical results on the matrix completion problem with standard datasets demonstrate the efficiency of the proposed FW-type method and its away-step variant.
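For contrast with the generalized LO above, here is a minimal sketch of the classical FW iteration on a fixed compact convex set, using the $\ell_1$ ball, whose linear oracle has a well-known closed form. The quadratic objective and all problem data are illustrative assumptions, not from the paper.

```python
import numpy as np

def linear_oracle_l1(grad, radius=1.0):
    """argmin_{||s||_1 <= radius} <grad, s>: a signed vertex of the l1 ball."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(b, iters=200):
    """Minimize f(x) = 0.5*||x - b||^2 over the unit l1 ball (classical FW)."""
    x = np.zeros_like(b)
    for k in range(iters):
        grad = x - b                     # gradient of f at x
        s = linear_oracle_l1(grad)       # linear-oracle call
        gamma = 2.0 / (k + 2.0)          # standard open-loop step size
        x = (1 - gamma) * x + gamma * s  # convex combination keeps x feasible
    return x

b = np.array([2.0, 0.5, -0.25])
x = frank_wolfe(b)
print(np.round(x, 3), round(np.linalg.norm(x, 1), 3))
```

Each iterate is a convex combination of oracle outputs (vertices of the ball), so feasibility is maintained without projections; for this data the method lands on the vertex $(1,0,0)$, the $\ell_1$-projection of $b$.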
{"title":"Frank–Wolfe-type methods for a class of nonconvex inequality-constrained problems","authors":"","doi":"10.1007/s10107-023-02055-y","DOIUrl":"https://doi.org/10.1007/s10107-023-02055-y","url":null,"abstract":"<h3>Abstract</h3> <p>The Frank–Wolfe (FW) method, which implements efficient linear oracles that minimize linear approximations of the objective function over a <em>fixed</em> compact convex set, has recently received much attention in the optimization and machine learning literature. In this paper, we propose a new FW-type method for minimizing a smooth function over a compact set defined as the level set of a single <em>difference-of-convex</em> function, based on new <em>generalized</em> linear-optimization oracles (LO). We show that these LOs can be computed efficiently with <em>closed-form solutions</em> in some important optimization models that arise in compressed sensing and machine learning. In addition, under a mild strict feasibility condition, we establish the subsequential convergence of our nonconvex FW-type method. Since the feasible region of our generalized LO typically changes from iteration to iteration, our convergence analysis is <em>completely different</em> from those existing works in the literature on FW-type methods that deal with <em>fixed</em> feasible regions among subproblems. Finally, motivated by the away steps for accelerating FW-type methods for convex problems, we further design an <em>away-step oracle</em> to supplement our nonconvex FW-type method, and establish subsequential convergence of this variant. Numerical results on the matrix completion problem with standard datasets are presented to demonstrate the efficiency of the proposed FW-type method and its away-step variant. 
</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"10 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139677806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-03, DOI: 10.1007/s10107-023-02053-0
Simon Thomä, Grit Walther, Maximilian Schiffer
We study piecewise affine policies for multi-stage adjustable robust optimization (ARO) problems with non-negative right-hand-side uncertainty. First, we construct new dominating uncertainty sets and show how a multi-stage ARO problem can be solved efficiently with a linear program when the uncertainty set is replaced by these new sets. We then demonstrate how solutions of this alternative problem can be transformed into solutions of the original problem. By carefully choosing the dominating sets, we prove strong approximation bounds for our policies and extend many previously best-known bounds for the two-stage problem variant to its multi-stage counterpart. Moreover, the new bounds are, to the best of our knowledge, the first bounds shown for the general multi-stage ARO problem considered. We extensively compare our policies to other policies from the literature and prove relative performance guarantees. In two numerical experiments, we identify beneficial and disadvantageous properties of different policies and present effective adjustments to tackle the most critical disadvantages of our policies. Overall, the experiments show that our piecewise affine policies can be computed orders of magnitude faster than affine policies, while often yielding comparable or even better results.
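In generic ARO notation (our sketch, not the paper's construction), an affine policy restricts each recourse decision to a single affine function of the uncertainty, while a piecewise affine policy allows a different affine piece on each element of a cover of the uncertainty set:

```latex
% Affine (linear decision rule) on all of U:
x_t(u) \;=\; x_t^{0} + X_t\, u, \qquad u \in \mathcal{U},
% Piecewise affine on pieces U_1, ..., U_k covering U:
x_t(u) \;=\; x_{t,i}^{0} + X_{t,i}\, u \qquad \text{for } u \in \mathcal{U}_i,\ \ i = 1,\dots,k.
```

The extra pieces buy approximation quality; the dominating-set construction is what keeps the resulting problem solvable as a single linear program.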
{"title":"Designing tractable piecewise affine policies for multi-stage adjustable robust optimization","authors":"Simon Thomä, Grit Walther, Maximilian Schiffer","doi":"10.1007/s10107-023-02053-0","DOIUrl":"https://doi.org/10.1007/s10107-023-02053-0","url":null,"abstract":"<p>We study piecewise affine policies for multi-stage adjustable robust optimization (ARO) problems with non-negative right-hand side uncertainty. First, we construct new dominating uncertainty sets and show how a multi-stage ARO problem can be solved efficiently with a linear program when uncertainty is replaced by these new sets. We then demonstrate how solutions for this alternative problem can be transformed into solutions for the original problem. By carefully choosing the dominating sets, we prove strong approximation bounds for our policies and extend many previously best-known bounds for the two-staged problem variant to its multi-stage counterpart. Moreover, the new bounds are—to the best of our knowledge—the first bounds shown for the general multi-stage ARO problem considered. We extensively compare our policies to other policies from the literature and prove relative performance guarantees. In two numerical experiments, we identify beneficial and disadvantageous properties for different policies and present effective adjustments to tackle the most critical disadvantages of our policies. 
Overall, the experiments show that our piecewise affine policies can be computed by orders of magnitude faster than affine policies, while often yielding comparable or even better results.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"12 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139677417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In generalized malleable scheduling, jobs can be allocated and processed simultaneously on multiple machines so as to reduce the overall makespan of the schedule. The required processing time for each job is determined by the joint processing speed of the allocated machines. We study the case in which processing speeds are job-dependent $M^\natural$-concave functions and provide a constant-factor approximation for this setting, significantly expanding the realm of functions for which such an approximation is possible. Further, we explore the connection between malleable scheduling and the problem of fairly allocating items to a set of agents with distinct utility functions, devising a black-box reduction that allows us to obtain resource-augmented approximation algorithms for the latter.
{"title":"A constant-factor approximation for generalized malleable scheduling under $$M ^{natural }$$ -concave processing speeds","authors":"Dimitris Fotakis, Jannik Matuschke, Orestis Papadigenopoulos","doi":"10.1007/s10107-023-02054-z","DOIUrl":"https://doi.org/10.1007/s10107-023-02054-z","url":null,"abstract":"<p>In generalized malleable scheduling, jobs can be allocated and processed simultaneously on multiple machines so as to reduce the overall makespan of the schedule. The required processing time for each job is determined by the joint processing speed of the allocated machines. We study the case that processing speeds are job-dependent <span>(M ^{natural })</span>-concave functions and provide a constant-factor approximation for this setting, significantly expanding the realm of functions for which such an approximation is possible. Further, we explore the connection between malleable scheduling and the problem of fairly allocating items to a set of agents with distinct utility functions, devising a black-box reduction that allows to obtain resource-augmented approximation algorithms for the latter.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"8 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139644974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-27, DOI: 10.1007/s10107-023-02049-w
Abstract
We investigate the concept of adjustability—the difference in objective values between two types of dynamic robust optimization formulations: one where (static) decisions are made before uncertainty realization, and one where uncertainty is resolved before (adjustable) decisions. This difference reflects the value of information and decision timing in optimization under uncertainty, and is related to several other concepts such as the optimality of decision rules in robust optimization. We develop a theoretical framework to quantify adjustability based on the input data of a robust optimization problem with a linear objective, linear constraints, and fixed recourse. We make very few additional assumptions. In particular, we do not assume constraint-wise separability or parameter nonnegativity that are commonly imposed in the literature for the study of adjustability. This allows us to study important but previously under-investigated problems, such as formulations with equality constraints and problems with both upper and lower bound constraints. Based on the discovery of an interesting connection between the reformulations of the static and fully adjustable problems, our analysis gives a necessary and sufficient condition—in the form of a theorem-of-the-alternatives—for adjustability to be zero when the uncertainty set is polyhedral. Based on this sharp characterization, we provide two efficient mixed-integer optimization formulations to verify zero adjustability. Then, we develop a constructive approach to quantify adjustability when the uncertainty set is general, which results in an efficient and tight poly-time algorithm to bound adjustability. We demonstrate the efficiency and tightness via both theoretical and numerical analyses.
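In generic fixed-recourse notation (our illustration, not the paper's exact model), the quantity the abstract calls adjustability is the gap between the static and fully adjustable optimal values:

```latex
% Static: x is chosen before the uncertainty u is revealed.
z_{\mathrm{static}} \;=\; \min_{x}\ \bigl\{\, c^{\top}x \;:\; Ax \ge Bu \ \ \forall\, u \in \mathcal{U} \,\bigr\},
% Fully adjustable: x may depend on the realized u.
z_{\mathrm{adj}} \;=\; \max_{u \in \mathcal{U}}\ \min_{x}\ \bigl\{\, c^{\top}x \;:\; Ax \ge Bu \,\bigr\},
% Adjustability (always nonnegative):
\Delta \;=\; z_{\mathrm{static}} - z_{\mathrm{adj}} \;\ge\; 0.
```

The theorem-of-the-alternatives mentioned above characterizes exactly when $\Delta = 0$, i.e., when static decisions lose nothing relative to waiting for the uncertainty to resolve.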
{"title":"Adjustability in robust linear optimization","authors":"","doi":"10.1007/s10107-023-02049-w","DOIUrl":"https://doi.org/10.1007/s10107-023-02049-w","url":null,"abstract":"<h3>Abstract</h3> <p>We investigate the concept of adjustability—the difference in objective values between two types of dynamic robust optimization formulations: one where (static) decisions are made before uncertainty realization, and one where uncertainty is resolved before (adjustable) decisions. This difference reflects the value of information and decision timing in optimization under uncertainty, and is related to several other concepts such as the optimality of decision rules in robust optimization. We develop a theoretical framework to quantify adjustability based on the input data of a robust optimization problem with a linear objective, linear constraints, and fixed recourse. We make very few additional assumptions. In particular, we do not assume constraint-wise separability or parameter nonnegativity that are commonly imposed in the literature for the study of adjustability. This allows us to study important but previously under-investigated problems, such as formulations with equality constraints and problems with both upper and lower bound constraints. Based on the discovery of an interesting connection between the reformulations of the static and fully adjustable problems, our analysis gives a necessary and sufficient condition—in the form of a theorem-of-the-alternatives—for adjustability to be zero when the uncertainty set is polyhedral. Based on this sharp characterization, we provide two efficient mixed-integer optimization formulations to verify zero adjustability. Then, we develop a constructive approach to quantify adjustability when the uncertainty set is general, which results in an efficient and tight poly-time algorithm to bound adjustability. 
We demonstrate the efficiency and tightness via both theoretical and numerical analyses.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"32 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139583987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-20, DOI: 10.1007/s10107-023-02047-y
Soroosh Shafiee, Fatma Kılınç-Karzan
Optimization problems involving minimization of a rank-one convex function over constraints modeling restrictions on the support of the decision variables emerge in various machine learning applications. These problems are often modeled with indicator variables for identifying the support of the continuous variables. In this paper we investigate compact extended formulations for such problems through perspective reformulation techniques. In contrast to the majority of previous work that relies on support function arguments and disjunctive programming techniques to provide convex hull results, we propose a constructive approach that exploits a hidden conic structure induced by perspective functions. To this end, we first establish a convex hull result for a general conic mixed-binary set in which each conic constraint involves a linear function of independent continuous variables and a set of binary variables. We then demonstrate that extended representations of sets associated with epigraphs of rank-one convex functions over constraints modeling indicator relations naturally admit such a conic representation. This enables us to systematically give perspective formulations for the convex hull descriptions of these sets with nonlinear separable or non-separable objective functions, sign constraints on continuous variables, and combinatorial constraints on indicator variables. We illustrate the efficacy of our results on sparse nonnegative logistic regression problems.
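As a one-dimensional illustration of the perspective technique the abstract builds on (a classical result, not this paper's general construction): for the epigraph of $x^2$ with an indicator $z$ forcing $x = 0$ whenever $z = 0$,

```latex
\operatorname{conv}\Bigl\{ (x,z,t) \in \mathbb{R}\times\{0,1\}\times\mathbb{R} \;:\; t \ge x^{2},\ \ x(1-z)=0 \Bigr\}
\;=\;
\Bigl\{ (x,z,t) \;:\; t z \ge x^{2},\ \ 0 \le z \le 1,\ \ t \ge 0 \Bigr\}.
```

The inequality $tz \ge x^2$ is a rotated second-order cone constraint, the kind of hidden conic structure that the constructive approach above exploits in far greater generality.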
{"title":"Constrained optimization of rank-one functions with indicator variables","authors":"Soroosh Shafiee, Fatma Kılınç-Karzan","doi":"10.1007/s10107-023-02047-y","DOIUrl":"https://doi.org/10.1007/s10107-023-02047-y","url":null,"abstract":"<p>Optimization problems involving minimization of a rank-one convex function over constraints modeling restrictions on the support of the decision variables emerge in various machine learning applications. These problems are often modeled with indicator variables for identifying the support of the continuous variables. In this paper we investigate compact extended formulations for such problems through perspective reformulation techniques. In contrast to the majority of previous work that relies on support function arguments and disjunctive programming techniques to provide convex hull results, we propose a constructive approach that exploits a hidden conic structure induced by perspective functions. To this end, we first establish a convex hull result for a general conic mixed-binary set in which each conic constraint involves a linear function of independent continuous variables and a set of binary variables. We then demonstrate that extended representations of sets associated with epigraphs of rank-one convex functions over constraints modeling indicator relations naturally admit such a conic representation. This enables us to systematically give perspective formulations for the convex hull descriptions of these sets with nonlinear separable or non-separable objective functions, sign constraints on continuous variables, and combinatorial constraints on indicator variables. 
We illustrate the efficacy of our results on sparse nonnegative logistic regression problems.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"1 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139507788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-20, DOI: 10.1007/s10107-023-02048-x
Marcin Briański, Martin Koutecký, Daniel Král’, Kristýna Pekárková, Felix Schröder
An intensive line of research on fixed parameter tractability of integer programming is focused on exploiting the relation between the sparsity of a constraint matrix A and the norm of the elements of its Graver basis. In particular, integer programming is fixed parameter tractable when parameterized by the primal tree-depth and the entry complexity of A, and when parameterized by the dual tree-depth and the entry complexity of A; both parameterizations imply that A is sparse, in particular, that the number of its non-zero entries is linear in the number of columns or rows, respectively. We study preconditioners that transform a given matrix to a row-equivalent sparse matrix, if one exists, and provide structural results characterizing the existence of a sparse row-equivalent matrix in terms of the structural properties of the associated column matroid. In particular, our results imply that the $\ell_1$-norm of the Graver basis is bounded by a function of the maximum $\ell_1$-norm of a circuit of A. We use our results to design a parameterized algorithm that constructs a matrix row-equivalent to an input matrix A that has small primal/dual tree-depth and entry complexity, if such a row-equivalent matrix exists. Our results yield parameterized algorithms for integer programming when parameterized by the $\ell_1$-norm of the Graver basis of the constraint matrix, by the $\ell_1$-norm of the circuits of the constraint matrix, by the smallest primal tree-depth and entry complexity of a matrix row-equivalent to the constraint matrix, and by the smallest dual tree-depth and entry complexity of such a matrix.
{"title":"Characterization of matrices with bounded Graver bases and depth parameters and applications to integer programming","authors":"Marcin Briański, Martin Koutecký, Daniel Král’, Kristýna Pekárková, Felix Schröder","doi":"10.1007/s10107-023-02048-x","DOIUrl":"https://doi.org/10.1007/s10107-023-02048-x","url":null,"abstract":"<p>An intensive line of research on fixed parameter tractability of integer programming is focused on exploiting the relation between the sparsity of a constraint matrix <i>A</i> and the norm of the elements of its Graver basis. In particular, integer programming is fixed parameter tractable when parameterized by the primal tree-depth and the entry complexity of <i>A</i>, and when parameterized by the dual tree-depth and the entry complexity of <i>A</i>; both these parameterization imply that <i>A</i> is sparse, in particular, the number of its non-zero entries is linear in the number of columns or rows, respectively. We study preconditioners transforming a given matrix to a row-equivalent sparse matrix if it exists and provide structural results characterizing the existence of a sparse row-equivalent matrix in terms of the structural properties of the associated column matroid. In particular, our results imply that the <span>(ell _1)</span>-norm of the Graver basis is bounded by a function of the maximum <span>(ell _1)</span>-norm of a circuit of <i>A</i>. We use our results to design a parameterized algorithm that constructs a matrix row-equivalent to an input matrix <i>A</i> that has small primal/dual tree-depth and entry complexity if such a row-equivalent matrix exists. 
Our results yield parameterized algorithms for integer programming when parameterized by the <span>(ell _1)</span>-norm of the Graver basis of the constraint matrix, when parameterized by the <span>(ell _1)</span>-norm of the circuits of the constraint matrix, when parameterized by the smallest primal tree-depth and entry complexity of a matrix row-equivalent to the constraint matrix, and when parameterized by the smallest dual tree-depth and entry complexity of a matrix row-equivalent to the constraint matrix.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"29 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139507778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-10, DOI: 10.1007/s10107-023-02045-0
Benjamin Moseley, Kirk Pruhs, Clifford Stein, Rudy Zhou
This paper considers the basic problem of scheduling jobs online with preemption to maximize the number of jobs completed by their deadline on $m$ identical machines. The main result is an $O(1)$-competitive deterministic algorithm for any number of machines $m > 1$.
{"title":"A competitive algorithm for throughput maximization on identical machines","authors":"Benjamin Moseley, Kirk Pruhs, Clifford Stein, Rudy Zhou","doi":"10.1007/s10107-023-02045-0","DOIUrl":"https://doi.org/10.1007/s10107-023-02045-0","url":null,"abstract":"<p>This paper considers the basic problem of scheduling jobs online with preemption to maximize the number of jobs completed by their deadline on <i>m</i> identical machines. The main result is an <i>O</i>(1) competitive deterministic algorithm for any number of machines <span>(m >1)</span>.\u0000</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"44 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139421255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-08, DOI: 10.1007/s10107-023-02042-3
Abstract
This note provides a counterexample to a theorem announced in the last part of the paper (Vicente and Custódio, Math Program 133:299–325, 2012). The counterexample involves an objective function $f: \mathbb{R} \rightarrow \mathbb{R}$ that satisfies all the assumptions required by the theorem but contradicts some of its conclusions. A corollary of this theorem is also affected by the counterexample. The main flaw revealed by the counterexample is the possibility that a directional direct search method (dDSM) generates a sequence of trial points $(x_k)_{k \in \mathbb{N}}$ converging to a point $x_*$ at which $f$ is discontinuous yet lower semicontinuous, with objective value $f(x_*)$ strictly less than $\lim_{k \rightarrow \infty} f(x_k)$. Moreover, the dDSM generates trial points in only one of the continuity sets of $f$ near $x_*$. This note also examines the proof of the theorem to highlight the inexact statements in the original paper. Finally, this work introduces a modification of the dDSM that allows, in usual cases, the recovery of the properties broken by the counterexample.
{"title":"Counterexample and an additional revealing poll step for a result of “analysis of direct searches for discontinuous functions”","authors":"","doi":"10.1007/s10107-023-02042-3","DOIUrl":"https://doi.org/10.1007/s10107-023-02042-3","url":null,"abstract":"<h3>Abstract</h3> <p>This note provides a counterexample to a theorem announced in the last part of the paper (Vicente and Custódio Math Program 133:299–325, 2012). The counterexample involves an objective function <span> <span>(f: mathbb {R}rightarrow mathbb {R})</span> </span> which satisfies all the assumptions required by the theorem but contradicts some of its conclusions. A corollary of this theorem is also affected by this counterexample. The main flaw revealed by the counterexample is the possibility that a directional direct search method (dDSM) generates a sequence of trial points <span> <span>((x_k)_{k in mathbb {N}})</span> </span> converging to a point <span> <span>(x_*)</span> </span> where <em>f</em> is discontinuous, lower semicontinuous and whose objective function value <span> <span>(f(x_*))</span> </span> is strictly less than <span> <span>(lim _{krightarrow infty } f(x_k))</span> </span>. Moreover the dDSM generates trial points in only one of the continuity sets of <em>f</em> near <span> <span>(x_*)</span> </span>. This note also investigates the proof of the theorem to highlight the inexact statements in the original paper. Finally this work introduces a modification of the dDSM that allows, in usual cases, to recover the properties broken by the counterexample. 
</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"1 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139410419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-06, DOI: 10.1007/s10107-023-02031-6
Lai Tian, Anthony Man-Cho So
We consider the oracle complexity of computing an approximate stationary point of a Lipschitz function. When the function is smooth, it is well known that the simple deterministic gradient method has finite dimension-free oracle complexity. However, when the function can be nonsmooth, it is only recently that a randomized algorithm with finite dimension-free oracle complexity has been developed. In this paper, we show that no deterministic algorithm can do the same. Moreover, even without the dimension-free requirement, we show that any finite-time deterministic method cannot be general zero-respecting. In particular, this implies that a natural derandomization of the aforementioned randomized algorithm cannot have finite-time complexity. Our results reveal a fundamental hurdle in modern large-scale nonconvex nonsmooth optimization.
{"title":"No dimension-free deterministic algorithm computes approximate stationarities of Lipschitzians","authors":"Lai Tian, Anthony Man-Cho So","doi":"10.1007/s10107-023-02031-6","DOIUrl":"https://doi.org/10.1007/s10107-023-02031-6","url":null,"abstract":"<p>We consider the oracle complexity of computing an approximate stationary point of a Lipschitz function. When the function is smooth, it is well known that the simple deterministic gradient method has finite dimension-free oracle complexity. However, when the function can be nonsmooth, it is only recently that a randomized algorithm with finite dimension-free oracle complexity has been developed. In this paper, we show that no deterministic algorithm can do the same. Moreover, even without the dimension-free requirement, we show that any finite-time deterministic method cannot be general zero-respecting. In particular, this implies that a natural derandomization of the aforementioned randomized algorithm cannot have finite-time complexity. Our results reveal a fundamental hurdle in modern large-scale nonconvex nonsmooth optimization.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"89 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139373802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-05, DOI: 10.1007/s10107-023-02040-5
Jelena Diakonikolas, Cristóbal Guzmán
Composite minimization is a powerful framework in large-scale convex optimization, based on decoupling of the objective function into terms with structurally different properties and allowing for more flexible algorithmic design. We introduce a new algorithmic framework for complementary composite minimization, where the objective function decouples into a (weakly) smooth and a uniformly convex term. This particular form of decoupling is pervasive in statistics and machine learning, due to its link to regularization. The main contributions of our work are summarized as follows. First, we introduce the problem of complementary composite minimization in general normed spaces; second, we provide a unified accelerated algorithmic framework to address broad classes of complementary composite minimization problems; and third, we prove that the algorithms resulting from our framework are near-optimal in most of the standard optimization settings. Additionally, we show that our algorithmic framework can be used to address the problem of making the gradients small in general normed spaces. As a concrete example, we obtain a nearly-optimal method for the standard $\ell_1$ setup (small gradients in the $\ell_\infty$ norm), essentially matching the bound of Nesterov (Optima Math Optim Soc Newsl 88:10–11, 2012) that was previously known only for the Euclidean setup. Finally, we show that our composite methods are broadly applicable to a number of regression and other classes of optimization problems, where regularization plays a key role. Our methods lead to complexity bounds that are either new or match the best existing ones.
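In generic notation (ours, not the paper's), the complementary composite template pairs a (weakly) smooth term with a uniformly convex regularizer:

```latex
\min_{x \in \mathbb{R}^{d}} \; F(x) \;=\; f(x) + \psi(x),
% f is kappa-weakly smooth (Hölder-continuous gradient):
\|\nabla f(x) - \nabla f(y)\|_{*} \;\le\; L \|x - y\|^{\kappa}, \qquad \kappa \in (0,1],
% psi is uniformly convex of degree q >= 2 with parameter sigma:
\psi(y) \;\ge\; \psi(x) + \langle g,\, y - x\rangle + \tfrac{\sigma}{q}\|y - x\|^{q}, \qquad g \in \partial \psi(x).
```

A standard instance is $\psi(x) = \tfrac{\sigma}{q}\|x\|_2^{q}$ with $q \ge 2$, the kind of power-of-norm regularizer that makes the link to regularized regression explicit.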
{"title":"Complementary composite minimization, small gradients in general norms, and applications","authors":"Jelena Diakonikolas, Cristóbal Guzmán","doi":"10.1007/s10107-023-02040-5","DOIUrl":"https://doi.org/10.1007/s10107-023-02040-5","url":null,"abstract":"<p>Composite minimization is a powerful framework in large-scale convex optimization, based on decoupling of the objective function into terms with structurally different properties and allowing for more flexible algorithmic design. We introduce a new algorithmic framework for <i>complementary composite minimization</i>, where the objective function decouples into a (weakly) smooth and a uniformly convex term. This particular form of decoupling is pervasive in statistics and machine learning, due to its link to regularization. The main contributions of our work are summarized as follows. First, we introduce the problem of complementary composite minimization in general normed spaces; second, we provide a unified accelerated algorithmic framework to address broad classes of complementary composite minimization problems; and third, we prove that the algorithms resulting from our framework are near-optimal in most of the standard optimization settings. Additionally, we show that our algorithmic framework can be used to address the problem of making the gradients small in general normed spaces. As a concrete example, we obtain a nearly-optimal method for the standard <span>(ell _1)</span> setup (small gradients in the <span>(ell _infty )</span> norm), essentially matching the bound of Nesterov (Optima Math Optim Soc Newsl 88:10–11, 2012) that was previously known only for the Euclidean setup. Finally, we show that our composite methods are broadly applicable to a number of regression and other classes of optimization problems, where regularization plays a key role. 
Our methods lead to complexity bounds that are either new or match the best existing ones.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"20 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139373803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}