Accelerated stochastic approximation with state-dependent noise
Pub Date: 2024-08-27 | DOI: 10.1007/s10107-024-02138-4
Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, Tianjiao Li
We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting, in which the variance of the noise is assumed to be uniformly bounded, here we assume that the variance of the stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attains optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines, stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE), which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, the corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy-tailed noise and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions and show how it can be used to recover sparse solutions. Finally, we report on simulation experiments that illustrate the numerical performance of our proposed algorithms in high-dimensional settings.
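To make the noise model concrete: one common way to formalize state-dependent variance (a sketch in our notation; the paper's exact condition may differ) is to assume an unbiased stochastic gradient oracle G(x, ξ) with

\[
\mathbb{E}\big[\Vert G(x,\xi) - \nabla f(x)\Vert_*^2\big] \;\le\; \sigma^2 + \mathcal{L}\,\big(f(x) - f(x^*)\big),
\]

so that the variance shrinks as the query point x approaches a minimizer x^*; the classical uniformly-bounded-variance setting is recovered with \mathcal{L} = 0.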
{"title":"Accelerated stochastic approximation with state-dependent noise","authors":"Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, Tianjiao Li","doi":"10.1007/s10107-024-02138-4","DOIUrl":"https://doi.org/10.1007/s10107-024-02138-4","url":null,"abstract":"<p>We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the “sub-optimality” of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines—stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)—which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"62 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142183165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A fast combinatorial algorithm for the bilevel knapsack problem with interdiction constraints
Pub Date: 2024-08-22 | DOI: 10.1007/s10107-024-02133-9
Noah Weninger, Ricardo Fukasawa
We consider the bilevel knapsack problem with interdiction constraints, a fundamental bilevel integer programming problem which generalizes the 0–1 knapsack problem. In this problem, there are two knapsacks and n items. The objective is to select some items to pack into the first knapsack such that the maximum profit attainable from packing some of the remaining items into the second knapsack is minimized. We present a combinatorial branch-and-bound algorithm which outperforms the current state-of-the-art solution method in computational experiments for 99% of the instances reported in the literature. On many of the harder instances, our algorithm is orders of magnitude faster, which enabled it to solve 53 of the 72 previously unsolved instances. Our result relies fundamentally on a new dynamic programming algorithm which computes very strong lower bounds. This dynamic program solves a relaxation of the problem from bilevel to 2n-level where the items are processed in an online fashion. The relaxation is easier to solve but approximates the original problem surprisingly well in practice. We believe that this same technique may be useful for other interdiction problems.
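For concreteness, the problem admits a standard min–max formulation (our notation): with item profits p, leader and follower weights w and v, and knapsack capacities C_1 and C_2,

\[
\min_{\substack{x \in \{0,1\}^n,\; w^\top x \le C_1}} \;\; \max_{\substack{y \in \{0,1\}^n,\; v^\top y \le C_2,\; y \le \mathbf{1} - x}} \;\; p^\top y,
\]

where x encodes the interdicted items (those packed into the first knapsack) and the constraint y ≤ 1 − x prevents the follower from packing any interdicted item.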
{"title":"A fast combinatorial algorithm for the bilevel knapsack problem with interdiction constraints","authors":"Noah Weninger, Ricardo Fukasawa","doi":"10.1007/s10107-024-02133-9","DOIUrl":"https://doi.org/10.1007/s10107-024-02133-9","url":null,"abstract":"<p>We consider the bilevel knapsack problem with interdiction constraints, a fundamental bilevel integer programming problem which generalizes the 0–1 knapsack problem. In this problem, there are two knapsacks and <i>n</i> items. The objective is to select some items to pack into the first knapsack such that the maximum profit attainable from packing some of the remaining items into the second knapsack is minimized. We present a combinatorial branch-and-bound algorithm which outperforms the current state-of-the-art solution method in computational experiments for 99% of the instances reported in the literature. On many of the harder instances, our algorithm is orders of magnitude faster, which enabled it to solve 53 of the 72 previously unsolved instances. Our result relies fundamentally on a new dynamic programming algorithm which computes very strong lower bounds. This dynamic program solves a relaxation of the problem from bilevel to 2<i>n</i>-level where the items are processed in an online fashion. The relaxation is easier to solve but approximates the original problem surprisingly well in practice. We believe that this same technique may be useful for other interdiction problems.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"2 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142183166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nonlinear conjugate gradient methods: worst-case convergence rates via computer-assisted analyses
Pub Date: 2024-08-22 | DOI: 10.1007/s10107-024-02127-7
Shuvomoy Das Gupta, Robert M. Freund, Xu Andy Sun, Adrien Taylor
We propose a computer-assisted approach to the analysis of the worst-case convergence of nonlinear conjugate gradient methods (NCGMs). These methods are known for their generally good empirical performance on large-scale optimization problems, despite having relatively incomplete analyses. Using our computer-assisted approach, we establish novel complexity bounds for the Polak-Ribière-Polyak (PRP) and Fletcher-Reeves (FR) NCGMs for smooth strongly convex minimization. In particular, we construct mathematical proofs that establish the first non-asymptotic convergence bound for FR (historically the first NCGM developed), and a much improved non-asymptotic convergence bound for PRP. Additionally, we provide simple adversarial examples on which these methods perform no better than gradient descent with exact line search, leaving very little room for improvement on the same class of problems.
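For reference, the two methods differ only in the momentum coefficient β_k used in the search-direction update d_{k+1} = −g_{k+1} + β_k d_k, with g_k = ∇f(x_k):

\[
\beta_k^{\mathrm{FR}} \;=\; \frac{\Vert g_{k+1}\Vert^2}{\Vert g_k\Vert^2},
\qquad
\beta_k^{\mathrm{PRP}} \;=\; \frac{g_{k+1}^\top (g_{k+1} - g_k)}{\Vert g_k\Vert^2}.
\]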
{"title":"Nonlinear conjugate gradient methods: worst-case convergence rates via computer-assisted analyses","authors":"Shuvomoy Das Gupta, Robert M. Freund, Xu Andy Sun, Adrien Taylor","doi":"10.1007/s10107-024-02127-7","DOIUrl":"https://doi.org/10.1007/s10107-024-02127-7","url":null,"abstract":"<p>We propose a computer-assisted approach to the analysis of the worst-case convergence of nonlinear conjugate gradient methods (NCGMs). Those methods are known for their generally good empirical performances for large-scale optimization, while having relatively incomplete analyses. Using our computer-assisted approach, we establish novel complexity bounds for the Polak-Ribière-Polyak (PRP) and the Fletcher-Reeves (FR) NCGMs for smooth strongly convex minimization. In particular, we construct mathematical proofs that establish the first non-asymptotic convergence bound for FR (which is historically the first developed NCGM), and a much improved non-asymptotic convergence bound for PRP. Additionally, we provide simple adversarial examples on which these methods do not perform better than gradient descent with exact line search, leaving very little room for improvements on the same class of problems.\u0000</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"32 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142183164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning augmented branch and bound for mixed integer linear programming
Pub Date: 2024-08-22 | DOI: 10.1007/s10107-024-02130-y
Lara Scavuzzo, Karen Aardal, Andrea Lodi, Neil Yorke-Smith
Mixed Integer Linear Programming (MILP) is a pillar of mathematical optimization that offers a powerful modeling language for a wide range of applications. The main engine for solving MILPs is the branch-and-bound algorithm. Adding to the enormous algorithmic progress in MILP solving of the past decades, recent years have seen explosive development in the use of machine learning for enhancing all main tasks involved in the branch-and-bound algorithm: primal heuristics, branching, cutting planes, node selection, and solver configuration decisions. This article presents a survey of such approaches, addressing the vision of machine learning and mathematical optimization as complementary technologies and how their integration can benefit MILP solving. In particular, we give detailed attention to machine learning algorithms that automatically optimize some metric of branch-and-bound efficiency. We also address appropriate MILP representations, benchmarks, and software tools used in the context of applying learning algorithms.
{"title":"Machine learning augmented branch and bound for mixed integer linear programming","authors":"Lara Scavuzzo, Karen Aardal, Andrea Lodi, Neil Yorke-Smith","doi":"10.1007/s10107-024-02130-y","DOIUrl":"https://doi.org/10.1007/s10107-024-02130-y","url":null,"abstract":"<p>Mixed Integer Linear Programming (MILP) is a pillar of mathematical optimization that offers a powerful modeling language for a wide range of applications. The main engine for solving MILPs is the branch-and-bound algorithm. Adding to the enormous algorithmic progress in MILP solving of the past decades, in more recent years there has been an explosive development in the use of machine learning for enhancing all main tasks involved in the branch-and-bound algorithm. These include primal heuristics, branching, cutting planes, node selection and solver configuration decisions. This article presents a survey of such approaches, addressing the vision of integration of machine learning and mathematical optimization as complementary technologies, and how this integration can benefit MILP solving. In particular, we give detailed attention to machine learning algorithms that automatically optimize some metric of branch-and-bound efficiency. We also address appropriate MILP representations, benchmarks and software tools used in the context of applying learning algorithms.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"6 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142183169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the strength of Lagrangian duality in multiobjective integer programming
Pub Date: 2024-08-20 | DOI: 10.1007/s10107-024-02121-z
Matthew Brun, Tyler Perini, Saumya Sinha, Andrew J. Schaefer
This paper investigates the potential of Lagrangian relaxations to generate high-quality bounds on the non-dominated images of multiobjective integer programs (MOIPs). Under some conditions on the relaxed constraints, we show that a set of Lagrangian relaxations can provide bounds that coincide with every bound generated by the convex hull relaxation. We also provide a guarantee of the relative quality of the Lagrangian bound at unsupported solutions. These results imply that, if the relaxed feasible region is bounded, some Lagrangian bounds will be strictly better than some convex hull bounds. We demonstrate that there exist Lagrangian multipliers which are sparse, satisfy a complementary slackness property, and generate tight relaxations at supported solutions. However, if all constraints are dualized, a relaxation can never be tight at an unsupported solution. These results characterize the strength of the Lagrangian dual at efficient solutions of an MOIP.
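As a sketch of the setup (our notation; the paper's construction may differ in its details): for an MOIP min{Cx : Ax ≥ b, x ∈ X} with objective matrix C, dualizing Ax ≥ b with a nonnegative multiplier matrix Λ yields the relaxation

\[
L(\Lambda) \;=\; \operatorname{ND}\big\{\, Cx + \Lambda\,(b - Ax) \;:\; x \in X \,\big\},
\]

where ND denotes the non-dominated subset of the image set; each choice of Λ produces a bound set against which the non-dominated images of the original problem can be compared.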
{"title":"On the strength of Lagrangian duality in multiobjective integer programming","authors":"Matthew Brun, Tyler Perini, Saumya Sinha, Andrew J. Schaefer","doi":"10.1007/s10107-024-02121-z","DOIUrl":"https://doi.org/10.1007/s10107-024-02121-z","url":null,"abstract":"<p>This paper investigates the potential of Lagrangian relaxations to generate quality bounds on non-dominated images of multiobjective integer programs (MOIPs). Under some conditions on the relaxed constraints, we show that a set of Lagrangian relaxations can provide bounds that coincide with every bound generated by the convex hull relaxation. We also provide a guarantee of the relative quality of the Lagrangian bound at unsupported solutions. These results imply that, if the relaxed feasible region is bounded, some Lagrangian bounds will be strictly better than some convex hull bounds. We demonstrate that there exist Lagrangian multipliers which are sparse, satisfy a complementary slackness property, and generate tight relaxations at supported solutions. However, if all constraints are dualized, a relaxation can never be tight at an unsupported solution. These results characterize the strength of the Lagrangian dual at efficient solutions of an MOIP.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"23 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142183168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convexification techniques for fractional programs
Pub Date: 2024-08-16 | DOI: 10.1007/s10107-024-02131-x
Taotao He, Siyue Liu, Mohit Tawarmalani
This paper develops a correspondence relating convex hulls of fractional functions to those of polynomial functions over the same domain. Using this result, we develop a number of new reformulations and relaxations for fractional programming problems. First, we relate 0–1 problems involving a ratio of affine functions to the boolean quadric polytope, and use inequalities for the latter to develop tighter formulations for the former. Second, we derive a new formulation to optimize a ratio of quadratic functions over a polytope using copositive programming. Third, we show that univariate fractional functions can be convexified using moment hulls. Fourth, we develop a new hierarchy of relaxations that converges finitely to the simultaneous convex hull of a collection of ratios of affine functions of 0–1 variables. Finally, we demonstrate theoretically and computationally that our techniques close a significant gap relative to state-of-the-art relaxations, require much less computational effort, and can solve larger problem instances.
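As background for the first construction, consider minimizing a single ratio of affine functions over 0–1 variables with a positive denominator; a Charnes–Cooper-style substitution y = 1/(b_0 + b^⊤x), z = xy (our notation, an illustration rather than the authors' exact derivation) gives

\[
\min_{x \in \{0,1\}^n} \frac{a_0 + a^\top x}{b_0 + b^\top x}
\;=\;
\min\big\{\, a_0 y + a^\top z \;:\; b_0 y + b^\top z = 1,\; z = xy,\; x \in \{0,1\}^n \,\big\},
\]

and convexifying the bilinear products z_i = x_i y, together with the induced x_i x_j terms, is where inequalities for the boolean quadric polytope come into play.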
{"title":"Convexification techniques for fractional programs","authors":"Taotao He, Siyue Liu, Mohit Tawarmalani","doi":"10.1007/s10107-024-02131-x","DOIUrl":"https://doi.org/10.1007/s10107-024-02131-x","url":null,"abstract":"<p>This paper develops a correspondence relating convex hulls of fractional functions with those of polynomial functions over the same domain. Using this result, we develop a number of new reformulations and relaxations for fractional programming problems. First, we relate <span>(0mathord {-}1)</span> problems involving a ratio of affine functions with the boolean quadric polytope, and use inequalities for the latter to develop tighter formulations for the former. Second, we derive a new formulation to optimize a ratio of quadratic functions over a polytope using copositive programming. Third, we show that univariate fractional functions can be convexified using moment hulls. Fourth, we develop a new hierarchy of relaxations that converges finitely to the simultaneous convex hull of a collection of ratios of affine functions of <span>(0mathord {-}1)</span> variables. Finally, we demonstrate theoretically and computationally that our techniques close a significant gap relative to state-of-the-art relaxations, require much less computational effort, and can solve larger problem instances.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"33 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142183167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the correlation gap of matroids
Pub Date: 2024-08-08 | DOI: 10.1007/s10107-024-02116-w
Edin Husić, Zhuan Khye Koh, Georg Loho, László A. Végh
A set function can be extended to the unit cube in various ways; the correlation gap measures the ratio between two natural extensions. This quantity has been identified as the performance guarantee in a range of approximation algorithms and mechanism design settings. It is known that the correlation gap of a monotone submodular function is at least 1 − 1/e, and this is tight for simple matroid rank functions. We initiate a fine-grained study of the correlation gap of matroid rank functions. In particular, we present an improved lower bound on the correlation gap as parametrized by the rank and girth of the matroid. We also show that for any matroid, the correlation gap of its weighted rank function is minimized under uniform weights. Such improved lower bounds have direct applications for submodular maximization under matroid constraints, mechanism design, and contention resolution schemes.
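For reference, the standard definition (in our notation): for a nonnegative set function f on a ground set N, the correlation gap compares independent rounding against the concave closure f^+,

\[
\mathcal{CG}(f) \;=\; \inf_{x \in [0,1]^N} \; \frac{\mathbb{E}\big[f(R(x))\big]}{f^{+}(x)},
\]

where R(x) includes each element i ∈ N independently with probability x_i, and f^+(x) is the maximum of E[f(S)] over all distributions on subsets S whose marginals equal x.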
{"title":"On the correlation gap of matroids","authors":"Edin Husić, Zhuan Khye Koh, Georg Loho, László A. Végh","doi":"10.1007/s10107-024-02116-w","DOIUrl":"https://doi.org/10.1007/s10107-024-02116-w","url":null,"abstract":"<p>A set function can be extended to the unit cube in various ways; the correlation gap measures the ratio between two natural extensions. This quantity has been identified as the performance guarantee in a range of approximation algorithms and mechanism design settings. It is known that the correlation gap of a monotone submodular function is at least <span>(1-1/e)</span>, and this is tight for simple matroid rank functions. We initiate a fine-grained study of the correlation gap of matroid rank functions. In particular, we present an improved lower bound on the correlation gap as parametrized by the rank and girth of the matroid. We also show that for any matroid, the correlation gap of its weighted rank function is minimized under uniform weights. Such improved lower bounds have direct applications for submodular maximization under matroid constraints, mechanism design, and contention resolution schemes.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"24 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Configuration balancing for stochastic requests
Pub Date: 2024-08-08 | DOI: 10.1007/s10107-024-02132-w
Franziska Eberle, Anupam Gupta, Nicole Megow, Benjamin Moseley, Rudy Zhou
The configuration balancing problem with stochastic requests generalizes well-studied resource allocation problems such as load balancing and virtual circuit routing. We are given m resources and n requests; each request has multiple possible configurations, each of which increases the load of each resource by some amount. The goal is to select one configuration for each request to minimize the makespan: the load of the most-loaded resource. In the stochastic setting, the amount by which a configuration increases the resource load is uncertain until the configuration is chosen, but we are given a probability distribution. We develop both offline and online algorithms for configuration balancing with stochastic requests. When the requests are known offline, we give a non-adaptive policy that O(log m / log log m)-approximates the optimal adaptive policy, which matches a known lower bound for the special case of load balancing on identical machines. When requests arrive online in a list, we give a non-adaptive policy that is O(log m)-competitive. Again, this result is asymptotically tight due to information-theoretic lower bounds for special cases (e.g., load balancing on unrelated machines). Finally, we show how to leverage adaptivity in the special case of load balancing on related machines to obtain a constant-factor approximation offline and an O(log log m)-approximation online. A crucial technical ingredient in all of our results is a new structural characterization of the optimal adaptive policy that allows us to limit the correlations between its decisions.
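A sketch of the underlying deterministic problem (our notation): each request j has a configuration set C_j, and choosing configuration c adds load x^c_i to each resource i; selecting one configuration per request, the objective is

\[
\min_{c_1 \in \mathcal{C}_1, \ldots, c_n \in \mathcal{C}_n} \;\; \max_{i \in [m]} \;\; \sum_{j=1}^{n} x^{c_j}_{i}.
\]

In the stochastic version, each load x^c_i is a random variable with a known distribution whose realization is observed only after configuration c is selected.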
{"title":"Configuration balancing for stochastic requests","authors":"Franziska Eberle, Anupam Gupta, Nicole Megow, Benjamin Moseley, Rudy Zhou","doi":"10.1007/s10107-024-02132-w","DOIUrl":"https://doi.org/10.1007/s10107-024-02132-w","url":null,"abstract":"<p>The configuration balancing problem with stochastic requests generalizes well-studied resource allocation problems such as load balancing and virtual circuit routing. There are given <i>m</i> resources and <i>n</i> requests; each request has multiple possible <i>configurations</i>, each of which increases the load of each resource by some amount. The goal is to select one configuration for each request to minimize the <i>makespan</i>: the load of the most-loaded resource. In the stochastic setting, the amount by which a configuration increases the resource load is uncertain until the configuration is chosen, but we are given a probability distribution. We develop both offline and online algorithms for configuration balancing with stochastic requests. When the requests are known offline, we give a non-adaptive policy for configuration balancing with stochastic requests that <span>(O(frac{log m}{log log m}))</span>-approximates the optimal adaptive policy, which matches a known lower bound for the special case of load balancing on identical machines. When requests arrive online in a list, we give a non-adaptive policy that is <span>(O(log m))</span> competitive. Again, this result is asymptotically tight due to information-theoretic lower bounds for special cases (e.g., for load balancing on unrelated machines). Finally, we show how to leverage adaptivity in the special case of load balancing on <i>related</i> machines to obtain a constant-factor approximation offline and an <span>(O(log log m))</span>-approximation online. A crucial technical ingredient in all of our results is a new structural characterization of the optimal adaptive policy that allows us to limit the correlations between its decisions.\u0000</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"30 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nonsmooth convex–concave saddle point problems with cardinality penalties
Pub Date: 2024-08-08 | DOI: 10.1007/s10107-024-02123-x
Wei Bian, Xiaojun Chen
In this paper, we focus on a class of convexly constrained nonsmooth convex–concave saddle point problems with cardinality penalties. Although such nonsmooth nonconvex–nonconcave and discontinuous min–max problems may not have a saddle point, we show that they have a local saddle point and a global minimax point, and that some local saddle points have lower bound properties. Based on these lower bound properties, we define a class of strong local saddle points to ensure the stability of variable selection. Moreover, we give a convolution-based framework for constructing continuous relaxations of the discontinuous min–max problems such that they have the same saddle points as the original problem. We also establish relations between the continuous relaxation problems and the original problems regarding local saddle points, global minimax points, local minimax points, and stationary points. Finally, we illustrate our results with distributionally robust sparse convex regression, sparse robust bond portfolio construction, and sparse convex–concave logistic regression saddle point problems.
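As an illustrative instance of the class (our notation; the paper states its setting more generally): with convex compact sets X and Y, a coupling function f that is convex in x and concave in y, and penalty weights λ, μ ≥ 0,

\[
\min_{x \in X} \; \max_{y \in Y} \;\; f(x, y) \;+\; \lambda \Vert x \Vert_0 \;-\; \mu \Vert y \Vert_0,
\]

where ‖·‖_0 counts nonzero entries; the cardinality terms make the problem nonconvex–nonconcave and discontinuous, which is why a saddle point need not exist.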
{"title":"Nonsmooth convex–concave saddle point problems with cardinality penalties","authors":"Wei Bian, Xiaojun Chen","doi":"10.1007/s10107-024-02123-x","DOIUrl":"https://doi.org/10.1007/s10107-024-02123-x","url":null,"abstract":"<p>In this paper, we focus on a class of convexly constrained nonsmooth convex–concave saddle point problems with cardinality penalties. Although such nonsmooth nonconvex–nonconcave and discontinuous min–max problems may not have a saddle point, we show that they have a local saddle point and a global minimax point, and some local saddle points have the lower bound properties. We define a class of strong local saddle points based on the lower bound properties for stability of variable selection. Moreover, we give a framework to construct continuous relaxations of the discontinuous min–max problems based on convolution, such that they have the same saddle points with the original problem. We also establish the relations between the continuous relaxation problems and the original problems regarding local saddle points, global minimax points, local minimax points and stationary points. Finally, we illustrate our results with distributionally robust sparse convex regression, sparse robust bond portfolio construction and sparse convex–concave logistic regression saddle point problems.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"85 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unified smoothing approach for best hyperparameter selection problem using a bilevel optimization strategy
Pub Date: 2024-08-08 | DOI: 10.1007/s10107-024-02113-z
Jan Harold Alcantara, Chieu Thanh Nguyen, Takayuki Okuno, Akiko Takeda, Jein-Shan Chen
Strongly motivated by applications in various fields, including machine learning, the methodology of sparse optimization has been developed intensively. In particular, the advancement of algorithms for solving problems with nonsmooth regularizers has been remarkable. However, these algorithms assume that the weight parameters of the regularizers, hereafter called hyperparameters, are fixed in advance, and how the best hyperparameter should be selected is a crucial question. In this paper, we focus on hyperparameter selection for regularizers related to the ℓ_p function with 0 < p ≤ 1 and apply a bilevel programming strategy, wherein we need to solve a bilevel problem whose lower-level problem is nonsmooth, possibly nonconvex, and non-Lipschitz. Recently, for solving a bilevel problem for hyperparameter selection with the pure ℓ_p (0 < p ≤ 1) regularizer, Okuno et al. discovered new necessary optimality conditions, called SB-KKT (scaled bilevel KKT) conditions, and further proposed a smoothing-type algorithm using a specific smoothing function. However, this optimality measure is loose in the sense that many points may satisfy the SB-KKT conditions. In this work, we propose new bilevel KKT conditions, which are necessary optimality conditions tighter than those proposed by Okuno et al. Moreover, we propose a unified smoothing approach using smoothing functions that belong to the Chen-Mangasarian class, and then prove that the generated iterates accumulate at bilevel KKT points under milder constraint qualifications. Another contribution is that our approach and analysis are applicable to a wider class of regularizers. Numerical comparisons demonstrate which smoothing functions work well for hyperparameter optimization via the bilevel optimization approach.
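A schematic of the bilevel strategy (our notation): writing ℓ_tr and ℓ_val for the training and validation losses, the hyperparameter λ > 0 is selected by

\[
\min_{\lambda > 0} \;\; \ell_{\mathrm{val}}\big(x(\lambda)\big)
\qquad \text{subject to} \qquad
x(\lambda) \in \mathop{\mathrm{argmin}}_{x} \;\; \ell_{\mathrm{tr}}(x) + \lambda \Vert x \Vert_p^p,
\]

whose lower-level problem is nonsmooth and, for p < 1, nonconvex and non-Lipschitz, precisely the regime targeted by the smoothing approach.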
{"title":"Unified smoothing approach for best hyperparameter selection problem using a bilevel optimization strategy","authors":"Jan Harold Alcantara, Chieu Thanh Nguyen, Takayuki Okuno, Akiko Takeda, Jein-Shan Chen","doi":"10.1007/s10107-024-02113-z","DOIUrl":"https://doi.org/10.1007/s10107-024-02113-z","url":null,"abstract":"<p>Strongly motivated from applications in various fields including machine learning, the methodology of sparse optimization has been developed intensively so far. Especially, the advancement of algorithms for solving problems with nonsmooth regularizers has been remarkable. However, those algorithms suppose that weight parameters of regularizers, called hyperparameters hereafter, are pre-fixed, but it is a crucial matter how the best hyperparameter should be selected. In this paper, we focus on the hyperparameter selection of regularizers related to the <span>(ell _p)</span> function with <span>(0<ple 1)</span> and apply a bilevel programming strategy, wherein we need to solve a bilevel problem, whose lower-level problem is nonsmooth, possibly nonconvex and non-Lipschitz. Recently, for solving a bilevel problem for hyperparameter selection of the pure <span>(ell _p (0<p le 1))</span> regularizer Okuno et al. discovered new necessary optimality conditions, called SB(scaled bilevel)-KKT conditions, and further proposed a smoothing-type algorithm using a specific smoothing function. However, this optimality measure is loose in the sense that there could be many points that satisfy the SB-KKT conditions. In this work, we propose new bilevel KKT conditions, which are new necessary optimality conditions tighter than the ones proposed by Okuno et al. Moreover, we propose a unified smoothing approach using smoothing functions that belong to the Chen-Mangasarian class, and then prove that generated iteration points accumulate at bilevel KKT points under milder constraint qualifications. Another contribution is that our approach and analysis are applicable to a wider class of regularizers. Numerical comparisons demonstrate which smoothing functions work well for hyperparameter optimization via bilevel optimization approach.</p>","PeriodicalId":18297,"journal":{"name":"Mathematical Programming","volume":"1 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141933212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}