
Latest Publications in SIAM Journal on Optimization

Provably Faster Gradient Descent via Long Steps
IF 3.1 · CAS Tier 1 (Mathematics) · Q1 MATHEMATICS, APPLIED · Pub Date: 2024-07-18 · DOI: 10.1137/23m1588408
Benjamin Grimmer
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2588-2608, September 2024.
Abstract. This work establishes new convergence guarantees for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster [math] rate for gradient descent is also motivated along with simple numerical validation.
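To make the long-step idea concrete, below is a minimal sketch (assuming an L-smooth convex objective) of gradient descent with a cyclic stepsize pattern whose long entries exceed the classical 2/L threshold. The pattern `(1.5, 1.5, 1.5, 4.5)` is illustrative only; the paper's certified patterns come from its computer-assisted analysis.

```python
# Gradient descent with a periodic "long step" schedule, in the spirit of the
# paper. The specific pattern below is illustrative, not a certified one.
import numpy as np

def gradient_descent_long_steps(grad, x0, L, pattern=(1.5, 1.5, 1.5, 4.5), iters=100):
    """Run gradient descent with stepsizes h_t / L cycled from `pattern`.

    grad: gradient oracle of an L-smooth convex function.
    Long entries (h > 2) may increase the objective on a single iteration,
    but the aggregate effect over a full cycle can still certify progress.
    """
    x = np.asarray(x0, dtype=float)
    for t in range(iters):
        h = pattern[t % len(pattern)]
        x = x - (h / L) * grad(x)
    return x

# Usage on a simple quadratic f(x) = 0.5 * ||A x - b||^2, where L = ||A||_2^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5)); b = rng.standard_normal(20)
L = np.linalg.norm(A, 2) ** 2
x = gradient_descent_long_steps(lambda x: A.T @ (A @ x - b), np.zeros(5), L)
print(0.5 * np.linalg.norm(A @ x - b) ** 2)
```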
Citations: 0
Fast Gradient Algorithm with Dry-like Friction and Nonmonotone Line Search for Nonconvex Optimization Problems
IF 3.1 · CAS Tier 1 (Mathematics) · Q1 MATHEMATICS, APPLIED · Pub Date: 2024-07-17 · DOI: 10.1137/22m1532354
Lien T. Nguyen, Andrew Eberhard, Xinghuo Yu, Chaojie Li
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2557-2587, September 2024.
Abstract. In this paper, we propose a fast gradient algorithm for the problem of minimizing a differentiable (possibly nonconvex) function in Hilbert spaces. We first extend the dry friction property for convex functions to what we call the dry-like friction property in a nonconvex setting, and then employ a line search technique to adaptively update parameters at each iteration. Depending on the choice of parameters, the proposed algorithm exhibits subsequential convergence to a critical point or full sequential convergence to an “approximate” critical point of the objective function. We also establish the full sequential convergence to a critical point under the Kurdyka–Łojasiewicz (KL) property of a merit function. Thanks to the parameters’ flexibility, our algorithm can reduce to a number of existing inertial gradient algorithms with Hessian damping and dry friction. By exploiting variational properties of the Moreau envelope, the proposed algorithm is adapted to address weakly convex nonsmooth optimization problems. In particular, we extend the result on KL exponent for the Moreau envelope of a convex KL function to a broad class of KL functions that are not necessarily convex nor continuous. Simulation results illustrate the efficiency of our algorithm and demonstrate the potential advantages of combining dry-like friction with extrapolation and line search techniques.
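The ingredients named in the abstract can be illustrated with a generic sketch: an inertial (momentum) step followed by a nonmonotone line search of max type, which allows individual iterations to increase the objective. This is not the authors' dry-like-friction scheme; the update, parameter rules, and names below are our own illustrative assumptions.

```python
# Generic inertial gradient method with a nonmonotone (max-type) line search.
import numpy as np

def inertial_nonmonotone_gd(f, grad, x0, s0=1.0, beta=0.5, sigma=1e-4,
                            shrink=0.5, memory=5, iters=200):
    x_prev = x = np.asarray(x0, dtype=float)
    hist = [f(x)]                       # recent objective values for the max test
    for _ in range(iters):
        y = x + beta * (x - x_prev)     # inertial (extrapolated) point
        g = grad(y)
        s = s0
        for _ in range(50):             # nonmonotone Armijo with capped backtracking
            if f(y - s * g) <= max(hist) - sigma * s * (g @ g):
                break
            s *= shrink
        x_prev, x = x, y - s * g
        hist.append(f(x))
        hist = hist[-memory:]
    return x

# Usage on the (nonconvex) Rosenbrock function.
f = lambda z: (1 - z[0])**2 + 100 * (z[1] - z[0]**2)**2
grad = lambda z: np.array([-2 * (1 - z[0]) - 400 * z[0] * (z[1] - z[0]**2),
                           200 * (z[1] - z[0]**2)])
print(inertial_nonmonotone_gd(f, grad, np.array([-1.2, 1.0])))
```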
Citations: 0
A Finitely Convergent Circumcenter Method for the Convex Feasibility Problem
IF 3.1 · CAS Tier 1 (Mathematics) · Q1 MATHEMATICS, APPLIED · Pub Date: 2024-07-15 · DOI: 10.1137/23m1595412
Roger Behling, Yunier Bello-Cruz, Alfredo N. Iusem, Di Liu, Luiz-Rafael Santos
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2535-2556, September 2024.
Abstract. In this paper, we present a variant of the circumcenter method for the convex feasibility problem (CFP), ensuring finite convergence under a Slater assumption. The method replaces exact projections onto the convex sets with projections onto separating half-spaces, perturbed by positive exogenous parameters that decrease to zero along the iterations. If the perturbation parameters decrease slowly enough, such as the terms of a diverging series, finite convergence is achieved. To the best of our knowledge, this is the first circumcenter method for CFP that guarantees finite convergence.
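A sketch of the perturbed separating-half-space projection follows, combined here with plain alternating projections rather than a circumcenter step; the half-space shift `eps` and the oracle names are our assumptions.

```python
# Project onto half-spaces separating the current point from each convex set,
# shifted inward by a perturbation eps_k that decreases slowly (diverging sum).
import numpy as np

def project_perturbed_halfspace(x, proj_C, eps):
    """Project x onto {y : <x - Px, y - Px> <= -eps * ||x - Px||}, a half-space
    separating x from C, pushed inside C by eps (assumes a Slater point)."""
    p = proj_C(x)
    a = x - p
    na = np.linalg.norm(a)
    if na == 0.0:
        return x                          # x already lies in C
    b = a @ p - eps * na                  # inward-shifted offset
    return x - max(0.0, (a @ x - b) / (na * na)) * a

# Convex feasibility for a unit ball and a half-space {x1 <= 0.5}.
proj_ball = lambda x: x / max(1.0, np.linalg.norm(x))
proj_hs   = lambda x: x - max(0.0, x[0] - 0.5) * np.eye(2)[0]

x = np.array([3.0, 2.0])
for k in range(1, 50):
    eps = 1.0 / k                         # terms of a diverging series
    x = project_perturbed_halfspace(x, proj_ball, eps)
    x = project_perturbed_halfspace(x, proj_hs, eps)
print(x)
```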
Citations: 0
Using Taylor-Approximated Gradients to Improve the Frank–Wolfe Method for Empirical Risk Minimization
IF 3.1 · CAS Tier 1 (Mathematics) · Q1 MATHEMATICS, APPLIED · Pub Date: 2024-07-11 · DOI: 10.1137/22m1519286
Zikai Xiong, Robert M. Freund
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2503-2534, September 2024.
Abstract. The Frank–Wolfe method has become increasingly useful in statistical and machine learning applications due to the structure-inducing properties of the iterates and especially in settings where linear minimization over the feasible set is more computationally efficient than projection. In the setting of empirical risk minimization—one of the fundamental optimization problems in statistical and machine learning—the computational effectiveness of Frank–Wolfe methods typically grows linearly in the number of data observations [math]. This is in stark contrast to the case for typical stochastic projection methods. In order to reduce this dependence on [math], we look to second-order smoothness of typical smooth loss functions (least squares loss and logistic loss, for example), and we propose amending the Frank–Wolfe method with Taylor series–approximated gradients, including variants for both deterministic and stochastic settings. Compared with current state-of-the-art methods in the regime where the optimality tolerance [math] is sufficiently small, our methods are able to simultaneously reduce the dependence on large [math] while obtaining optimal convergence rates of Frank–Wolfe methods in both convex and nonconvex settings. We also propose a novel adaptive step-size approach for which we have computational guarantees. Finally, we present computational experiments which show that our methods exhibit very significant speedups over existing methods on real-world datasets for both convex and nonconvex binary classification problems.
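The Taylor-approximated gradient idea can be sketched for logistic-loss empirical risk minimization over an l1-ball: each sample's loss derivative is expanded to first order around an anchor point that is refreshed periodically, and the per-sample margins are updated cheaply along Frank–Wolfe's sparse vertex directions. The anchor-refresh schedule and step rule below are illustrative assumptions, not the paper's deterministic or stochastic variants.

```python
import numpy as np

def logistic_fw_taylor(A, y, radius=10.0, iters=200, refresh=20):
    n, d = A.shape
    x = np.zeros(d)
    for t in range(iters):
        if t % refresh == 0:                     # refresh the Taylor anchor
            s_anchor = A @ x                     # margins a_i^T x at the anchor
            sig = 1.0 / (1.0 + np.exp(y * s_anchor))
            lp = -y * sig                        # l'_i at the anchor
            lpp = sig * (1.0 - sig)              # l''_i at the anchor
            s = s_anchor.copy()
        # Taylor-approximated gradient: (1/n) sum_i a_i [l'_i + l''_i (a_i^T x - s_i)]
        g = A.T @ (lp + lpp * (s - s_anchor)) / n
        j = np.argmax(np.abs(g))                 # LMO over the l1-ball
        v = np.zeros(d); v[j] = -radius * np.sign(g[j])
        gamma = 2.0 / (t + 2.0)                  # classic open-loop step size
        x = (1 - gamma) * x + gamma * v
        s = (1 - gamma) * s + gamma * (A[:, j] * v[j])  # update margins cheaply
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 50)); y = np.sign(rng.standard_normal(500))
print(np.round(logistic_fw_taylor(A, y)[:5], 3))
```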
Citations: 0
Complexity of Finite-Sum Optimization with Nonsmooth Composite Functions and Non-Lipschitz Regularization
IF 3.1 · CAS Tier 1 (Mathematics) · Q1 MATHEMATICS, APPLIED · Pub Date: 2024-07-10 · DOI: 10.1137/23m1546701
Xiao Wang, Xiaojun Chen
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2472-2502, September 2024.
Abstract. In this paper, we present complexity analysis of proximal inexact gradient methods for finite-sum optimization with a nonconvex nonsmooth composite function and non-Lipschitz regularization. By getting access to a convex approximation to the Lipschitz function and a Lipschitz continuous approximation to the non-Lipschitz regularizer, we construct a proximal subproblem at each iteration without using exact function values and gradients. With certain accuracy control on inexact gradients and subproblem solutions, we show that the oracle complexity in terms of total number of inexact gradient evaluations is in order [math] to find an [math]-approximate first-order stationary point, ensuring that within a [math]-ball centered at this point the maximum reduction of an approximation model does not exceed [math]. This shows that we can have the same worst-case evaluation complexity order as in [C. Cartis, N. I. M. Gould, and P. L. Toint, SIAM J. Optim., 21 (2011), pp. 1721–1739, X. Chen, Ph. L. Toint, and H. Wang, SIAM J. Optim., 29 (2019), pp. 874–903], even if we introduce the non-Lipschitz singularity and the nonconvex nonsmooth composite function in the objective function. Moreover, we establish that the oracle complexity regarding the total number of stochastic oracles is in order [math] with high probability for stochastic proximal inexact gradient methods. We further extend the algorithm to adjust to solving stochastic problems with expectation form and derive the associated oracle complexity in order [math] with high probability.
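A simplified sketch of the inexact-gradient proximal loop follows. To keep the proximal step in closed form it uses the Lipschitz l1 regularizer with soft-thresholding; the paper's harder setting of non-Lipschitz regularizers handled through Lipschitz continuous approximations is not reproduced, and the error schedule `noise(k)` is an assumed accuracy-control sequence.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def inexact_prox_gradient(grad, x0, L, lam, noise, iters=300, seed=0):
    """grad: exact gradient of the smooth part; noise(k) bounds the gradient
    error injected at iteration k (the accuracy-control sequence)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(1, iters + 1):
        e = rng.standard_normal(x.size)
        e *= noise(k) / np.linalg.norm(e)   # inexact gradient: error of norm noise(k)
        x = soft_threshold(x - (grad(x) + e) / L, lam / L)
    return x

# Usage: sparse least squares with a decaying gradient-error budget.
rng = np.random.default_rng(2)
A = rng.standard_normal((100, 40)); b = rng.standard_normal(100)
L = np.linalg.norm(A, 2) ** 2
x = inexact_prox_gradient(lambda x: A.T @ (A @ x - b), np.zeros(40), L,
                          lam=0.5, noise=lambda k: 1.0 / k)
print(np.count_nonzero(np.abs(x) > 1e-8))
```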
Citations: 0
The Rate of Convergence of Bregman Proximal Methods: Local Geometry Versus Regularity Versus Sharpness
IF 3.1 · CAS Tier 1 (Mathematics) · Q1 MATHEMATICS, APPLIED · Pub Date: 2024-07-09 · DOI: 10.1137/23m1580218
Waïss Azizian, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2440-2471, September 2024.
Abstract. We examine the last-iterate convergence rate of Bregman proximal methods—from mirror descent to mirror-prox and its optimistic variants—as a function of the local geometry induced by the prox-mapping defining the method. For generality, we focus on local solutions of constrained, nonmonotone variational inequalities, and we show that the convergence rate of a given method depends sharply on its associated Legendre exponent, a notion that measures the growth rate of the underlying Bregman function (Euclidean, entropic, or other) near a solution. In particular, we show that boundary solutions exhibit a stark separation of regimes between methods with a zero and nonzero Legendre exponent: The former converge at a linear rate, while the latter converge, in general, sublinearly. This dichotomy becomes even more pronounced in linearly constrained problems where methods with entropic regularization achieve a linear convergence rate along sharp directions, compared to convergence in a finite number of steps under Euclidean regularization.
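For intuition, here is mirror descent with the entropic Bregman function on the simplex (exponentiated gradient), the prototypical method whose behavior near boundary solutions the abstract's Legendre exponent governs; the linear objective and step size are our illustrative choices.

```python
import numpy as np

def entropic_mirror_descent(grad, x0, step=0.1, iters=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x * np.exp(-step * grad(x))   # mirror step in the dual (log) space
        x /= x.sum()                      # Bregman projection back to the simplex
    return x

# Usage: minimize <c, x> over the simplex; the solution is the boundary vertex
# e_j with j = argmin c, the boundary regime discussed in the abstract.
c = np.array([0.3, 0.1, 0.7])
print(entropic_mirror_descent(lambda x: c, np.ones(3) / 3))
```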
Citations: 0
A Descent Algorithm for the Optimal Control of ReLU Neural Network Informed PDEs Based on Approximate Directional Derivatives
IF 3.1 · CAS Tier 1 (Mathematics) · Q1 MATHEMATICS, APPLIED · Pub Date: 2024-07-02 · DOI: 10.1137/22m1534420
Guozhi Dong, Michael Hintermüller, Kostas Papafitsoros
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2314-2349, September 2024.
Abstract. We propose and analyze a numerical algorithm for solving a class of optimal control problems for learning-informed semilinear partial differential equations (PDEs). Such PDEs contain constituents that are in principle unknown and are approximated by nonsmooth ReLU neural networks. We first show that direct smoothing of the ReLU network with the aim of using classical numerical solvers can have disadvantages, such as potentially introducing multiple solutions for the corresponding PDE. This motivates us to devise a numerical algorithm that treats directly the nonsmooth optimal control problem, by employing a descent algorithm inspired by a bundle-free method. Several numerical examples are provided and the efficiency of the algorithm is shown.
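A tiny illustration of the direct smoothing the abstract cautions against: replacing ReLU with a softplus surrogate of parameter beta. The bundle-free descent method the paper actually proposes is not sketched here.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softplus(x, beta=10.0):
    # Smooth surrogate of ReLU; the uniform error is at most log(2) / beta.
    return np.log1p(np.exp(beta * x)) / beta

x = np.linspace(-1, 1, 5)
print(relu(x), softplus(x), sep="\n")
```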
Citations: 0
Subgradient Regularized Multivariate Convex Regression at Scale
IF 3.1 · CAS Tier 1 (Mathematics) · Q1 MATHEMATICS, APPLIED · Pub Date: 2024-07-02 · DOI: 10.1137/21m1413134
Wenyu Chen, Rahul Mazumder
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2350-2377, September 2024.
Abstract. We present new large-scale algorithms for fitting a subgradient regularized multivariate convex regression function to [math] samples in [math] dimensions—a key problem in shape constrained nonparametric regression with applications in statistics, engineering, and the applied sciences. The infinite-dimensional learning task can be expressed via a convex quadratic program (QP) with [math] decision variables and [math] constraints. While instances with [math] in the lower thousands can be addressed with current algorithms within reasonable runtimes, solving larger problems (e.g., [math] or [math]) is computationally challenging. To this end, we present an active set type algorithm on the dual QP. For computational scalability, we allow for approximate optimization of the reduced subproblems and propose randomized augmentation rules for expanding the active set. We derive novel computational guarantees for our algorithms. We demonstrate that our framework can approximately solve instances of the subgradient regularized convex regression problem with [math] and [math] within minutes and shows strong computational performance compared to earlier approaches.
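To make the optimization problem concrete, here is the convex regression QP written down and handed to a generic solver (cvxpy). The ridge-type penalty on the subgradients stands in for the paper's subgradient regularization, and none of the paper's dual active-set machinery appears; a generic solver like this is exactly what the paper's algorithms are designed to outscale.

```python
import cvxpy as cp
import numpy as np

def convex_regression(X, y, rho=1e-3):
    """Fit values f_i and subgradients g_i subject to the convexity constraints
    f_j + <g_j, x_i - x_j> <= f_i for all i, j, with a penalty on the g_i."""
    n, d = X.shape
    f = cp.Variable(n)
    G = cp.Variable((n, d))
    constraints = [f[j] + G[j] @ (X[i] - X[j]) <= f[i]
                   for i in range(n) for j in range(n) if i != j]
    obj = cp.Minimize(cp.sum_squares(f - y) / n + rho * cp.sum_squares(G))
    cp.Problem(obj, constraints).solve()
    return f.value, G.value

# Usage: noisy samples of a convex function in two dimensions.
rng = np.random.default_rng(3)
X = rng.standard_normal((30, 2))
y = np.sum(X**2, axis=1) + 0.1 * rng.standard_normal(30)
f_hat, _ = convex_regression(X, y)
print(np.round(f_hat[:5], 2))
```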
Citations: 0
Fast Convergence of Inertial Multiobjective Gradient-Like Systems with Asymptotic Vanishing Damping
IF 3.1 · CAS Tier 1 (Mathematics) · Q1 MATHEMATICS, APPLIED · Pub Date: 2024-07-02 · DOI: 10.1137/23m1588512
Konstantin Sonntag, Sebastian Peitz
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2259-2286, September 2024.
Abstract. We present a new gradient-like dynamical system related to unconstrained convex smooth multiobjective optimization which involves inertial effects and asymptotic vanishing damping. To the best of our knowledge, this system is the first inertial gradient-like system for multiobjective optimization problems including asymptotic vanishing damping, expanding the ideas previously laid out in [H. Attouch and G. Garrigos, Multiobjective Optimization: An Inertial Dynamical Approach to Pareto Optima, preprint, arXiv:1506.02823, 2015]. We prove existence of solutions to this system in finite dimensions and further prove that its bounded solutions converge weakly to weakly Pareto optimal points. In addition, we obtain a convergence rate of order [math] for the function values measured with a merit function. This approach presents a good basis for the development of fast gradient methods for multiobjective optimization.
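A possible explicit discretization of such a system is sketched below: a Nesterov-style extrapolation with momentum (k-1)/(k+alpha-1), corresponding to damping alpha/t, followed by a step along the min-norm common descent direction of two objectives. The discretization and parameters are our assumptions, not the authors' scheme.

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Min-norm element of conv{g1, g2}: the steepest common descent direction."""
    diff = g2 - g1
    denom = diff @ diff
    lam = 0.5 if denom == 0.0 else float(np.clip((g2 @ diff) / denom, 0.0, 1.0))
    return -(lam * g1 + (1.0 - lam) * g2)

def inertial_mo_gradient(grads, x0, step=0.05, alpha=3.0, iters=500):
    x_prev = x = np.asarray(x0, dtype=float)
    for k in range(1, iters + 1):
        beta = (k - 1) / (k + alpha - 1)    # vanishing damping alpha/t
        y = x + beta * (x - x_prev)         # inertial extrapolation
        d = min_norm_direction(*[g(y) for g in grads])
        x_prev, x = x, y + step * d
    return x

# Usage: two convex quadratics with different minimizers; the iterates approach
# a weakly Pareto optimal point between them.
g1 = lambda z: 2 * (z - np.array([1.0, 0.0]))
g2 = lambda z: 2 * (z - np.array([0.0, 1.0]))
print(inertial_mo_gradient([g1, g2], np.array([2.0, 2.0])))
```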
Citations: 0
Scalable Frank–Wolfe on Generalized Self-Concordant Functions via Simple Steps
IF 3.1 · CAS Tier 1 (Mathematics) · Q1 MATHEMATICS, APPLIED · Pub Date: 2024-07-02 · DOI: 10.1137/23m1616789
Alejandro Carderera, Mathieu Besançon, Sebastian Pokutta
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2231-2258, September 2024.
Abstract. Generalized self-concordance is a key property present in the objective function of many important learning problems. We establish the convergence rate of a simple Frank–Wolfe variant that uses the open-loop step size strategy [math], obtaining an [math] convergence rate for this class of functions in terms of primal gap and Frank–Wolfe gap, where [math] is the iteration count. This avoids the use of second-order information or the need to estimate local smoothness parameters of previous work. We also show improved convergence rates for various common cases, e.g., when the feasible region under consideration is uniformly convex or polyhedral.
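Below is vanilla Frank–Wolfe with the classic open-loop rule gamma_t = 2/(t+2), our reading of the elided step-size rule, applied to a logistic loss (a standard generalized self-concordant function) over the probability simplex, reporting the Frank–Wolfe gap that appears in the convergence statement.

```python
import numpy as np

def frank_wolfe_simplex(grad, d, iters=200):
    x = np.ones(d) / d
    for t in range(iters):
        g = grad(x)
        v = np.zeros(d); v[np.argmin(g)] = 1.0   # LMO over the simplex
        fw_gap = g @ (x - v)                     # certificate of optimality
        x += (2.0 / (t + 2.0)) * (v - x)         # open-loop step, no line search
    return x, fw_gap

rng = np.random.default_rng(4)
A = rng.standard_normal((200, 10)); y = np.sign(rng.standard_normal(200))
grad = lambda x: -(A.T @ (y / (1.0 + np.exp(y * (A @ x))))) / len(y)
x, gap = frank_wolfe_simplex(grad, 10)
print(gap)
```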
Citations: 0