SIAM Journal on Optimization, Volume 34, Issue 3, Page 2411-2439, September 2024. Abstract. We consider a step search method for continuous optimization under a stochastic setting where the function values and gradients are available only through inexact probabilistic zeroth- and first-order oracles. (We introduce the term step search for a class of methods, similar to line search, but where step direction can change during the back-tracking procedure.) Unlike the stochastic gradient method and its many variants, the algorithm does not use a prespecified sequence of step sizes but increases or decreases the step size adaptively according to the estimated progress of the algorithm. These oracles capture multiple standard settings including expected loss minimization and zeroth-order optimization. Moreover, our framework is very general and allows the function and gradient estimates to be biased. The proposed algorithm is simple to describe and easy to implement. Under fairly general conditions on the oracles, we derive a high probability tail bound on the iteration complexity of the algorithm when it is applied to nonconvex, convex, and strongly convex (more generally, those satisfying the Polyak-Łojasiewicz (PL) condition) functions. Our analysis strengthens and extends prior results for stochastic step and line search methods.
{"title":"High Probability Complexity Bounds for Adaptive Step Search Based on Stochastic Oracles","authors":"Billy Jin, Katya Scheinberg, Miaolan Xie","doi":"10.1137/22m1512764","DOIUrl":"https://doi.org/10.1137/22m1512764","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2411-2439, September 2024. <br/> Abstract. We consider a step search method for continuous optimization under a stochastic setting where the function values and gradients are available only through inexact probabilistic zeroth- and first-order oracles. (We introduce the term step search for a class of methods, similar to line search, but where step direction can change during the back-tracking procedure.) Unlike the stochastic gradient method and its many variants, the algorithm does not use a prespecified sequence of step sizes but increases or decreases the step size adaptively according to the estimated progress of the algorithm. These oracles capture multiple standard settings including expected loss minimization and zeroth-order optimization. Moreover, our framework is very general and allows the function and gradient estimates to be biased. The proposed algorithm is simple to describe and easy to implement. Under fairly general conditions on the oracles, we derive a high probability tail bound on the iteration complexity of the algorithm when it is applied to nonconvex, convex, and strongly convex (more generally, those satisfying the Polyak-Łojasiewicz (PL) condition) functions. 
Our analysis strengthens and extends prior results for stochastic step and line search methods.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
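The adaptive step-size rule described in the abstract can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration, not the authors' algorithm as published: the function name `step_search`, the parameters `alpha_max`, `gamma`, `theta`, and `eps_f`, and the relaxed Armijo-type acceptance test. The point it shows is that the gradient estimate is redrawn at every iteration, so the direction can change while the step size is being reduced.

```python
def step_search(f_oracle, g_oracle, x0, alpha0=1.0, alpha_max=10.0,
                gamma=2.0, theta=0.1, eps_f=0.0, max_iter=200):
    """Adaptive step search sketch for a 1-D problem (hypothetical names).

    f_oracle and g_oracle stand in for the zeroth- and first-order oracles;
    they may be noisy or biased. eps_f is an assumed bound on the
    function-value error, used to relax the sufficient-decrease test.
    """
    x, alpha = x0, alpha0
    for _ in range(max_iter):
        g = g_oracle(x)              # fresh gradient estimate every iteration
        trial = x - alpha * g
        # Relaxed Armijo-type test on the estimated progress.
        if f_oracle(trial) <= f_oracle(x) - theta * alpha * g * g + 2.0 * eps_f:
            x = trial                            # success: accept the step
            alpha = min(alpha_max, gamma * alpha)  # and enlarge the step size
        else:
            alpha = alpha / gamma                # failure: stay put, shrink
    return x


# Usage with exact oracles for f(x) = x^2, started at x0 = 5.
x_final = step_search(lambda x: x * x, lambda x: 2.0 * x, x0=5.0)
```

Noisy oracles can be passed in the same way, with `eps_f` set to the assumed noise level; no prespecified step-size schedule is needed in either case.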
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2287-2313, September 2024. Abstract. In this paper, the convergence analysis of accelerated second-order methods for convex optimization problems is developed from the point of view of autonomous dissipative inertial continuous dynamics in the magnetic field. Different from the classical heavy ball model with damping, we consider the motion of a charged particle in a magnetic field model involving the linear asymptotic vanishing damping. It is a coupled ordinary differential system by adding the magnetic coupled term [math] to the heavy ball system with [math]. In order to develop fast optimization methods, our first contribution is to prove the global existence and uniqueness of a smooth solution under certain regularity conditions of this system via the Banach fixed point theorem. Our second contribution is to establish the convergence rate of corresponding algorithms involving inertial features via discrete time versions of inertial dynamics under the magnetic field. Meanwhile, the connection of algorithms between the heavy ball model and the motion of a charged particle in a magnetic field model is established.
{"title":"Fast Optimization of Charged Particle Dynamics with Damping","authors":"Weiping Yan, Yu Tang, Gonglin Yuan","doi":"10.1137/23m1599045","DOIUrl":"https://doi.org/10.1137/23m1599045","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2287-2313, September 2024. <br/> Abstract. In this paper, the convergence analysis of accelerated second-order methods for convex optimization problems is developed from the point of view of autonomous dissipative inertial continuous dynamics in the magnetic field. Different from the classical heavy ball model with damping, we consider the motion of a charged particle in a magnetic field model involving the linear asymptotic vanishing damping. It is a coupled ordinary differential system by adding the magnetic coupled term [math] to the heavy ball system with [math]. In order to develop fast optimization methods, our first contribution is to prove the global existence and uniqueness of a smooth solution under certain regularity conditions of this system via the Banach fixed point theorem. Our second contribution is to establish the convergence rate of corresponding algorithms involving inertial features via discrete time versions of inertial dynamics under the magnetic field. Meanwhile, the connection of algorithms between the heavy ball model and the motion of a charged particle in a magnetic field model is established.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2378-2410, September 2024. Abstract. In recent years, matrix completion has become one of the main concepts in data science. In the process of data acquisition in real applications, in addition to missing data, observed data may be inaccurate. This paper is concerned with such matrix completion of inexact observed data which can be modeled as a rank minimization problem. We adopt the difference of the nuclear norm and the Frobenius norm as an approximation of the rank function, employ Tikhonov-type regularization to preserve the inherent characteristics of original data and control oscillation arising from inexact observations, and then establish a new nonsmooth and nonconvex relaxation model for such low-rank matrix completion. We propose a new accelerated proximal gradient–type algorithm to solve the nonsmooth and nonconvex minimization problem and show that the generated sequence is bounded and globally converges to a critical point of our model. Furthermore, the rate of convergence is given via the Kurdyka–Łojasiewicz property. We evaluate our model and method on visual images and received signal strength fingerprint data in an indoor positioning system. Numerical experiments illustrate that our approach outperforms some state-of-the-art methods, and also verify the efficacy of the Tikhonov-type regularization.
{"title":"A Novel Nonconvex Relaxation Approach to Low-Rank Matrix Completion of Inexact Observed Data","authors":"Yan Li, Liping Zhang","doi":"10.1137/22m1543653","DOIUrl":"https://doi.org/10.1137/22m1543653","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2378-2410, September 2024. <br/> Abstract. In recent years, matrix completion has become one of the main concepts in data science. In the process of data acquisition in real applications, in addition to missing data, observed data may be inaccurate. This paper is concerned with such matrix completion of inexact observed data which can be modeled as a rank minimization problem. We adopt the difference of the nuclear norm and the Frobenius norm as an approximation of the rank function, employ Tikhonov-type regularization to preserve the inherent characteristics of original data and control oscillation arising from inexact observations, and then establish a new nonsmooth and nonconvex relaxation model for such low-rank matrix completion. We propose a new accelerated proximal gradient–type algorithm to solve the nonsmooth and nonconvex minimization problem and show that the generated sequence is bounded and globally converges to a critical point of our model. Furthermore, the rate of convergence is given via the Kurdyka–Łojasiewicz property. We evaluate our model and method on visual images and received signal strength fingerprint data in an indoor positioning system. 
Numerical experiments illustrate that our approach outperforms some state-of-the-art methods, and also verify the efficacy of the Tikhonov-type regularization.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
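Why the difference of the nuclear norm and the Frobenius norm can stand in for the rank: the two norms coincide exactly when a matrix has rank at most one, and the nuclear norm is strictly larger otherwise. The sketch below checks this on 2×2 matrices only, using the closed form (σ1 + σ2)² = ‖A‖_F² + 2|det A|, which holds for 2×2 matrices. The function name is hypothetical, and the code is not the relaxation model or the algorithm of the paper.

```python
import math


def nuclear_minus_frobenius_2x2(a, b, c, d):
    """Return ||A||_* - ||A||_F for the 2x2 matrix A = [[a, b], [c, d]].

    For a 2x2 matrix the singular values satisfy
    s1^2 + s2^2 = ||A||_F^2 and s1 * s2 = |det A|,
    so the nuclear norm s1 + s2 equals sqrt(||A||_F^2 + 2 |det A|).
    """
    fro_sq = a * a + b * b + c * c + d * d
    det = a * d - b * c
    nuclear = math.sqrt(fro_sq + 2.0 * abs(det))
    return nuclear - math.sqrt(fro_sq)


# Rank-1 matrix [[1, 2], [2, 4]]: the surrogate vanishes.
rank_one_gap = nuclear_minus_frobenius_2x2(1.0, 2.0, 2.0, 4.0)

# Rank-2 identity matrix: the surrogate is strictly positive (2 - sqrt(2)).
rank_two_gap = nuclear_minus_frobenius_2x2(1.0, 0.0, 0.0, 1.0)
```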
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2201-2230, September 2024. Abstract. Maximum a posteriori (MAP) estimation, like all Bayesian methods, depends on prior assumptions. These assumptions are often chosen to promote specific features in the recovered estimate. The form of the chosen prior determines the shape of the posterior distribution, thus the behavior of the estimator and complexity of the associated optimization problem. Here, we consider a family of Gaussian hierarchical models with generalized gamma hyperpriors designed to promote sparsity in linear inverse problems. By varying the hyperparameters, we move continuously between priors that act as smoothed [math] penalties with flexible [math], smoothing, and scale. We then introduce a predictor-corrector method that tracks MAP solution paths as the hyperparameters vary. Path following allows a user to explore the space of possible MAP solutions and to test the sensitivity of solutions to changes in the prior assumptions. By tracing paths from a convex region to a nonconvex region, the user could find local minimizers in strongly sparsity promoting regimes that are consistent with a convex relaxation derived using related prior assumptions. We show experimentally that these solutions are less error prone than direct optimization of the nonconvex problem.
{"title":"Path-Following Methods for Maximum a Posteriori Estimators in Bayesian Hierarchical Models: How Estimates Depend on Hyperparameters","authors":"Zilai Si, Yucong Liu, Alexander Strang","doi":"10.1137/22m153330x","DOIUrl":"https://doi.org/10.1137/22m153330x","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2201-2230, September 2024. <br/> Abstract. Maximum a posteriori (MAP) estimation, like all Bayesian methods, depends on prior assumptions. These assumptions are often chosen to promote specific features in the recovered estimate. The form of the chosen prior determines the shape of the posterior distribution, thus the behavior of the estimator and complexity of the associated optimization problem. Here, we consider a family of Gaussian hierarchical models with generalized gamma hyperpriors designed to promote sparsity in linear inverse problems. By varying the hyperparameters, we move continuously between priors that act as smoothed [math] penalties with flexible [math], smoothing, and scale. We then introduce a predictor-corrector method that tracks MAP solution paths as the hyperparameters vary. Path following allows a user to explore the space of possible MAP solutions and to test the sensitivity of solutions to changes in the prior assumptions. By tracing paths from a convex region to a nonconvex region, the user could find local minimizers in strongly sparsity promoting regimes that are consistent with a convex relaxation derived using related prior assumptions. 
We show experimentally that these solutions are less error prone than direct optimization of the nonconvex problem.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Optimization, Volume 34, Issue 3, Page 2169-2200, September 2024. Abstract. In this work, we consider the low-rank decomposition (SDPR) of general convex semidefinite programming (SDP) problems that contain both a positive semidefinite matrix and a nonnegative vector as variables. We develop a rank-support-adaptive feasible method to solve (SDPR) based on Riemannian optimization. The method is able to escape from a saddle point to ensure its convergence to a global optimal solution for generic constraint vectors. We prove its global convergence and local linear convergence without assuming that the objective function is twice differentiable. Due to the special structure of the low-rank SDP problem, our algorithm can achieve better iteration complexity than existing results for more general smooth nonconvex problems. In order to overcome the degeneracy issues of SDP problems, we develop two strategies based on random perturbation and dual refinement. These techniques enable us to solve some primal degenerate SDP problems efficiently, for example, Lovász theta SDPs. Our work is a step forward in extending the application range of Riemannian optimization approaches for solving SDP problems. Numerical experiments are conducted to verify the efficiency and robustness of our method.
{"title":"A Feasible Method for General Convex Low-Rank SDP Problems","authors":"Tianyun Tang, Kim-Chuan Toh","doi":"10.1137/23m1561464","DOIUrl":"https://doi.org/10.1137/23m1561464","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 3, Page 2169-2200, September 2024. <br/> Abstract. In this work, we consider the low-rank decomposition (SDPR) of general convex semidefinite programming (SDP) problems that contain both a positive semidefinite matrix and a nonnegative vector as variables. We develop a rank-support-adaptive feasible method to solve (SDPR) based on Riemannian optimization. The method is able to escape from a saddle point to ensure its convergence to a global optimal solution for generic constraint vectors. We prove its global convergence and local linear convergence without assuming that the objective function is twice differentiable. Due to the special structure of the low-rank SDP problem, our algorithm can achieve better iteration complexity than existing results for more general smooth nonconvex problems. In order to overcome the degeneracy issues of SDP problems, we develop two strategies based on random perturbation and dual refinement. These techniques enable us to solve some primal degenerate SDP problems efficiently, for example, Lovász theta SDPs. Our work is a step forward in extending the application range of Riemannian optimization approaches for solving SDP problems. 
Numerical experiments are conducted to verify the efficiency and robustness of our method.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Optimization, Volume 34, Issue 2, Page 2150-2168, June 2024. Abstract. A significant milestone in modern gradient-based optimization was achieved with the development of Nesterov’s accelerated gradient descent (NAG) method. This forward-backward technique has been further advanced with the introduction of its proximal generalization, commonly known as the fast iterative shrinkage-thresholding algorithm (FISTA), which enjoys widespread application in image science and engineering. Nonetheless, it remains unclear whether both NAG and FISTA exhibit linear convergence for strongly convex functions. Remarkably, these algorithms demonstrate convergence without requiring any prior knowledge of strongly convex modulus, and this intriguing characteristic has been acknowledged as an open problem in the comprehensive review [A. Chambolle and T. Pock, Acta Numer., 25 (2016), pp. 161–319]. In this paper, we address this question by utilizing the high-resolution ordinary differential equation (ODE) framework. Expanding upon the established phase-space representation, we emphasize the distinctive approach employed in crafting the Lyapunov function, which involves a dynamically adapting coefficient of kinetic energy that evolves throughout the iterations. Furthermore, we highlight that the linear convergence of both NAG and FISTA is independent of the parameter [math]. Additionally, we demonstrate that the square of the proximal subgradient norm likewise advances toward linear convergence.
{"title":"Linear Convergence of Forward-Backward Accelerated Algorithms without Knowledge of the Modulus of Strong Convexity","authors":"Bowen Li, Bin Shi, Ya-xiang Yuan","doi":"10.1137/23m158111x","DOIUrl":"https://doi.org/10.1137/23m158111x","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 2, Page 2150-2168, June 2024. <br/> Abstract. A significant milestone in modern gradient-based optimization was achieved with the development of Nesterov’s accelerated gradient descent (NAG) method. This forward-backward technique has been further advanced with the introduction of its proximal generalization, commonly known as the fast iterative shrinkage-thresholding algorithm (FISTA), which enjoys widespread application in image science and engineering. Nonetheless, it remains unclear whether both NAG and FISTA exhibit linear convergence for strongly convex functions. Remarkably, these algorithms demonstrate convergence without requiring any prior knowledge of strongly convex modulus, and this intriguing characteristic has been acknowledged as an open problem in the comprehensive review [A. Chambolle and T. Pock, Acta Numer., 25 (2016), pp. 161–319]. In this paper, we address this question by utilizing the high-resolution ordinary differential equation (ODE) framework. Expanding upon the established phase-space representation, we emphasize the distinctive approach employed in crafting the Lyapunov function, which involves a dynamically adapting coefficient of kinetic energy that evolves throughout the iterations. Furthermore, we highlight that the linear convergence of both NAG and FISTA is independent of the parameter [math]. 
Additionally, we demonstrate that the square of the proximal subgradient norm likewise advances toward linear convergence.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Optimization, Volume 34, Issue 2, Page 2121-2149, June 2024. Abstract. In this work we investigate the min-max-min robust optimization problem and the k-adaptability robust optimization problem for binary problems with uncertain costs. The idea of the first approach is to calculate a set of k feasible solutions which are worst-case optimal if in each possible scenario the best of the k solutions is implemented. It is known that the min-max-min robust problem can be solved efficiently if k is at least the dimension of the problem, while it is theoretically and computationally hard if k is small. However, nothing is known about the intermediate case, i.e., k lies between one and the dimension of the problem. We approach this open question and present an approximation algorithm which achieves good problem-specific approximation guarantees for the cases where k is close to or a fraction of the dimension. The derived bounds can be used to show that the min-max-min robust problem is solvable in oracle-polynomial time under certain conditions even if k is smaller than the dimension. We extend the previous results to the robust k-adaptability problem. As a consequence we can provide bounds on the number of necessary second-stage policies to approximate the exact two-stage robust problem. We derive an approximation algorithm for the k-adaptability problem which has similar guarantees as for the min-max-min problem. Finally, we test both algorithms on knapsack and shortest path problems. The experiments show that both algorithms calculate solutions with relatively small optimality gap in seconds.
{"title":"Approximation Guarantees for Min-Max-Min Robust Optimization and [math]-Adaptability Under Objective Uncertainty","authors":"Jannis Kurtz","doi":"10.1137/23m1595084","DOIUrl":"https://doi.org/10.1137/23m1595084","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 2, Page 2121-2149, June 2024. <br/> Abstract. In this work we investigate the min-max-min robust optimization problem and the k-adaptability robust optimization problem for binary problems with uncertain costs. The idea of the first approach is to calculate a set of k feasible solutions which are worst-case optimal if in each possible scenario the best of the k solutions is implemented. It is known that the min-max-min robust problem can be solved efficiently if k is at least the dimension of the problem, while it is theoretically and computationally hard if k is small. However, nothing is known about the intermediate case, i.e., k lies between one and the dimension of the problem. We approach this open question and present an approximation algorithm which achieves good problem-specific approximation guarantees for the cases where k is close to or a fraction of the dimension. The derived bounds can be used to show that the min-max-min robust problem is solvable in oracle-polynomial time under certain conditions even if k is smaller than the dimension. We extend the previous results to the robust k-adaptability problem. As a consequence we can provide bounds on the number of necessary second-stage policies to approximate the exact two-stage robust problem. We derive an approximation algorithm for the k-adaptability problem which has similar guarantees as for the min-max-min problem. Finally, we test both algorithms on knapsack and shortest path problems. 
The experiments show that both algorithms calculate solutions with relatively small optimality gap in seconds.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
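To make the min-max-min objective concrete, the following sketch evaluates the inner "max over scenarios of the min over k solutions" for a fixed set of binary solutions. It assumes a finite list of cost scenarios purely for illustration (the abstract does not specify the uncertainty set), and the function name is hypothetical. It does not implement the approximation algorithm of the paper, only the quantity that algorithm seeks to minimize.

```python
def min_max_min_value(solutions, scenarios):
    """Worst case, over cost scenarios, of the best of the given solutions.

    solutions: list of binary vectors (the k candidate solutions).
    scenarios: finite list of cost vectors (an illustrative assumption).
    """
    def cost(c, x):
        return sum(ci * xi for ci, xi in zip(c, x))

    # For each scenario the decision maker implements the cheapest of the
    # k solutions; the adversary then picks the scenario maximizing that cost.
    return max(min(cost(c, x) for x in solutions) for c in scenarios)


scenarios = [(1, 5), (5, 1)]

# k = 1: a single solution must hedge against both scenarios.
value_k1 = min_max_min_value([(1, 0)], scenarios)          # worst case is 5

# k = 2: the better of two solutions is chosen once the scenario is known.
value_k2 = min_max_min_value([(1, 0), (0, 1)], scenarios)  # worst case is 1
```

In this toy instance the dimension is 2, and taking k equal to the dimension already removes the cost of uncertainty, which is consistent with the remark that the problem becomes easier when k is at least the dimension.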
SIAM Journal on Optimization, Volume 34, Issue 2, Page 2093-2120, June 2024. Abstract. We propose a new first-order method for minimizing nonconvex functions with a Lipschitz continuous gradient and Hessian. The proposed method is an accelerated gradient descent with two restart mechanisms and finds a solution where the gradient norm is less than [math] in [math] function and gradient evaluations. Unlike existing first-order methods with similar complexity bounds, our algorithm is parameter-free because it requires no prior knowledge of problem-dependent parameters, e.g., the Lipschitz constants and the target accuracy [math]. The main challenge in achieving this advantage is estimating the Lipschitz constant of the Hessian using only first-order information. To this end, we develop a new Hessian-free analysis based on two technical inequalities: a Jensen-type inequality for gradients and an error bound for the trapezoidal rule. Several numerical results illustrate that the proposed method performs comparably to existing algorithms with similar complexity bounds, even without parameter tuning.
{"title":"Parameter-Free Accelerated Gradient Descent for Nonconvex Minimization","authors":"Naoki Marumo, Akiko Takeda","doi":"10.1137/22m1540934","DOIUrl":"https://doi.org/10.1137/22m1540934","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 2, Page 2093-2120, June 2024. <br/> Abstract. We propose a new first-order method for minimizing nonconvex functions with a Lipschitz continuous gradient and Hessian. The proposed method is an accelerated gradient descent with two restart mechanisms and finds a solution where the gradient norm is less than [math] in [math] function and gradient evaluations. Unlike existing first-order methods with similar complexity bounds, our algorithm is parameter-free because it requires no prior knowledge of problem-dependent parameters, e.g., the Lipschitz constants and the target accuracy [math]. The main challenge in achieving this advantage is estimating the Lipschitz constant of the Hessian using only first-order information. To this end, we develop a new Hessian-free analysis based on two technical inequalities: a Jensen-type inequality for gradients and an error bound for the trapezoidal rule. Several numerical results illustrate that the proposed method performs comparably to existing algorithms with similar complexity bounds, even without parameter tuning.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Optimization, Volume 34, Issue 2, Page 2067-2092, June 2024. Abstract. Using tail bounds, we introduce a new probabilistic condition for function estimation in stochastic derivative-free optimization (SDFO) which leads to a reduction in the number of samples and eases algorithmic analyses. Moreover, we develop simple stochastic direct-search and trust-region methods for the optimization of a potentially nonsmooth function whose values can only be estimated via stochastic observations. For trial points to be accepted, these algorithms require the estimated function values to yield a sufficient decrease measured in terms of a power larger than 1 of the algorithmic stepsize. Our new tail bound condition is precisely imposed on the reduction estimate used to achieve such a sufficient decrease. This condition allows us to select the stepsize power used for sufficient decrease in such a way that the number of samples needed per iteration is reduced. In previous works, the number of samples necessary for global convergence at every iteration [math] of this type of algorithm was [math], where [math] is the stepsize or trust-region radius. However, using the new tail bound condition, and under mild assumptions on the noise, one can prove that such a number of samples is only [math], where [math] can be made arbitrarily small by selecting the power of the stepsize in the sufficient decrease test arbitrarily close to 1. In the common random number generator setting, a further improvement by a factor of [math] can be obtained. The global convergence properties of the stochastic direct-search and trust-region algorithms are established under the new tail bound condition.
{"title":"Stochastic Trust-Region and Direct-Search Methods: A Weak Tail Bound Condition and Reduced Sample Sizing","authors":"F. Rinaldi, L. N. Vicente, D. Zeffiro","doi":"10.1137/22m1543446","DOIUrl":"https://doi.org/10.1137/22m1543446","url":null,"abstract":"SIAM Journal on Optimization, Volume 34, Issue 2, Page 2067-2092, June 2024. <br/> Abstract. Using tail bounds, we introduce a new probabilistic condition for function estimation in stochastic derivative-free optimization (SDFO) which leads to a reduction in the number of samples and eases algorithmic analyses. Moreover, we develop simple stochastic direct-search and trust-region methods for the optimization of a potentially nonsmooth function whose values can only be estimated via stochastic observations. For trial points to be accepted, these algorithms require the estimated function values to yield a sufficient decrease measured in terms of a power larger than 1 of the algorithmic stepsize. Our new tail bound condition is precisely imposed on the reduction estimate used to achieve such a sufficient decrease. This condition allows us to select the stepsize power used for sufficient decrease in such a way that the number of samples needed per iteration is reduced. In previous works, the number of samples necessary for global convergence at every iteration [math] of this type of algorithm was [math], where [math] is the stepsize or trust-region radius. However, using the new tail bound condition, and under mild assumptions on the noise, one can prove that such a number of samples is only [math], where [math] can be made arbitrarily small by selecting the power of the stepsize in the sufficient decrease test arbitrarily close to 1. In the common random number generator setting, a further improvement by a factor of [math] can be obtained. 
The global convergence properties of the stochastic direct-search and trust-region algorithms are established under the new tail bound condition.","PeriodicalId":49529,"journal":{"name":"SIAM Journal on Optimization","volume":null,"pages":null},"PeriodicalIF":3.1,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141496177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
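The acceptance rule mentioned in the abstract, a sufficient decrease measured by a power larger than 1 of the stepsize, can be sketched for a one-dimensional direct search as follows. The function name, the constants `q` and `c`, the doubling and halving factors, and the poll directions ±1 are all assumptions chosen for illustration. The sketch uses an exact function in place of stochastic estimates and does not implement the tail bound condition or the sample sizing from the paper.

```python
def direct_search(f_est, x0, alpha0=1.0, q=1.5, c=1.0, max_iter=100):
    """1-D direct search with sufficient decrease c * alpha**q, q > 1.

    f_est stands in for an estimate of the objective (possibly noisy).
    All names and default constants are hypothetical.
    """
    x, alpha = x0, alpha0
    for _ in range(max_iter):
        fx = f_est(x)
        for d in (1.0, -1.0):                     # poll both directions
            trial = x + alpha * d
            # Accept only if the estimated decrease is at least c * alpha**q.
            if f_est(trial) <= fx - c * alpha ** q:
                x, alpha = trial, 2.0 * alpha     # success: expand stepsize
                break
        else:
            alpha = 0.5 * alpha                   # no poll point accepted: shrink
    return x, alpha


# Usage on the nonsmooth function f(x) = |x - 3|, started at x0 = 0.
x_best, final_alpha = direct_search(lambda x: abs(x - 3.0), x0=0.0)
```

Choosing `q` closer to 1 makes the required decrease less demanding for small stepsizes, which is the knob the abstract relates to the number of samples needed per iteration.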