{"title":"随机斯蒂芬森法","authors":"Minda Zhao, Zehua Lai, Lek-Heng Lim","doi":"10.1007/s10589-024-00583-7","DOIUrl":null,"url":null,"abstract":"<p>Is it possible for a first-order method, i.e., only first derivatives allowed, to be quadratically convergent? For univariate loss functions, the answer is yes—the <i>Steffensen method</i> avoids second derivatives and is still quadratically convergent like Newton method. By incorporating a specific step size we can even push its convergence order beyond quadratic to <span>\\(1+\\sqrt{2} \\approx 2.414\\)</span>. While such high convergence orders are a pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive sizes, as randomization invariably compromises convergence speed. We will introduce two adaptive learning rates inspired by the Steffensen method, intended for use in a stochastic optimization setting and requires no hyperparameter tuning aside from batch size. Extensive experiments show that they compare favorably with several existing first-order methods. When restricted to a quadratic objective, our stochastic Steffensen methods reduce to randomized Kaczmarz method—note that this is not true for SGD or SLBFGS—and thus we may also view our methods as a generalization of randomized Kaczmarz to arbitrary objectives.</p>","PeriodicalId":55227,"journal":{"name":"Computational Optimization and Applications","volume":"8 1","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Stochastic Steffensen method\",\"authors\":\"Minda Zhao, Zehua Lai, Lek-Heng Lim\",\"doi\":\"10.1007/s10589-024-00583-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Is it possible for a first-order method, i.e., only first derivatives allowed, to be quadratically convergent? For univariate loss functions, the answer is yes—the <i>Steffensen method</i> avoids second derivatives and is still quadratically convergent like Newton method. By incorporating a specific step size we can even push its convergence order beyond quadratic to <span>\\\\(1+\\\\sqrt{2} \\\\approx 2.414\\\\)</span>. While such high convergence orders are a pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive sizes, as randomization invariably compromises convergence speed. We will introduce two adaptive learning rates inspired by the Steffensen method, intended for use in a stochastic optimization setting and requires no hyperparameter tuning aside from batch size. Extensive experiments show that they compare favorably with several existing first-order methods. 
When restricted to a quadratic objective, our stochastic Steffensen methods reduce to randomized Kaczmarz method—note that this is not true for SGD or SLBFGS—and thus we may also view our methods as a generalization of randomized Kaczmarz to arbitrary objectives.</p>\",\"PeriodicalId\":55227,\"journal\":{\"name\":\"Computational Optimization and Applications\",\"volume\":\"8 1\",\"pages\":\"\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2024-06-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Optimization and Applications\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1007/s10589-024-00583-7\",\"RegionNum\":2,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICS, APPLIED\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Optimization and Applications","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1007/s10589-024-00583-7","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATHEMATICS, APPLIED","Score":null,"Total":0}
Is it possible for a first-order method, i.e., one that uses only first derivatives, to be quadratically convergent? For univariate loss functions, the answer is yes: the Steffensen method avoids second derivatives and is still quadratically convergent, like Newton's method. By incorporating a specific step size, we can even push its convergence order beyond quadratic to \(1+\sqrt{2} \approx 2.414\). While such high convergence orders are pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive size, as randomization invariably compromises convergence speed. We introduce two adaptive learning rates inspired by the Steffensen method, intended for use in a stochastic optimization setting and requiring no hyperparameter tuning aside from batch size. Extensive experiments show that they compare favorably with several existing first-order methods. When restricted to a quadratic objective, our stochastic Steffensen methods reduce to the randomized Kaczmarz method—note that this is not true for SGD or SLBFGS—and thus we may also view our methods as a generalization of the randomized Kaczmarz method to arbitrary objectives.
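For concreteness, the classical deterministic Steffensen iteration underlying this work can be sketched in a few lines: applied to the first-order condition f'(x) = 0, it replaces the second derivative in Newton's update with a divided difference of f' taken over a step of length f'(x). The sketch below illustrates only this textbook iteration, not the stochastic variants or the specific step size proposed in the paper; the function names and the test objective are illustrative assumptions.

```python
# Minimal sketch of the classical (deterministic) Steffensen iteration for
# minimizing a univariate function f, using only its first derivative fp.
# Newton's update  x - f'(x)/f''(x)  is mimicked by replacing f''(x) with the
# divided difference  (f'(x + f'(x)) - f'(x)) / f'(x).
def steffensen_minimize(fp, x0, tol=1e-12, max_iter=100):
    x = x0
    for _ in range(max_iter):
        g = fp(x)
        if abs(g) < tol:       # stationary point reached
            break
        denom = fp(x + g) - g  # divided-difference surrogate for f''(x) * f'(x)
        if denom == 0.0:
            break              # guard against division by zero near the solution
        x = x - g * g / denom  # Steffensen step: x - f'(x)^2 / (f'(x + f'(x)) - f'(x))
    return x

# Illustrative use on f(x) = x^4/4 - x, so f'(x) = x^3 - 1 and the minimizer is x* = 1.
x_star = steffensen_minimize(lambda x: x**3 - 1.0, x0=1.5)
print(x_star)  # ~1.0, with quadratic convergence once the iterate is close
```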
Journal description:
Computational Optimization and Applications is a peer-reviewed journal committed to the timely publication of research and tutorial papers on the analysis and development of computational algorithms and modeling technology for optimization. Algorithms either for general classes of optimization problems or for more specific applied problems are of interest. Both stochastic and deterministic algorithms will be considered. Papers that provide both theoretical analysis and carefully designed computational experiments are particularly welcome.
Topics of interest include, but are not limited to the following:
Large Scale Optimization,
Unconstrained Optimization,
Linear Programming,
Quadratic Programming, Complementarity Problems, and Variational Inequalities,
Constrained Optimization,
Nondifferentiable Optimization,
Integer Programming,
Combinatorial Optimization,
Stochastic Optimization,
Multiobjective Optimization,
Network Optimization,
Complexity Theory,
Approximations and Error Analysis,
Parametric Programming and Sensitivity Analysis,
Parallel Computing, Distributed Computing, and Vector Processing,
Software, Benchmarks, Numerical Experimentation and Comparisons,
Modelling Languages and Systems for Optimization,
Automatic Differentiation,
Applications in Engineering, Finance, Optimal Control, Optimal Design, Operations Research, Transportation, Economics, Communications, Manufacturing, and Management Science.