{"title":"A non-monotone trust-region method with noisy oracles and additional sampling","authors":"Nataša Krejić, Nataša Krklec Jerinkić, Ángeles Martínez, Mahsa Yousefi","doi":"10.1007/s10589-024-00580-w","DOIUrl":null,"url":null,"abstract":"<p>In this work, we introduce a novel stochastic second-order method, within the framework of a non-monotone trust-region approach, for solving the unconstrained, nonlinear, and non-convex optimization problems arising in the training of deep neural networks. The proposed algorithm makes use of subsampling strategies that yield noisy approximations of the finite sum objective function and its gradient. We introduce an adaptive sample size strategy based on inexpensive additional sampling to control the resulting approximation error. Depending on the estimated progress of the algorithm, this can yield sample size scenarios ranging from mini-batch to full sample functions. We provide convergence analysis for all possible scenarios and show that the proposed method achieves almost sure convergence under standard assumptions for the trust-region framework. We report numerical experiments showing that the proposed algorithm outperforms its state-of-the-art counterpart in deep neural network training for image classification and regression tasks while requiring a significantly smaller number of gradient evaluations.</p>","PeriodicalId":55227,"journal":{"name":"Computational Optimization and Applications","volume":"87 1","pages":""},"PeriodicalIF":1.6000,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Optimization and Applications","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1007/s10589-024-00580-w","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATHEMATICS, APPLIED","Score":null,"Total":0}
Abstract
In this work, we introduce a novel stochastic second-order method, within the framework of a non-monotone trust-region approach, for solving the unconstrained, nonlinear, and non-convex optimization problems arising in the training of deep neural networks. The proposed algorithm makes use of subsampling strategies that yield noisy approximations of the finite-sum objective function and its gradient. We introduce an adaptive sample size strategy based on inexpensive additional sampling to control the resulting approximation error. Depending on the estimated progress of the algorithm, this can yield sample size scenarios ranging from mini-batches to the full sample. We provide convergence analysis for all possible scenarios and show that the proposed method achieves almost sure convergence under standard assumptions for the trust-region framework. We report numerical experiments showing that the proposed algorithm outperforms its state-of-the-art counterpart in deep neural network training for image classification and regression tasks while requiring significantly fewer gradient evaluations.
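The abstract combines three ingredients: a trust-region step on a subsampled quadratic model, a non-monotone acceptance test that compares against a recent maximum rather than the last iterate, and a cheap independent "additional sample" that decides whether to grow the batch. The sketch below illustrates that flavor of algorithm on a toy least-squares problem in Python/NumPy. It is a minimal sketch under stated assumptions, not the paper's actual method: the Cauchy-point step, the Gauss-Newton Hessian, the acceptance threshold 0.1, the 10-term non-monotone window, and the batch-doubling rule are all illustrative choices of ours.

```python
# Minimal sketch of a non-monotone trust-region iteration with subsampled
# oracles and an inexpensive additional sample controlling the batch size.
# All constants and the toy problem are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, N = 20, 1000                       # dimension, number of component functions
A = rng.standard_normal((N, n))
b = rng.standard_normal(N)

def f_sub(w, idx):                    # subsampled least-squares objective
    r = A[idx] @ w - b[idx]
    return 0.5 * np.mean(r * r)

def grad_sub(w, idx):                 # subsampled gradient
    r = A[idx] @ w - b[idx]
    return A[idx].T @ r / len(idx)

def cauchy_step(g, B, delta):
    """Cauchy point of the model m(p) = g^T p + 0.5 p^T B p on ||p|| <= delta."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(1.0, gnorm**3 / (delta * gBg))
    return -tau * (delta / gnorm) * g

w = np.zeros(n)
delta, batch = 1.0, 50
hist = []                             # recent objective values (non-monotone reference)

for k in range(200):
    idx = rng.choice(N, size=batch, replace=False)
    g = grad_sub(w, idx)
    B = A[idx].T @ A[idx] / batch     # subsampled Gauss-Newton Hessian
    p = cauchy_step(g, B, delta)

    f_old = f_sub(w, idx)
    hist = (hist + [f_old])[-10:]
    f_ref = max(hist)                 # non-monotone: compare with recent max
    pred = -(g @ p + 0.5 * p @ B @ p) # predicted model decrease
    rho = (f_ref - f_sub(w + p, idx)) / max(pred, 1e-12)

    if rho > 0.1:                     # accept the step, possibly widen the region
        w = w + p
        delta = min(2.0 * delta, 10.0)
    else:                             # reject the step, shrink the region
        delta *= 0.5

    # "additional sampling": a cheap independent mini-sample sanity-checks the
    # progress; if it disagrees, grow the batch (up to the full sample)
    check = rng.choice(N, size=10, replace=False)
    if f_sub(w, check) > f_ref:
        batch = min(2 * batch, N)

print("final full-sample loss:", f_sub(w, np.arange(N)))
```

Note how the non-monotone reference max(hist) lets the method accept steps that temporarily increase the (noisy) subsampled objective, which is what makes small batches usable early on; the independent check sample then forces the batch toward the full sample only when the cheap estimates stop being trustworthy.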
Journal overview:
Computational Optimization and Applications is a peer-reviewed journal committed to the timely publication of research and tutorial papers on the analysis and development of computational algorithms and modeling technology for optimization. Algorithms for either general classes of optimization problems or more specific applied problems are of interest. Both stochastic and deterministic algorithms will be considered. Papers that provide both theoretical analysis and carefully designed computational experiments are particularly welcome.
Topics of interest include, but are not limited to, the following:
Large Scale Optimization,
Unconstrained Optimization,
Linear Programming,
Quadratic Programming, Complementarity Problems, and Variational Inequalities,
Constrained Optimization,
Nondifferentiable Optimization,
Integer Programming,
Combinatorial Optimization,
Stochastic Optimization,
Multiobjective Optimization,
Network Optimization,
Complexity Theory,
Approximations and Error Analysis,
Parametric Programming and Sensitivity Analysis,
Parallel Computing, Distributed Computing, and Vector Processing,
Software, Benchmarks, Numerical Experimentation and Comparisons,
Modelling Languages and Systems for Optimization,
Automatic Differentiation,
Applications in Engineering, Finance, Optimal Control, Optimal Design, Operations Research, Transportation, Economics, Communications, Manufacturing, and Management Science.