Adaptive Stochastic Variance Reduction for Subsampled Newton Method with Cubic Regularization

Junyu Zhang, Lin Xiao, Shuzhong Zhang
Journal: INFORMS Journal on Optimization
DOI: 10.1287/ijoo.2021.0058
Publication date: 2018-11-28
Citations: 17

Abstract

The cubic regularized Newton method of Nesterov and Polyak has become increasingly popular for nonconvex optimization because of its capability of finding an approximate local solution with a second order guarantee and its low iteration complexity. Several recent works extend this method to the setting of minimizing the average of N smooth functions by replacing the exact gradients and Hessians with subsampled approximations. It is shown that the total Hessian sample complexity can be reduced to be sublinear in N per iteration by leveraging stochastic variance reduction techniques. We present an adaptive variance reduction scheme for a subsampled Newton method with cubic regularization and show that the expected Hessian sample complexity is [Formula: see text] for finding an [Formula: see text]-approximate local solution (in terms of first and second order guarantees, respectively). Moreover, we show that the same Hessian sample complexity is retained with fixed sample sizes if exact gradients are used. The techniques of our analysis are different from previous works in that we do not rely on high probability bounds based on matrix concentration inequalities. Instead, we derive and utilize new bounds on the third and fourth order moments of the average of random matrices, which are of independent interest on their own.
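To make the setting concrete, the following is a minimal sketch of a subsampled Newton iteration with cubic regularization on a toy least-squares finite sum. It is not the paper's adaptive variance-reduction scheme; it only illustrates the exact-gradient variant mentioned in the abstract, with the Hessian estimated from a random subsample and the cubic model minimized by plain gradient descent. All concrete choices (the quadratic toy objective, sample size 50, regularization parameter M = 1.0, the inner-solver settings, and helper names such as `cubic_step`) are illustrative assumptions.

```python
import numpy as np

# Toy finite-sum problem: f(x) = (1/N) * sum_i 0.5 * (a_i @ x - b_i)**2,
# standing in for the "average of N smooth functions" in the abstract.
rng = np.random.default_rng(0)
N, d = 200, 5
A = rng.normal(size=(N, d))
b = rng.normal(size=N)

def full_grad(x):
    """Exact gradient of the averaged objective (exact-gradient variant)."""
    return A.T @ (A @ x - b) / N

def subsampled_hessian(batch):
    """Hessian estimate averaged over a random subsample of the N functions."""
    As = A[batch]
    return As.T @ As / len(batch)

def cubic_step(g, H, M, iters=300, lr=0.05):
    """Approximately minimize the cubic model
    m(s) = g@s + 0.5*s@H@s + (M/6)*||s||**3 by gradient descent on s."""
    s = np.zeros_like(g)
    for _ in range(iters):
        grad_m = g + H @ s + 0.5 * M * np.linalg.norm(s) * s
        s -= lr * grad_m
    return s

x = np.zeros(d)
M = 1.0  # cubic regularization parameter (held fixed in this sketch)
for _ in range(20):
    g = full_grad(x)                                 # exact gradient
    batch = rng.choice(N, size=50, replace=False)    # Hessian subsample
    H = subsampled_hessian(batch)
    x = x + cubic_step(g, H, M)

print(np.linalg.norm(full_grad(x)))  # gradient norm shrinks toward a stationary point
```

Because the gradient is exact, the subsampling noise in the Hessian perturbs only the step, not the fixed point, which is one intuition behind the paper's result that exact gradients allow fixed Hessian sample sizes.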