{"title":"关于非凸随机梯度下降的扩散逼近","authors":"Wenqing Hu, C. J. Li, Lei Li, Jian‐Guo Liu","doi":"10.4310/AMSA.2019.V4.N1.A1","DOIUrl":null,"url":null,"abstract":"We study the Stochastic Gradient Descent (SGD) method in nonconvex optimization problems from the point of view of approximating diffusion processes. We prove rigorously that the diffusion process can approximate the SGD algorithm weakly using the weak form of master equation for probability evolution. In the small step size regime and the presence of omnidirectional noise, our weak approximating diffusion process suggests the following dynamics for the SGD iteration starting from a local minimizer (resp.~saddle point): it escapes in a number of iterations exponentially (resp.~almost linearly) dependent on the inverse stepsize. The results are obtained using the theory for random perturbations of dynamical systems (theory of large deviations for local minimizers and theory of exiting for unstable stationary points). In addition, we discuss the effects of batch size for the deep neural networks, and we find that small batch size is helpful for SGD algorithms to escape unstable stationary points and sharp minimizers. Our theory indicates that one should increase the batch size at later stage for the SGD to be trapped in flat minimizers for better generalization.","PeriodicalId":42896,"journal":{"name":"Annals of Mathematical Sciences and Applications","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2017-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"132","resultStr":"{\"title\":\"On the diffusion approximation of nonconvex stochastic gradient descent\",\"authors\":\"Wenqing Hu, C. J. Li, Lei Li, Jian‐Guo Liu\",\"doi\":\"10.4310/AMSA.2019.V4.N1.A1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study the Stochastic Gradient Descent (SGD) method in nonconvex optimization problems from the point of view of approximating diffusion processes. We prove rigorously that the diffusion process can approximate the SGD algorithm weakly using the weak form of master equation for probability evolution. In the small step size regime and the presence of omnidirectional noise, our weak approximating diffusion process suggests the following dynamics for the SGD iteration starting from a local minimizer (resp.~saddle point): it escapes in a number of iterations exponentially (resp.~almost linearly) dependent on the inverse stepsize. The results are obtained using the theory for random perturbations of dynamical systems (theory of large deviations for local minimizers and theory of exiting for unstable stationary points). In addition, we discuss the effects of batch size for the deep neural networks, and we find that small batch size is helpful for SGD algorithms to escape unstable stationary points and sharp minimizers. 
Our theory indicates that one should increase the batch size at later stage for the SGD to be trapped in flat minimizers for better generalization.\",\"PeriodicalId\":42896,\"journal\":{\"name\":\"Annals of Mathematical Sciences and Applications\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.4000,\"publicationDate\":\"2017-05-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"132\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Annals of Mathematical Sciences and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4310/AMSA.2019.V4.N1.A1\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Mathematical Sciences and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4310/AMSA.2019.V4.N1.A1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
On the diffusion approximation of nonconvex stochastic gradient descent
We study the Stochastic Gradient Descent (SGD) method for nonconvex optimization problems from the point of view of approximating diffusion processes. Using the weak form of the master equation for probability evolution, we prove rigorously that a diffusion process weakly approximates the SGD algorithm. In the small-step-size regime and in the presence of omnidirectional noise, this weakly approximating diffusion process suggests the following dynamics for SGD iterations started from a local minimizer (resp. saddle point): the iterates escape in a number of iterations that depends exponentially (resp. almost linearly) on the inverse step size. The results are obtained using the theory of random perturbations of dynamical systems: the theory of large deviations for local minimizers and the exit theory for unstable stationary points. In addition, we discuss the effect of batch size for deep neural networks, and we find that a small batch size helps SGD escape unstable stationary points and sharp minimizers. Our theory indicates that one should increase the batch size at a later stage so that SGD is trapped in flat minimizers, for better generalization.
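To make the objects in the abstract concrete, the following display is a minimal sketch of the standard setup for this kind of result; the notation (f for the population objective, f_gamma for a single-sample loss, eta for the step size, Sigma for the gradient-noise covariance) and the log factor in the saddle-point scaling are our illustrative assumptions, not statements taken from the paper itself.

\[
  x_{k+1} = x_k - \eta \,\nabla f_{\gamma_k}(x_k),
  \qquad
  \mathbb{E}\big[\nabla f_{\gamma}(x)\big] = \nabla f(x),
  \quad
  \operatorname{Cov}\big[\nabla f_{\gamma}(x)\big] = \Sigma(x).
\]
% Weak diffusion approximation of the SGD iteration (valid in distribution,
% over time horizons of order 1/\eta, in the small-step-size regime):
\[
  \mathrm{d}X_t = -\nabla f(X_t)\,\mathrm{d}t
  + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,\mathrm{d}W_t .
\]
% Escape-time scalings suggested by the abstract (c denotes landscape-dependent
% constants; the \log(1/\eta) factor is an illustrative reading of "almost linearly"):
\[
  N_{\mathrm{escape}}(\text{local minimizer}) \sim e^{c/\eta},
  \qquad
  N_{\mathrm{escape}}(\text{saddle point}) \sim \frac{c}{\eta}\,\log\frac{1}{\eta}.
\]

The point of the display is that the noise term scales with \(\sqrt{\eta}\,\Sigma(x)^{1/2}\): smaller batches enlarge \(\Sigma\), which speeds escape from saddle points and sharp minimizers, while larger batches shrink it, which is why the abstract recommends increasing the batch size at a later stage.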