Partial differential equations for training deep neural networks

2017 51st Asilomar Conference on Signals, Systems, and Computers; Pub Date: 2017-10-01; DOI: 10.1109/ACSSC.2017.8335634

P. Chaudhari, Adam M. Oberman, S. Osher, Stefano Soatto, G. Carlier

Citations: 10

Abstract

This paper establishes a connection between non-convex optimization and nonlinear partial differential equations (PDEs). We interpret empirically successful relaxation techniques for training deep neural networks, which are motivated by statistical physics, as solutions of a viscous Hamilton-Jacobi (HJ) PDE. The underlying stochastic control interpretation allows us to prove that these techniques perform better than stochastic gradient descent. Our analysis provides insight into the geometry of the energy landscape and suggests new algorithms, based on the non-viscous Hamilton-Jacobi PDE, that can effectively tackle the high dimensionality of modern neural networks.
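The abstract does not write out the non-viscous Hamilton-Jacobi PDE. As an illustration only, assume it takes the standard form u_t = -(1/2)|∇u|^2 with the loss f as initial condition, u(x, 0) = f(x). Under that assumption the viscosity solution is given by the Hopf-Lax formula, u(x, t) = min over y of [ f(y) + |x - y|^2 / (2t) ]. The sketch below evaluates this formula by brute force on a 1-D grid. The loss `toy_loss`, the grid, and the time `t = 0.1` are hypothetical choices for illustration, not taken from the paper, and this is not the authors' algorithm.

```python
import numpy as np


def hopf_lax(f_vals, grid, t):
    """Evaluate u(x, t) = min_y [ f(y) + (x - y)^2 / (2 t) ] on a 1-D grid.

    Brute-force O(n^2) minimization over grid points; meant as an
    illustration, not as an efficient or high-dimensional solver.
    """
    # diff[i, j] = x_i - y_j for every pair of grid points
    diff = grid[:, None] - grid[None, :]
    # candidates[i, j] = f(y_j) + (x_i - y_j)^2 / (2 t)
    candidates = f_vals[None, :] + diff ** 2 / (2.0 * t)
    # Minimize over y (axis 1) for each x
    return candidates.min(axis=1)


def toy_loss(x):
    # Hypothetical non-convex 1-D loss: a quadratic bowl with oscillations.
    return 0.5 * x ** 2 + np.sin(5.0 * x)


grid = np.linspace(-3.0, 3.0, 601)
f_vals = toy_loss(grid)
u_vals = hopf_lax(f_vals, grid, t=0.1)

# The relaxed loss never exceeds the original loss, and on the grid it
# keeps the same minimum value.
print("largest gap f - u:", float(np.max(f_vals - u_vals)))
print("min f:", float(f_vals.min()), "min u:", float(u_vals.min()))
```

Choosing y = x in the minimization shows that u(x, t) never exceeds f(x), and because the quadratic penalty is non-negative, the minimum value of the loss is preserved. The grid search costs O(n^2) in one dimension and does not scale to the high-dimensional setting the abstract refers to.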
