Numerical Analysis for Convergence of a Sample-Wise Backpropagation Method for Training Stochastic Neural Networks

Impact Factor: 2.8 | Zone 2 (Mathematics) | JCR Q1 (Mathematics, Applied) | SIAM Journal on Numerical Analysis | Pub Date: 2024-03-01 | DOI: 10.1137/22m1523765

Richard Archibald, Feng Bao, Yanzhao Cao, Hui Sun



Abstract

SIAM Journal on Numerical Analysis, Volume 62, Issue 2, Page 593-621, April 2024.
This paper presents a convergence analysis and an implementation of a novel sample-wise backpropagation method for training a class of stochastic neural networks (SNNs). A preliminary discussion of this SNN framework appeared in [Archibald et al., Discrete Contin. Dyn. Syst. Ser. S, 15 (2022), pp. 2807–2835]. The SNN is formulated as a discretization of a stochastic differential equation (SDE). A stochastic optimal control framework is introduced to model the training procedure, and a sample-wise approximation scheme for the adjoint backward SDE is applied to improve the efficiency of the stochastic optimal control solver, which is equivalent to backpropagation for training the SNN. The convergence analysis is derived by introducing a novel joint conditional expectation for the gradient process. Under a convexity assumption, the result indicates that the number of SNN training steps should be proportional to the square of the number of layers. In the implementation of the sample-based SNN algorithm on the benchmark MNIST dataset, we adopt a convolutional neural network (CNN) architecture and demonstrate that the sample-based SNN algorithm is more robust than a conventional CNN.
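To make the structure described in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a residual layer with a tanh drift, constant additive noise of size `sigma`, and a quadratic terminal loss, none of which are stated in the abstract. Each layer is one Euler-Maruyama step of an SDE, and the gradient is computed from a single sampled path by a discrete backward (adjoint) recursion, which is a simplified stand-in for the sample-wise approximation of the adjoint backward SDE. All function names and hyperparameters are hypothetical.

```python
import numpy as np


def forward(x0, Ws, bs, h, sigma, noise):
    """Propagate one sample path through the stochastic layers.

    Layer n applies x_{n+1} = x_n + h*tanh(W_n x_n + b_n) + sigma*sqrt(h)*xi_n,
    i.e. one Euler-Maruyama step of dX = tanh(W X + b) dt + sigma dB.
    The tanh drift and constant sigma are assumptions made for illustration.
    """
    xs, zs = [x0], []
    for W, b, xi in zip(Ws, bs, noise):
        z = W @ xs[-1] + b
        zs.append(z)
        xs.append(xs[-1] + h * np.tanh(z) + sigma * np.sqrt(h) * xi)
    return xs, zs


def loss(x_final, y):
    """Quadratic terminal loss (assumed; the abstract does not specify one)."""
    return 0.5 * np.sum((x_final - y) ** 2)


def samplewise_grads(x0, y, Ws, bs, h, sigma, noise):
    """Gradient of the loss along ONE sampled path via a backward adjoint sweep."""
    xs, zs = forward(x0, Ws, bs, h, sigma, noise)
    p = xs[-1] - y  # terminal adjoint: derivative of the loss w.r.t. x_N
    gWs, gbs = [None] * len(Ws), [None] * len(Ws)
    for n in reversed(range(len(Ws))):
        d = p * (1.0 - np.tanh(zs[n]) ** 2)  # adjoint pulled back through tanh
        gWs[n] = h * np.outer(d, xs[n])
        gbs[n] = h * d
        p = p + h * Ws[n].T @ d  # backward adjoint step from layer n+1 to n
    return gWs, gbs


def train(x0, y, Ws, bs, h, sigma, steps, lr, rng):
    """Stochastic gradient descent that draws one fresh noise path per step."""
    for _ in range(steps):
        noise = rng.standard_normal((len(Ws), x0.size))
        gWs, gbs = samplewise_grads(x0, y, Ws, bs, h, sigma, noise)
        for n in range(len(Ws)):
            Ws[n] -= lr * gWs[n]
            bs[n] -= lr * gbs[n]
    return Ws, bs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_layers = 2, 4
    h = 1.0 / n_layers
    Ws = [0.1 * rng.standard_normal((dim, dim)) for _ in range(n_layers)]
    bs = [np.zeros(dim) for _ in range(n_layers)]
    x0, y = np.array([1.0, -1.0]), np.array([0.5, 0.5])
    # Step count scaled with the square of the layer count, following the
    # abstract's guidance; the constant 10 is an arbitrary illustrative choice.
    steps = 10 * n_layers ** 2
    train(x0, y, Ws, bs, h, sigma=0.1, steps=steps, lr=0.1, rng=rng)
    xs, _ = forward(x0, Ws, bs, h, 0.0, np.zeros((n_layers, dim)))
    print("noise-free loss after training:", loss(xs[-1], y))
```

Because the noise here is additive, the backward recursion returns the exact gradient for the sampled path, so each training step costs one forward and one backward sweep. The method analyzed in the paper may treat the diffusion term and the conditional expectations differently.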


Source journal

SIAM Journal on Numerical Analysis (Mathematics: Applied Mathematics)

CiteScore: 4.80
Self-citation rate: 6.90%
Articles published: 110
Review time: 4-8 weeks

Journal description: SIAM Journal on Numerical Analysis (SINUM) contains research articles on the development and analysis of numerical methods. Topics include the rigorous study of convergence of algorithms, their accuracy, their stability, and their computational complexity. Also included are results in mathematical analysis that contribute to algorithm analysis, and computational results that demonstrate algorithm behavior and applicability.