Projected Forward Gradient-Guided Frank-Wolfe Algorithm via Variance Reduction

IEEE Control Systems Letters (IF 2.4, Q2, Automation & Control Systems) | Pub Date: 2024-12-26 | DOI: 10.1109/LCSYS.2024.3523243

Mohammadreza Rostami;Solmaz S. Kia

Citations: 0

Abstract

This letter aims to enhance the use of the Frank-Wolfe (FW) algorithm for training deep neural networks (DNNs). Like any gradient-based optimization algorithm, FW incurs high computational and memory costs when computing gradients for DNNs. This letter applies the recently proposed projected forward gradient (Projected-FG) method to the FW framework, offering reduced computational cost similar to backpropagation and low memory utilization akin to forward propagation. Our results show that a naive application of Projected-FG introduces a non-vanishing convergence error, caused by the stochastic noise that the method injects into the process. This noise results in a non-vanishing variance in the Projected-FG gradient estimate. To address this, we propose a variance reduction approach that aggregates historical Projected-FG directions. We rigorously demonstrate that this approach ensures convergence to the optimal solution for convex functions and to a stationary point for non-convex functions. Simulations illustrate our results.
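The idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not give the aggregation rule, step sizes, or constraint set, so the running-average weight `rho`, the step size `gamma`, the l1-ball constraint, and all function names below are assumptions chosen for illustration. The forward-gradient estimate is taken to be (∇f(x)·u)u with a random Gaussian direction u; in practice the scalar ∇f(x)·u would come from a single forward-mode pass rather than an explicit gradient.

```python
import numpy as np


def forward_gradient(grad_f, x, rng):
    """Forward-gradient estimate (grad f(x) . u) u with u ~ N(0, I).

    For brevity the directional derivative is computed from an explicit
    gradient; a real implementation would use one forward-mode (JVP) pass.
    """
    u = rng.standard_normal(x.shape)
    return np.dot(grad_f(x), u) * u


def lmo_l1_ball(d, radius=1.0):
    """Linear minimization oracle: argmin of <d, s> over ||s||_1 <= radius."""
    s = np.zeros_like(d)
    i = np.argmax(np.abs(d))          # coordinate with the largest magnitude
    s[i] = -radius * np.sign(d[i])    # move to the opposite-signed vertex
    return s


def fw_forward_gradient(grad_f, x0, n_iters=200, radius=1.0, seed=0):
    """Frank-Wolfe driven by an average of past forward-gradient estimates.

    The schedules for rho and gamma are illustrative assumptions, not the
    schedules analyzed in the letter.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = np.zeros_like(x)              # aggregated direction estimate
    for k in range(n_iters):
        rho = 1.0 / (k + 1) ** (2.0 / 3.0)   # averaging weight (assumed)
        gamma = 2.0 / (k + 2)                # classic FW step size (assumed)
        g = forward_gradient(grad_f, x, rng)
        d = (1.0 - rho) * d + rho * g        # aggregate historical directions
        s = lmo_l1_ball(d, radius)
        x = (1.0 - gamma) * x + gamma * s    # convex combination stays feasible
    return x
```

For example, with a hypothetical target `c`, calling `fw_forward_gradient(lambda x: x - c, np.zeros(3))` runs the sketch on f(x) = 0.5‖x − c‖² over the unit l1 ball. Setting `rho = 1` at every iteration recovers the naive variant that uses only the latest noisy estimate, which is the case the abstract says suffers from non-vanishing error.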




Source journal