Robust Training of Neural Networks at Arbitrary Precision and Sparsity

arXiv - MATH - Numerical Analysis, Pub Date: 2024-09-14, DOI: arxiv-2409.09245

Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Andrew Howard

Abstract

The discontinuous operations inherent in quantization and sparsification introduce obstacles to backpropagation. This is particularly challenging when training deep neural networks in ultra-low precision and sparse regimes. We propose a novel, robust, and universal solution: a denoising affine transform that stabilizes training under these challenging conditions. By formulating quantization and sparsification as perturbations during training, we derive a perturbation-resilient approach based on ridge regression. Our solution employs a piecewise constant backbone model to ensure a performance lower bound and features an inherent noise reduction mechanism to mitigate perturbation-induced corruption. This formulation allows existing models to be trained at arbitrarily low precision and sparsity levels with off-the-shelf recipes. Furthermore, our method provides a novel perspective on training temporal binary neural networks, contributing to ongoing efforts to narrow the gap between artificial and biological neural networks.
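The ridge-regression idea in the abstract can be illustrated with a small one-dimensional sketch. It is not the authors' implementation: the function names `quantize` and `denoising_affine`, the regularization parameter `lam`, and the closed-form slope used here are assumptions chosen for illustration. The sketch treats the quantized signal `q` as a perturbed copy of `x` and fits an affine map from `q` back to `x` with an L2 penalty on the slope. With a large penalty the slope shrinks toward zero and the output falls back to a constant, which is one plausible reading of the "piecewise constant backbone" mentioned above.

```python
import random
import statistics


def quantize(values, step):
    """Round each value to the nearest multiple of `step` (a discontinuous op)."""
    return [step * round(v / step) for v in values]


def denoising_affine(x, q, lam):
    """Fit y = k * (q - mean(q)) + mean(x) by 1-D ridge regression.

    Hypothetical helper: k = cov(x, q) / (var(q) + lam).
    lam = 0 gives ordinary least squares; as lam grows, k -> 0 and
    the output collapses to the constant mean(x).
    """
    n = len(x)
    mx = statistics.fmean(x)
    mq = statistics.fmean(q)
    cov = sum((a - mx) * (b - mq) for a, b in zip(x, q)) / n
    var = sum((b - mq) ** 2 for b in q) / n
    denom = var + lam
    k = cov / denom if denom > 0 else 0.0  # guard: constant q with lam = 0
    return [k * (b - mq) + mx for b in q], k


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(1000)]
q = quantize(x, step=2.0)  # deliberately coarse quantization

y_ols, k_ols = denoising_affine(x, q, lam=0.0)   # no shrinkage
y_big, k_big = denoising_affine(x, q, lam=1e6)   # heavy shrinkage

print("raw quantization error:", mse(q, x))
print("affine-corrected error:", mse(y_ols, x), "slope:", k_ols)
print("constant fallback error:", mse(y_big, x), "slope:", k_big)
```

In this toy setting the corrected signal is never worse than the raw quantized one in mean squared error, and the heavily regularized fit is never worse than predicting the mean. How the paper applies such a transform inside a network, and how it chooses the regularization strength, is not specified in the abstract.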


