Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Andrew Howard
arXiv:2409.09245 (arXiv - MATH - Numerical Analysis), published 2024-09-14. https://doi.org/arxiv-2409.09245
Abstract
The discontinuous operations inherent in quantization and sparsification introduce obstacles to backpropagation. This is particularly challenging when training deep neural networks in ultra-low precision and sparse regimes. We propose a novel, robust, and universal solution: a denoising affine transform that stabilizes training under these challenging conditions. By formulating quantization and sparsification as perturbations during training, we derive a perturbation-resilient approach based on ridge regression. Our solution employs a piecewise constant backbone model to ensure a performance lower bound and features an inherent noise reduction mechanism to mitigate perturbation-induced corruption. This formulation allows existing models to be trained at arbitrarily low precision and sparsity levels with off-the-shelf recipes. Furthermore, our method provides a novel perspective on training temporal binary neural networks, contributing to ongoing efforts to narrow the gap between artificial and biological neural networks.
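To make the abstract's core idea concrete, the sketch below treats quantization error as an additive perturbation and corrects the quantized tensor with an affine (gain and bias) transform whose gain has the familiar ridge/Wiener shrinkage form. This is a minimal illustration only: the function names (`quantize`, `denoising_affine`), the uniform symmetric quantizer, the independence assumption behind the gain formula, and the 3-bit setting are all assumptions for demonstration, not the authors' implementation.

```python
import numpy as np

def quantize(x, num_bits=4):
    """Uniform symmetric quantization; the rounding step is the
    discontinuous operation that obstructs backpropagation."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax + 1e-12
    return np.round(x / scale) * scale

def denoising_affine(x, x_q):
    """Illustrative ridge/Wiener-style correction: treat x_q as a noisy
    observation of x and apply a shrinkage gain plus a bias term."""
    noise = x_q - x                    # quantization viewed as a perturbation
    signal_var = np.var(x)
    noise_var = np.var(noise)
    gain = signal_var / (signal_var + noise_var + 1e-12)  # shrinkage gain
    bias = np.mean(x) - gain * np.mean(x_q)               # match the mean
    return gain * x_q + bias

# Compare reconstruction error with and without the affine correction.
rng = np.random.default_rng(0)
x = rng.normal(size=1024).astype(np.float32)
x_q = quantize(x, num_bits=3)
x_denoised = denoising_affine(x, x_q)

print("MSE, raw quantized   :", np.mean((x_q - x) ** 2))
print("MSE, affine-corrected:", np.mean((x_denoised - x) ** 2))
```

In a training loop, a correction of this kind would be applied to the quantized (or sparsified) tensors during the forward pass, while gradients flow through the continuous affine part; how the paper combines this with its piecewise constant backbone is described in the full text.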