{"title":"Latent Weight Quantization for Integerized Training of Deep Neural Networks","authors":"Wen Fei;Wenrui Dai;Liang Zhang;Luoming Zhang;Chenglin Li;Junni Zou;Hongkai Xiong","doi":"10.1109/TPAMI.2025.3527498","DOIUrl":null,"url":null,"abstract":"Existing methods for integerized training speed up deep learning by using low-bitwidth integerized weights, activations, gradients, and optimizer buffers. However, they overlook the issue of full-precision latent weights, which consume excessive memory to accumulate gradient-based updates for optimizing the integerized weights. In this paper, we propose the first latent weight quantization schema for general integerized training, which minimizes quantization perturbation to training process via residual quantization with optimized dual quantizer. We leverage residual quantization to eliminate the correlation between latent weight and integerized weight for suppressing quantization noise. We further propose dual quantizer with optimal nonuniform codebook to avoid frozen weight and ensure statistically unbiased training trajectory as full-precision latent weight. The codebook is optimized to minimize the disturbance on weight update under importance guidance and achieved with a three-segment polyline approximation for hardware-friendly implementation. Extensive experiments show that the proposed schema allows integerized training with lowest 4-bit latent weight for various architectures including ResNets, MobileNetV2, and Transformers, and yields negligible performance loss in image classification and text generation. Furthermore, we successfully fine-tune Large Language Models with up to 13 billion parameters on one single GPU using the proposed schema.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2816-2832"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10834560/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Existing methods for integerized training speed up deep learning by using low-bitwidth integerized weights, activations, gradients, and optimizer buffers. However, they overlook the issue of full-precision latent weights, which consume excessive memory to accumulate gradient-based updates for optimizing the integerized weights. In this paper, we propose the first latent weight quantization schema for general integerized training, which minimizes the quantization perturbation to the training process via residual quantization with an optimized dual quantizer. We leverage residual quantization to eliminate the correlation between the latent weights and the integerized weights, thereby suppressing quantization noise. We further propose a dual quantizer with an optimal nonuniform codebook to avoid frozen weights and ensure a training trajectory that is statistically unbiased with respect to full-precision latent weights. The codebook is optimized under importance guidance to minimize the disturbance on weight updates, and is realized with a three-segment polyline approximation for a hardware-friendly implementation. Extensive experiments show that the proposed schema enables integerized training with latent weights as low as 4 bits for various architectures, including ResNets, MobileNetV2, and Transformers, with negligible performance loss in image classification and text generation. Furthermore, we successfully fine-tune Large Language Models with up to 13 billion parameters on a single GPU using the proposed schema.
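To make the residual-quantization idea in the abstract concrete, the following is a minimal sketch of how a full-precision latent weight could be stored in compressed form as the integerized weight plus a quantized residual. All names (uniform_quantize, store_latent_as_residual) are hypothetical, and the symmetric uniform quantizer is only a stand-in: the paper's method instead uses an optimized nonuniform dual-quantizer codebook realized with a three-segment polyline approximation, which is not reproduced here.

```python
import torch

def uniform_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    # Symmetric uniform quantizer used only as an illustrative stand-in for the
    # paper's optimized nonuniform codebook.
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

def store_latent_as_residual(latent_w: torch.Tensor,
                             integer_w: torch.Tensor,
                             bits: int = 4) -> torch.Tensor:
    # Residual quantization: quantize the gap between the full-precision latent
    # weight and the integerized weight (rather than the latent weight itself),
    # then reconstruct the stored latent weight from that low-bitwidth residual.
    residual = latent_w - integer_w
    return integer_w + uniform_quantize(residual, bits)
```

In such a setup, after each optimizer step the updated latent weight would be re-stored in this residual form instead of at full precision, which is where the memory saving over a conventional full-precision latent-weight buffer would come from.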