QD-Compressor: a Quantization-based Delta Compression Framework for Deep Neural Networks

Shuyu Zhang, Donglei Wu, Haoyu Jin, Xiangyu Zou, Wen Xia, Xiaojia Huang
{"title":"QD-Compressor: a Quantization-based Delta Compression Framework for Deep Neural Networks","authors":"Shuyu Zhang, Donglei Wu, Haoyu Jin, Xiangyu Zou, Wen Xia, Xiaojia Huang","doi":"10.1109/ICCD53106.2021.00088","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) have achieved remarkable success in many fields. Large-scale DNNs also bring storage challenges when storing snapshots for preventing clusters’ frequent failures, and bring massive internet traffic when dispatching or updating DNNs for resource-constrained devices (e.g., IoT devices, mobile phones). Several approaches are aiming to compress DNNs. The Recent work, Delta-DNN, notices high similarity existed in DNNs and thus calculates differences between them for improving the compression ratio.However, we observe that Delta-DNN, applying traditional global lossy quantization technique in calculating differences of two neighboring versions of the DNNs, can not fully exploit the data similarity between them for delta compression. This is because the parameters’ value ranges (and also the delta data in Delta-DNN) are varying among layers in DNNs, which inspires us to propose a local-sensitive quantization scheme: the quantizers are adaptive to parameters’ local value ranges in layers. Moreover, instead of quantizing differences of DNNs in Delta-DNN, our approach quantizes DNNs before calculating differences to make the differences more compressible. Besides, we also propose an error feedback mechanism to reduce DNNs’ accuracy loss caused by the lossy quantization.Therefore, we design a novel quantization-based delta compressor called QD-Compressor, which calculates the lossy differences between epochs of DNNs for saving storage cost of backing up DNNs’ snapshots and internet traffic of dispatching DNNs for resource-constrained devices. Experiments on several popular DNNs and datasets show that QD-Compressor obtains a compression ratio of 2.4× ~ 31.5× higher than the state-of-the-art approaches while well maintaining the model’s test accuracy.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"268 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 39th International Conference on Computer Design (ICCD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD53106.2021.00088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Deep neural networks (DNNs) have achieved remarkable success in many fields. Large-scale DNNs also bring storage challenges when snapshots are stored to guard against frequent cluster failures, and generate massive internet traffic when DNNs are dispatched or updated for resource-constrained devices (e.g., IoT devices, mobile phones). Several approaches aim to compress DNNs. A recent work, Delta-DNN, observes high similarity between versions of a DNN and therefore stores the differences between them to improve the compression ratio. However, we observe that Delta-DNN, which applies a traditional global lossy quantization technique when calculating the differences between two neighboring versions of a DNN, cannot fully exploit the data similarity between them for delta compression. This is because the parameters' value ranges (and thus the delta data in Delta-DNN) vary across layers in DNNs, which inspires us to propose a local-sensitive quantization scheme: the quantizers adapt to the parameters' local value ranges in each layer. Moreover, instead of quantizing the differences between DNNs as Delta-DNN does, our approach quantizes the DNNs before calculating differences, making the differences more compressible. In addition, we propose an error feedback mechanism to reduce the accuracy loss caused by lossy quantization. Based on these ideas, we design a novel quantization-based delta compressor called QD-Compressor, which computes lossy differences between epochs of a DNN to save the storage cost of backing up DNN snapshots and the internet traffic of dispatching DNNs to resource-constrained devices. Experiments on several popular DNNs and datasets show that QD-Compressor achieves a compression ratio 2.4× ~ 31.5× higher than state-of-the-art approaches while well maintaining the model's test accuracy.
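The sketch below illustrates the three ideas from the abstract (per-layer local-sensitive quantization, quantize-before-diff, and error feedback) at a high level. It is not the paper's actual implementation: the function names, the uniform binning with 256 levels, and the dict-of-arrays model representation are illustrative assumptions.

```python
import numpy as np

def quantize_layer(w, num_bins=256):
    # Local-sensitive quantization: each layer uses its OWN value range,
    # so the step size adapts to that layer's parameter distribution.
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (num_bins - 1) if hi > lo else 1.0
    codes = np.rint((w - lo) / scale).astype(np.int16)
    return codes, lo, scale

def dequantize_layer(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

def encode_snapshot(curr, prev_codes, feedback, num_bins=256):
    """Quantize the current snapshot layer by layer, fold the previous
    quantization error back in (error feedback), then emit the integer
    delta against the previous snapshot's codes.  The delta stream is
    dominated by zeros/small values and compresses well with a
    general-purpose compressor (e.g., zstd or lzma)."""
    deltas, meta, codes_out, fb_out = {}, {}, {}, {}
    for name, w in curr.items():
        w_adj = w + feedback.get(name, 0.0)        # error feedback
        codes, lo, scale = quantize_layer(w_adj, num_bins)
        prev = prev_codes.get(name, np.zeros_like(codes))
        deltas[name] = codes - prev                # quantize first, then diff
        meta[name] = (lo, scale)
        codes_out[name] = codes
        # Remember what this epoch's quantization lost, to compensate next time.
        fb_out[name] = w_adj - dequantize_layer(codes, lo, scale)
    return deltas, meta, codes_out, fb_out
```

In this sketch, the returned deltas and per-layer (lo, scale) metadata would be serialized and fed to a generic compressor, while codes_out and fb_out are carried forward to encode the next epoch; decoding simply adds the deltas to the previous codes and dequantizes with the stored metadata.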