Practical Network Acceleration With Tiny Sets: Hypothesis, Theory, and Algorithm

Guo-Hua Wang, Jianxin Wu
{"title":"Practical Network Acceleration With Tiny Sets: Hypothesis, Theory, and Algorithm","authors":"Guo-Hua Wang;Jianxin Wu","doi":"10.1109/TPAMI.2024.3418999","DOIUrl":null,"url":null,"abstract":"Due to data privacy issues, accelerating networks with tiny training sets has become a critical need in practice. Previous methods achieved promising results empirically by filter-level pruning. In this paper, we both study this problem theoretically and propose an effective algorithm aligning well with our theoretical results. First, we propose the finetune convexity hypothesis to explain why recent few-shot compression algorithms do not suffer from overfitting problems. Based on it, a theory is further established to explain these methods for the first time. Compared to naively finetuning a pruned network, feature mimicking is proved to achieve a lower variance of parameters and hence enjoys easier optimization. With our theoretical conclusions, we claim dropping blocks is a fundamentally superior few-shot compression scheme in terms of more convex optimization and a higher acceleration ratio. To choose which blocks to drop, we propose a new metric, recoverability, to effectively measure the difficulty of recovering the compressed network. Finally, we propose an algorithm named \n<sc>Practise</small>\n to accelerate networks using only tiny sets of training images. \n<sc>Practise</small>\n outperforms previous methods by a significant margin. For 22% latency reduction, \n<sc>Practise</small>\n surpasses previous methods by on average 7 percentage points on ImageNet-1k. It also enjoys high generalization ability, working well under data-free or out-of-domain data settings, too.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"9272-9285"},"PeriodicalIF":18.6000,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10571608/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Due to data privacy issues, accelerating networks with tiny training sets has become a critical need in practice. Previous methods achieved promising results empirically through filter-level pruning. In this paper, we both study this problem theoretically and propose an effective algorithm that aligns well with our theoretical results. First, we propose the finetune convexity hypothesis to explain why recent few-shot compression algorithms do not suffer from overfitting. Building on this hypothesis, we establish a theory that explains these methods for the first time. Compared to naively finetuning a pruned network, feature mimicking is proved to achieve a lower variance of parameters and hence enjoys easier optimization. Guided by these theoretical conclusions, we argue that dropping blocks is a fundamentally superior few-shot compression scheme, offering both a more convex optimization landscape and a higher acceleration ratio. To choose which blocks to drop, we propose a new metric, recoverability, which effectively measures the difficulty of recovering the compressed network. Finally, we propose an algorithm named Practise to accelerate networks using only tiny sets of training images. Practise outperforms previous methods by a significant margin: at a 22% latency reduction, it surpasses previous methods by 7 percentage points on average on ImageNet-1k. It also generalizes well, working under data-free and out-of-domain data settings.
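To make the abstract's key ideas concrete, the sketch below illustrates block dropping, feature-mimicking recovery, and a recoverability-style score in PyTorch. It is a minimal sketch under stated assumptions, not the authors' released implementation: the helper names (drop_block, mimic_features, recoverability), the choice to mimic the network's final output rather than intermediate feature maps, and the SGD hyperparameters are all illustrative.

```python
import copy
import torch
import torch.nn as nn

def drop_block(model: nn.Sequential, idx: int) -> nn.Sequential:
    """Return a copy of `model` with its idx-th block replaced by an identity map."""
    pruned = copy.deepcopy(model)
    pruned[idx] = nn.Identity()
    return pruned

def mimic_features(teacher: nn.Module, student: nn.Module, loader,
                   steps: int = 500, lr: float = 1e-3, device: str = "cpu") -> nn.Module:
    """Recover the pruned `student` by matching the frozen `teacher`'s outputs.

    Matching features instead of fitting labels is the low-variance recovery
    objective the paper argues for. For brevity this sketch mimics the final
    output; intermediate feature maps could be matched instead.
    """
    teacher.to(device).eval()
    student.to(device).train()
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    it = iter(loader)
    for _ in range(steps):
        try:
            x, _ = next(it)
        except StopIteration:
            it = iter(loader)  # the tiny training set is cycled many times
            x, _ = next(it)
        x = x.to(device)
        with torch.no_grad():
            target = teacher(x)
        loss = nn.functional.mse_loss(student(x), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student

@torch.no_grad()
def _feature_error(a: nn.Module, b: nn.Module, loader, device: str = "cpu") -> float:
    """Mean squared error between two networks' outputs over the tiny set."""
    a.eval(); b.eval()
    losses = [nn.functional.mse_loss(a(x.to(device)), b(x.to(device))).item()
              for x, _ in loader]
    return sum(losses) / max(len(losses), 1)

def recoverability(model: nn.Sequential, idx: int, loader, device: str = "cpu") -> float:
    """A rough proxy for the paper's recoverability metric: drop block `idx`,
    briefly recover by feature mimicking, and report the residual feature
    error. Lower residual error means the block is cheaper to drop."""
    student = mimic_features(model, drop_block(model, idx), loader, device=device)
    return _feature_error(model, student, loader, device)
```

Under this sketch, one would score every candidate block on the tiny training set and drop those with the lowest residual error first, mirroring the abstract's claim that recoverability measures how difficult the compressed network is to recover.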