Practical Network Acceleration With Tiny Sets: Hypothesis, Theory, and Algorithm

Guo-Hua Wang, Jianxin Wu
{"title":"Practical Network Acceleration With Tiny Sets: Hypothesis, Theory, and Algorithm","authors":"Guo-Hua Wang;Jianxin Wu","doi":"10.1109/TPAMI.2024.3418999","DOIUrl":null,"url":null,"abstract":"Due to data privacy issues, accelerating networks with tiny training sets has become a critical need in practice. Previous methods achieved promising results empirically by filter-level pruning. In this paper, we both study this problem theoretically and propose an effective algorithm aligning well with our theoretical results. First, we propose the finetune convexity hypothesis to explain why recent few-shot compression algorithms do not suffer from overfitting problems. Based on it, a theory is further established to explain these methods for the first time. Compared to naively finetuning a pruned network, feature mimicking is proved to achieve a lower variance of parameters and hence enjoys easier optimization. With our theoretical conclusions, we claim dropping blocks is a fundamentally superior few-shot compression scheme in terms of more convex optimization and a higher acceleration ratio. To choose which blocks to drop, we propose a new metric, recoverability, to effectively measure the difficulty of recovering the compressed network. Finally, we propose an algorithm named \n<sc>Practise</small>\n to accelerate networks using only tiny sets of training images. \n<sc>Practise</small>\n outperforms previous methods by a significant margin. For 22% latency reduction, \n<sc>Practise</small>\n surpasses previous methods by on average 7 percentage points on ImageNet-1k. It also enjoys high generalization ability, working well under data-free or out-of-domain data settings, too.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"9272-9285"},"PeriodicalIF":18.6000,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10571608/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Due to data privacy issues, accelerating networks with tiny training sets has become a critical need in practice. Previous methods achieved promising results empirically through filter-level pruning. In this paper, we both study this problem theoretically and propose an effective algorithm that aligns well with our theoretical results. First, we propose the finetune convexity hypothesis to explain why recent few-shot compression algorithms do not suffer from overfitting. Building on this hypothesis, we establish a theory that explains these methods for the first time. Compared to naively finetuning a pruned network, feature mimicking is proved to achieve a lower variance of parameters and hence enjoys easier optimization. Guided by these theoretical conclusions, we argue that dropping blocks is a fundamentally superior few-shot compression scheme, offering both a more convex optimization landscape and a higher acceleration ratio. To choose which blocks to drop, we propose a new metric, recoverability, which effectively measures the difficulty of recovering the compressed network. Finally, we propose an algorithm named Practise to accelerate networks using only tiny sets of training images. Practise outperforms previous methods by a significant margin: at a 22% latency reduction, it surpasses previous methods by 7 percentage points on average on ImageNet-1k. It also generalizes well, working under data-free and out-of-domain data settings.
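To make the abstract's key ideas concrete, the sketch below illustrates block dropping, feature-mimicking recovery, and a recoverability-style score in PyTorch. It is a minimal sketch under stated assumptions, not the authors' released implementation: the helper names (drop_block, mimic_features, recoverability), the choice to mimic the network's final output rather than intermediate feature maps, and the SGD hyperparameters are all illustrative.

```python
import copy
import torch
import torch.nn as nn

def drop_block(model: nn.Sequential, idx: int) -> nn.Sequential:
    """Return a copy of `model` with its idx-th block replaced by an identity map."""
    pruned = copy.deepcopy(model)
    pruned[idx] = nn.Identity()
    return pruned

def mimic_features(teacher: nn.Module, student: nn.Module, loader,
                   steps: int = 500, lr: float = 1e-3, device: str = "cpu") -> nn.Module:
    """Recover the pruned `student` by matching the frozen `teacher`'s outputs.

    Matching features instead of fitting labels is the low-variance recovery
    objective the paper argues for. For brevity this sketch mimics the final
    output; intermediate feature maps could be matched instead.
    """
    teacher.to(device).eval()
    student.to(device).train()
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    it = iter(loader)
    for _ in range(steps):
        try:
            x, _ = next(it)
        except StopIteration:
            it = iter(loader)  # the tiny training set is cycled many times
            x, _ = next(it)
        x = x.to(device)
        with torch.no_grad():
            target = teacher(x)
        loss = nn.functional.mse_loss(student(x), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student

@torch.no_grad()
def _feature_error(a: nn.Module, b: nn.Module, loader, device: str = "cpu") -> float:
    """Mean squared error between two networks' outputs over the tiny set."""
    a.eval(); b.eval()
    losses = [nn.functional.mse_loss(a(x.to(device)), b(x.to(device))).item()
              for x, _ in loader]
    return sum(losses) / max(len(losses), 1)

def recoverability(model: nn.Sequential, idx: int, loader, device: str = "cpu") -> float:
    """A rough proxy for the paper's recoverability metric: drop block `idx`,
    briefly recover by feature mimicking, and report the residual feature
    error. Lower residual error means the block is cheaper to drop."""
    student = mimic_features(model, drop_block(model, idx), loader, device=device)
    return _feature_error(model, student, loader, device)
```

Under this sketch, one would score every candidate block on the tiny training set and drop those with the lowest residual error first, mirroring the abstract's claim that recoverability measures how difficult the compressed network is to recover.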