Hyper-Compression: Model Compression via Hyperfunction

Fenglei Fan, Juntong Fan, Dayang Wang, Jingbo Zhang, Zelin Dong, Shijun Zhang, Ge Wang, Tieyong Zeng
arXiv - CS - Emerging Technologies, 2024-09-01. DOI: arxiv-2409.00592
Citations: 0

Abstract

The rapid growth of large models' size has far outpaced that of GPU memory. To bridge this gap, and inspired by the succinct relationship between genotype and phenotype, we recast the model compression problem as one of parameter representation, proposing the so-called hyper-compression. Hyper-compression uses a hyperfunction to represent the parameters of the target network; notably, the hyperfunction is designed following ergodic theory, which concerns whether a low-dimensional dynamical system can eventually fill a high-dimensional space. Empirically, the proposed hyper-compression enjoys the following merits: 1) Preferable compression ratio; 2) No post-hoc retraining; 3) Affordable inference time; and 4) Short compression time. It compresses LLaMA2-7B in an hour and achieves close-to-int4-quantization performance without retraining, with a performance drop of less than 1%. Our work has the potential to invigorate the field of model compression, moving toward harmony between the scaling law and the stagnation of hardware upgrades.
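The ergodic-theoretic idea in the abstract can be made concrete with a minimal sketch: by Weyl's equidistribution theorem, the trajectory theta → (theta·a_1 mod 1, ..., theta·a_d mod 1) with rationally independent frequencies densely fills the unit hypercube, so a group of d normalized parameters can be replaced by a single scalar "time" theta at which the trajectory passes nearby. The code below is an illustrative toy, not the authors' implementation; the frequency choice, candidate grid, and group size d are all assumptions made here for demonstration.

```python
import numpy as np

def hyperfunction(theta, freqs):
    """Point on the trajectory of an irrational winding on the d-torus.

    With rationally independent freqs, the set {theta * freqs mod 1}
    is dense in [0, 1)^d (Weyl equidistribution), so some theta
    approximates any target point arbitrarily well.
    """
    return (theta * freqs) % 1.0

def compress_group(w, freqs, thetas):
    """Replace a d-dimensional parameter group by one scalar theta."""
    # Evaluate the trajectory at all candidate times and keep the closest.
    traj = (thetas[:, None] * freqs[None, :]) % 1.0   # (n_candidates, d)
    errs = np.abs(traj - w[None, :]).max(axis=1)      # sup-norm error per time
    best = int(np.argmin(errs))
    return thetas[best], errs[best]

rng = np.random.default_rng(0)
freqs = np.sqrt(np.array([2.0, 3.0, 5.0, 7.0]))       # irrational, independent
thetas = np.linspace(0.0, 1000.0, 200_000)            # quantized candidate times

w = rng.random(4)                       # a group of d=4 normalized weights
theta, err = compress_group(w, freqs, thetas)
w_hat = hyperfunction(theta, freqs)     # decompression: re-evaluate trajectory
print(theta, err)
```

Storage drops from d floats to one scalar per group (and theta itself can be quantized), while decompression is a cheap closed-form evaluation, which is in the spirit of the "affordable inference time" claim; the accuracy/ratio trade-off is governed by d and the resolution of the candidate grid.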