Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview

Yanshu Wang, Tong Yang, Xiyan Liang, Guoan Wang, Hanning Lu, Xu Zhe, Yaoming Li, Li Weitao
{"title":"Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview","authors":"Yanshu Wang, Tong Yang, Xiyan Liang, Guoan Wang, Hanning Lu, Xu Zhe, Yaoming Li, Li Weitao","doi":"arxiv-2409.11650","DOIUrl":null,"url":null,"abstract":"This paper provides a comprehensive overview of the principles, challenges,\nand methodologies associated with quantizing large-scale neural network models.\nAs neural networks have evolved towards larger and more complex architectures\nto address increasingly sophisticated tasks, the computational and energy costs\nhave escalated significantly. We explore the necessity and impact of model size\ngrowth, highlighting the performance benefits as well as the computational\nchallenges and environmental considerations. The core focus is on model\nquantization as a fundamental approach to mitigate these challenges by reducing\nmodel size and improving efficiency without substantially compromising\naccuracy. We delve into various quantization techniques, including both\npost-training quantization (PTQ) and quantization-aware training (QAT), and\nanalyze several state-of-the-art algorithms such as LLM-QAT, PEQA(L4Q),\nZeroQuant, SmoothQuant, and others. Through comparative analysis, we examine\nhow these methods address issues like outliers, importance weighting, and\nactivation quantization, ultimately contributing to more sustainable and\naccessible deployment of large-scale models.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11650","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This paper provides a comprehensive overview of the principles, challenges, and methodologies associated with quantizing large-scale neural network models. As neural networks have evolved towards larger and more complex architectures to address increasingly sophisticated tasks, the computational and energy costs have escalated significantly. We explore the necessity and impact of model size growth, highlighting the performance benefits as well as the computational challenges and environmental considerations. The core focus is on model quantization as a fundamental approach to mitigate these challenges by reducing model size and improving efficiency without substantially compromising accuracy. We delve into various quantization techniques, including both post-training quantization (PTQ) and quantization-aware training (QAT), and analyze several state-of-the-art algorithms such as LLM-QAT, PEQA (L4Q), ZeroQuant, SmoothQuant, and others. Through comparative analysis, we examine how these methods address issues like outliers, importance weighting, and activation quantization, ultimately contributing to more sustainable and accessible deployment of large-scale models.
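
As a concrete illustration of the basic idea behind post-training quantization, the sketch below maps a floating-point weight tensor to int8 using a single per-tensor scale (uniform symmetric quantization). This is a minimal assumed example for orientation only, not the algorithm of LLM-QAT, ZeroQuant, SmoothQuant, or any other method analyzed in the paper; the helper names quantize_int8 and dequantize are hypothetical.

# Minimal sketch of uniform symmetric int8 post-training quantization.
# Illustrative only; not the specific algorithm of any method surveyed
# in the paper (LLM-QAT, PEQA/L4Q, ZeroQuant, SmoothQuant, ...).
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map float weights to int8 with one per-tensor scale."""
    scale = np.max(np.abs(w)) / 127.0               # largest magnitude maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 8)).astype(np.float32)  # toy weight matrix
    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    print("max abs reconstruction error:", float(np.max(np.abs(w - w_hat))))

With round-to-nearest, the per-weight reconstruction error of this baseline is bounded by half the quantization step (scale / 2); the methods surveyed in the paper refine this basic scheme, for example by handling activation outliers (SmoothQuant) or by training through the quantizer (QAT-style approaches such as LLM-QAT).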