Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview

arXiv - CS - Machine Learning. Pub Date: 2024-09-18. DOI: arxiv-2409.11650

Yanshu Wang, Tong Yang, Xiyan Liang, Guoan Wang, Hanning Lu, Xu Zhe, Yaoming Li, Li Weitao



Abstract

This paper provides a comprehensive overview of the principles, challenges, and methodologies associated with quantizing large-scale neural network models. As neural networks have evolved towards larger and more complex architectures to address increasingly sophisticated tasks, the computational and energy costs have escalated significantly. We explore the necessity and impact of model size growth, highlighting the performance benefits as well as the computational challenges and environmental considerations. The core focus is on model quantization as a fundamental approach to mitigate these challenges by reducing model size and improving efficiency without substantially compromising accuracy. We delve into various quantization techniques, including both post-training quantization (PTQ) and quantization-aware training (QAT), and analyze several state-of-the-art algorithms such as LLM-QAT, PEQA(L4Q), ZeroQuant, SmoothQuant, and others. Through comparative analysis, we examine how these methods address issues like outliers, importance weighting, and activation quantization, ultimately contributing to more sustainable and accessible deployment of large-scale models.
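As a rough illustration of the basic idea behind post-training quantization mentioned above, the following minimal sketch maps floating-point weights to 8-bit integer codes with a single shared scale and then reconstructs them. It is not taken from the paper and does not reproduce LLM-QAT, PEQA, ZeroQuant, or SmoothQuant; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, chosen only for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of a list of floats to int8 codes.

    Hypothetical helper for illustration: one scale is shared by all weights.
    """
    max_abs = max(abs(w) for w in weights)
    # The largest magnitude maps to code 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


# Made-up sample weights (an assumption, not data from the paper).
weights = [0.02, -0.51, 0.33, 1.27, -0.08]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Worst-case reconstruction error; rounding bounds it by about half the scale.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Because the scale is set by the largest magnitude, a single outlier coarsens the grid for every other weight: if a value such as 50.0 were appended to the list, the scale would grow to roughly 0.39, and 0.02 and -0.08 would both round to code 0. This is one way to see why outlier handling matters for quantization methods.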


