Shiguang Wang, Zhongyu Zhang, Guo Ai, Jian Cheng
{"title":"采用共同学习和一次性搜索的可部署混合精度量化技术","authors":"Shiguang Wang , Zhongyu Zhang , Guo Ai , Jian Cheng","doi":"10.1016/j.neunet.2024.106812","DOIUrl":null,"url":null,"abstract":"<div><div>Mixed-precision quantization plays a pivotal role in deploying deep neural networks in resource-constrained environments. However, the task of finding the optimal bit-width configurations for different layers under <strong>deployable mixed-precision quantization</strong> has barely been explored and remains a challenge. In this work, we present Cobits, an efficient and effective deployable mixed-precision quantization framework based on the relationship between the range of real-valued input and the range of quantized real-valued. It assigns a higher bit-width to the quantizer with a narrower quantized real-valued range and a lower bit-width to the quantizer with a wider quantized real-valued range. Cobits employs a co-learning approach to entangle and learn quantization parameters across various bit-widths, distinguishing between shared and specific parts. The shared part collaborates, while the specific part isolates precision conflicts. Additionally, we upgrade the normal quantizer to dynamic quantizer to mitigate statistical issues in the deployable mixed-precision supernet. Over the trained mixed-precision supernet, we utilize the quantized real-valued ranges to derive <em>quantized-bit-sensitivity</em>, which can serve as importance indicators for efficiently determining bit-width configurations, eliminating the need for iterative validation dataset evaluations. Extensive experiments show that Cobits outperforms previous state-of-the-art quantization methods on the ImageNet and COCO datasets while retaining superior efficiency. We show this approach dynamically adapts to varying bit-width and can generalize to various deployable backends. The code will be made public in <span><span>https://github.com/sunnyxiaohu/cobits</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"181 ","pages":"Article 106812"},"PeriodicalIF":6.0000,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deployable mixed-precision quantization with co-learning and one-time search\",\"authors\":\"Shiguang Wang , Zhongyu Zhang , Guo Ai , Jian Cheng\",\"doi\":\"10.1016/j.neunet.2024.106812\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Mixed-precision quantization plays a pivotal role in deploying deep neural networks in resource-constrained environments. However, the task of finding the optimal bit-width configurations for different layers under <strong>deployable mixed-precision quantization</strong> has barely been explored and remains a challenge. In this work, we present Cobits, an efficient and effective deployable mixed-precision quantization framework based on the relationship between the range of real-valued input and the range of quantized real-valued. It assigns a higher bit-width to the quantizer with a narrower quantized real-valued range and a lower bit-width to the quantizer with a wider quantized real-valued range. Cobits employs a co-learning approach to entangle and learn quantization parameters across various bit-widths, distinguishing between shared and specific parts. The shared part collaborates, while the specific part isolates precision conflicts. 
Additionally, we upgrade the normal quantizer to dynamic quantizer to mitigate statistical issues in the deployable mixed-precision supernet. Over the trained mixed-precision supernet, we utilize the quantized real-valued ranges to derive <em>quantized-bit-sensitivity</em>, which can serve as importance indicators for efficiently determining bit-width configurations, eliminating the need for iterative validation dataset evaluations. Extensive experiments show that Cobits outperforms previous state-of-the-art quantization methods on the ImageNet and COCO datasets while retaining superior efficiency. We show this approach dynamically adapts to varying bit-width and can generalize to various deployable backends. The code will be made public in <span><span>https://github.com/sunnyxiaohu/cobits</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"181 \",\"pages\":\"Article 106812\"},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2024-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608024007366\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608024007366","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Deployable mixed-precision quantization with co-learning and one-time search
Mixed-precision quantization plays a pivotal role in deploying deep neural networks in resource-constrained environments. However, the task of finding the optimal bit-width configuration for each layer under deployable mixed-precision quantization has barely been explored and remains a challenge. In this work, we present Cobits, an efficient and effective deployable mixed-precision quantization framework based on the relationship between the range of the real-valued input and the quantized real-valued range. It assigns a higher bit-width to quantizers with a narrower quantized real-valued range and a lower bit-width to quantizers with a wider one. Cobits employs a co-learning approach to entangle and learn quantization parameters across bit-widths, distinguishing between shared and bit-specific parts: the shared part enables collaboration across bit-widths, while the specific part isolates precision conflicts. Additionally, we upgrade the normal quantizer to a dynamic quantizer to mitigate statistical issues in the deployable mixed-precision supernet. Over the trained supernet, we use the quantized real-valued ranges to derive a quantized-bit-sensitivity measure, which serves as an importance indicator for efficiently determining bit-width configurations, eliminating the need for iterative evaluations on a validation dataset. Extensive experiments show that Cobits outperforms previous state-of-the-art quantization methods on the ImageNet and COCO datasets while retaining superior efficiency. We show that the approach dynamically adapts to varying bit-widths and generalizes to various deployable backends. The code will be made public at https://github.com/sunnyxiaohu/cobits.
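As a rough illustration of the range-based assignment idea described above, the sketch below allocates bit-widths inversely to each layer's quantized real-valued range under an average-bit budget: narrower ranges are treated as more bit-sensitive and receive bits first. Everything here, including the function name `assign_bitwidths`, the greedy budget loop, and the example numbers, is a hypothetical reading of the abstract, not the paper's actual algorithm.

```python
# Hypothetical sketch of range-based bit-width assignment (illustrative only;
# not Cobits' actual search). Layers whose learned quantized real-valued
# range is narrower are given higher bit-widths, and vice versa.

def assign_bitwidths(quantized_ranges, candidate_bits=(2, 4, 8), avg_bit_budget=4.0):
    """quantized_ranges: per-layer width of the learned clipping interval.
    Returns one bit-width per layer, keeping the average within budget."""
    n = len(quantized_ranges)
    # Narrower range -> treated as more bit-sensitive -> higher priority for bits.
    order = sorted(range(n), key=lambda i: quantized_ranges[i])  # narrowest first
    bits = [min(candidate_bits)] * n                             # start at the floor
    budget = avg_bit_budget * n - sum(bits)                      # remaining bits to spend
    for i in order:  # greedily spend the budget on the most sensitive layers
        for b in sorted(candidate_bits):
            extra = b - bits[i]
            if extra > 0 and extra <= budget:
                budget -= extra
                bits[i] = b
    return bits

if __name__ == "__main__":
    ranges = [0.8, 3.2, 1.1, 6.5]        # made-up per-layer quantized ranges
    print(assign_bitwidths(ranges))      # -> [8, 2, 4, 2], average 4.0 bits
```

Because the assignment is driven by ranges already learned during supernet training, a rule of this kind needs no per-candidate evaluation on a validation set, which is what makes the one-time search efficient.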
Journal introduction:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically inspired artificial intelligence.