Authors: Chunxiao Fan; Dan Guo; Ziqi Wang; Meng Wang
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 4, pp. 2313-2329
DOI: 10.1109/TPAMI.2024.3521589
Published: 2024-12-23
URL: https://ieeexplore.ieee.org/document/10812914/
Multi-Objective Convex Quantization for Efficient Model Compression
Quantization is an efficient model compression method that represents a network with fixed-point or low-bit numbers. Existing methods treat network quantization as a single-objective optimization that pursues high accuracy (performance optimization) subject to the quantization constraint. However, because the quantization operation is non-differentiable, it is challenging to integrate it into network training and reach optimal parameters. This paper proposes a novel multi-objective convex quantization for efficient model compression. Specifically, network training is modeled as a multi-objective optimization that seeks a network with both high accuracy and low quantization error (in practice, these two goals partly conflict and affect each other). To make this multi-objective optimization effective, we design a quantization error function that is differentiable and convex within each quantization period, avoiding the non-differentiable back-propagation of the quantization operation. We then apply a time-series self-distillation training scheme to the multi-objective framework, which distills the model's past softened labels and combines them with hard targets to guarantee controllable, stable convergence during training. Finally, and most importantly, a new dynamic Lagrangian coefficient adaptation adjusts the gradient magnitudes of the quantization loss and the performance loss, balancing the two during training. The proposed method is evaluated on well-known benchmarks (MNIST, CIFAR-10/100, ImageNet, Penn Treebank, and Microsoft COCO), and experimental results show that it achieves outstanding performance compared to existing methods.
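The central idea of the abstract — a quantization penalty that is differentiable and convex within each quantization period, combined with a task loss under a dynamically balanced coefficient — can be sketched as follows. This is a generic illustration, not the paper's actual formulation: the names `quant_error` and `balance_coeff`, the uniform step size, and the gradient-norm balancing rule are all assumptions of this sketch.

```python
import numpy as np

def quant_error(w, step=0.1):
    """Smooth surrogate for quantization error: mean squared distance of
    each weight to its nearest quantization level step * round(w / step).
    Within each quantization period the function is a convex parabola, so
    it has a well-defined gradient almost everywhere and back-propagates
    without a straight-through approximation."""
    return np.mean((w - step * np.round(w / step)) ** 2)

def balance_coeff(grad_perf, grad_quant, eps=1e-12):
    """Hypothetical dynamic Lagrangian-style coefficient: scale the
    quantization loss so its gradient magnitude matches that of the
    performance (task) loss, keeping the two objectives balanced."""
    return np.linalg.norm(grad_perf) / (np.linalg.norm(grad_quant) + eps)

# Toy usage: combine a stand-in task loss with the weighted penalty.
rng = np.random.default_rng(0)
w = rng.normal(size=256)
task_loss = 0.8                                  # stand-in for cross-entropy
grad_q = 2 * (w - 0.1 * np.round(w / 0.1)) / w.size   # gradient of quant_error
lam = balance_coeff(rng.normal(size=256), grad_q)
total_loss = task_loss + lam * quant_error(w, step=0.1)
```

Because the penalty is built from the distance to the nearest level rather than from the rounding operation itself, its gradient pulls weights toward quantization levels during ordinary gradient descent, which is the property the non-differentiable hard-rounding loss lacks.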