Title: Structured segment rescaling with Gaussian processes for parameter efficient ConvNets
Authors: Bilal Siddiqui, Adel Alaeddini, Dakai Zhu
DOI: 10.1016/j.sysarc.2024.103246
Journal: Journal of Systems Architecture, Volume 154, Article 103246 (Q1, Computer Science, Hardware & Architecture; Impact Factor 3.7)
Publication date: 2024-07-25
URL: https://www.sciencedirect.com/science/article/pii/S1383762124001838
Citations: 0
Abstract
We introduce a novel mechanism for structured pruning of ConvNet blocks and channels. Our mechanism, Structured Segment Rescaling (SSR), down-samples a ConvNet's dimensions using depth and width modifiers that remove whole blocks and channels, respectively. SSR is a systematic approach for constructing ConvNets that can replace arbitrary design heuristics. The SSR modifiers rescale logical partitions (segments) of a ConvNet with grouped layers. Applying different modifiers to the segments yields many different architectures, each with unique rescales for its blocks. This diversity of architectures is then systematically explored using a Gaussian Process (GP) that optimizes for modifiers that maintain accuracy while reducing parameters. We analyze SSR in the context of resource-constrained environments using ResNets trained on the CIFAR datasets. An initial set of depth and width modifiers explores extreme rescales of ResNet segments, where we find up to 70% parameter reduction. The GP is trained on these initial rescales and generalizes from them to predict the accuracy of other rescaled ConvNets given their segment modifiers. SSR produces over 10^5 ConvNets that can be trained selectively based on their GP-predicted accuracy. GP-enabled SSR pushes compression to over 80% with minimal accuracy impact. While both depth and width modifiers can reduce parameters, we show that reducing blocks is better for reducing latency, yielding ConvNets that are up to 80% faster. Using our mechanism, we can efficiently customize ConvNets based on their parameter-accuracy trade-offs. SSR requires only on the order of 10^1 GPU hours and modest engineering to yield efficient new ConvNets that can facilitate edge inference.
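The core loop the abstract describes — train a GP on a few measured (segment modifiers, accuracy) pairs, then rank unseen rescales by predicted accuracy before spending GPU hours on them — can be sketched with a minimal GP regressor. This is an illustrative reconstruction, not the authors' implementation: the modifier encoding (one depth and one width multiplier per segment, three segments), the example accuracy values, and the RBF kernel with a fixed length scale are all assumptions for demonstration.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(X_train, y_train, X_test, noise=1e-4):
    # GP posterior mean: k(X*, X) @ (K + noise*I)^-1 @ y.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_test, X_train)
    alpha = np.linalg.solve(K, y_train)
    return Ks @ alpha

# Hypothetical modifier vectors: (depth modifiers for 3 segments,
# width modifiers for 3 segments); 1.0 means the segment is unchanged.
X = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # baseline ResNet
    [0.5, 1.0, 1.0, 1.0, 1.0, 1.0],  # halve depth of segment 1
    [1.0, 0.5, 1.0, 1.0, 1.0, 1.0],  # halve depth of segment 2
    [1.0, 1.0, 1.0, 0.5, 0.5, 1.0],  # halve width of segments 1-2
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # extreme rescale of all segments
])
# Hypothetical measured CIFAR accuracies for the trained rescales above.
y = np.array([0.93, 0.925, 0.92, 0.915, 0.88])

# Predict accuracy of an unseen rescale before committing to training it.
x_new = np.array([[0.75, 1.0, 1.0, 0.75, 1.0, 1.0]])
pred = gp_predict(X, y, x_new)
print(f"predicted accuracy: {float(pred[0]):.3f}")
```

In the paper's setting this prediction step is what makes exploring over 10^5 candidate architectures tractable: only the rescales the GP scores highly are actually trained.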
Journal description:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques, and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components, with an emphasis on software, are also relevant here.