Squeezing Accumulators in Binary Neural Networks for Extremely Resource-Constrained Applications
Azat Azamat, Jaewoo Park, Jongeun Lee
2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 2022-10-29
DOI: https://doi.org/10.1145/3508352.3549418
The cost and power consumption of BNN (Binarized Neural Network) hardware are dominated by additions. In particular, accumulators account for a large fraction of the hardware overhead, which can be reduced effectively by using reduced-width accumulators. However, finding the optimal accumulator width is not straightforward because of the complex interplay between width, scale, and the effect of training. In this paper we present algorithmic and hardware-level methods to find the optimal accumulator size for BNN hardware with minimal impact on the quality of results. First, we present partial sum scaling, a top-down approach that minimizes the BNN accumulator size based on advanced quantization techniques. We also present an efficient, zero-overhead hardware design for partial sum scaling. Second, we evaluate a bottom-up approach that uses a saturating accumulator, which is more robust against overflows. Our experimental results on the CIFAR-10 dataset demonstrate that partial sum scaling, together with our optimized accumulator architecture, can reduce the area and power consumption of the datapath by 15.50% and 27.03%, respectively, with little impact on inference performance (less than 2%), compared to using a 16-bit accumulator.
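To illustrate why overflow behavior matters once the accumulator is narrowed, the following is a minimal Python sketch that contrasts a wrapping two's-complement accumulator with a saturating one. It is not the authors' hardware design: the function names (`full_width`, `wrap_accumulate`, `saturating_accumulate`) and the toy inputs are assumptions made for illustration, and partial sum scaling is not modeled here.

```python
def full_width(n_terms):
    """Signed bit width that can hold any sum of n_terms values in {-1, +1}."""
    # The sum lies in [-n_terms, n_terms]; one extra bit covers the sign.
    return n_terms.bit_length() + 1


def wrap_accumulate(products, width):
    """Two's-complement accumulator of the given width that wraps on overflow."""
    lo = -(1 << (width - 1))
    acc = 0
    for p in products:
        # Fold the running sum back into the signed range [lo, -lo - 1].
        acc = (acc + p - lo) % (1 << width) + lo
    return acc


def saturating_accumulate(products, width):
    """Accumulator of the given width that clamps at its limits instead of wrapping."""
    lo = -(1 << (width - 1))
    hi = (1 << (width - 1)) - 1
    acc = 0
    for p in products:
        acc = max(lo, min(hi, acc + p))
    return acc


# Toy binarized dot product: ten +1 products, so the true sum is 10.
products = [1] * 10
wrapped = wrap_accumulate(products, 4)          # 4-bit range is [-8, 7]
saturated = saturating_accumulate(products, 4)
```

With a 4-bit accumulator, the wrapping version ends with a negative value even though the true sum is positive, so a sign-based activation downstream would flip. The saturating version stays pinned at the positive limit and preserves the sign. `full_width(10)` gives 5 bits, the width at which neither variant overflows for this input.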