{"title":"Flipping Bits to Share Crossbars in ReRAM-Based DNN Accelerator","authors":"Lei Zhao, Youtao Zhang, Jun Yang","doi":"10.1109/ICCD53106.2021.00016","DOIUrl":null,"url":null,"abstract":"Future deep neural networks (DNNs) tend to grow deeper and contain more trainable weights. Although methods such as pruning and quantization are widely adopted to reduce DNN’s model size and computation, they are less applicable in the area of ReRAM-based DNN accelerators. On the one hand, because the cells in crossbars are accessed uniformly, it is difficult to explore fine-grained pruning in ReRAM-based DNN accelerators. On the other hand, aggressive quantization results in poor accuracy coupled with the low precision of ReRAM cells to represent weight values.In this paper, we propose BFlip – a novel model size and computation reduction technique – to share crossbars among multiple bit matrices. BFlip clusters similar bit matrices together, and finds a combination of row and column flips for each bit matrix to minimize its distance to the centroid of the cluster. Therefore, only the centroid bit matrix is stored in the crossbar, which is shared by all other bit matrices in that cluster. We also propose a calibration method to improve the accuracy as well as a ReRAM-based DNN accelerator to fully reap the storage and computation benefits of BFlip. Our experiments show that BFlip effectively reduces model size and computation with negligible accuracy impact. The proposed accelerator achieves 2.45 × speedup and 85% energy reduction over the ISAAC baseline.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 39th International Conference on Computer Design (ICCD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD53106.2021.00016","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Future deep neural networks (DNNs) tend to grow deeper and contain more trainable weights. Although methods such as pruning and quantization are widely adopted to reduce a DNN's model size and computation, they are less applicable to ReRAM-based DNN accelerators. On the one hand, because the cells in a crossbar are accessed uniformly, fine-grained pruning is difficult to exploit in ReRAM-based DNN accelerators. On the other hand, aggressive quantization, combined with the low precision of ReRAM cells for representing weight values, results in poor accuracy. In this paper, we propose BFlip, a novel model size and computation reduction technique that shares crossbars among multiple bit matrices. BFlip clusters similar bit matrices together and finds a combination of row and column flips for each bit matrix that minimizes its distance to the centroid of the cluster. As a result, only the centroid bit matrix is stored in the crossbar, which is shared by all other bit matrices in that cluster. We also propose a calibration method to improve accuracy, as well as a ReRAM-based DNN accelerator that fully reaps the storage and computation benefits of BFlip. Our experiments show that BFlip effectively reduces model size and computation with negligible accuracy impact. The proposed accelerator achieves a 2.45× speedup and 85% energy reduction over the ISAAC baseline.
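The abstract only sketches the flipping idea, so the snippet below is a minimal illustrative sketch rather than the authors' algorithm: a greedy alternating heuristic that chooses row and column flips to reduce a bit matrix's Hamming distance to a cluster centroid, after which only the centroid would need to reside in the crossbar. The function name `greedy_flips`, the coordinate-descent update order, and the toy 8×8 matrices are assumptions made purely for illustration.

```python
import numpy as np

def greedy_flips(bits: np.ndarray, centroid: np.ndarray, iters: int = 10):
    """Greedily pick row/column flips of a 0/1 matrix `bits` that reduce its
    Hamming distance to a 0/1 `centroid` matrix. Returns the flip vectors."""
    row_flip = np.zeros(bits.shape[0], dtype=bool)
    col_flip = np.zeros(bits.shape[1], dtype=bool)
    for _ in range(iters):
        changed = False
        # Matrix obtained after applying the flips chosen so far
        # (a cell flipped by both its row and its column returns to its original value).
        cur = bits ^ row_flip[:, None] ^ col_flip[None, :]
        # Flip any row whose flipped version is closer to the centroid row.
        for r in range(bits.shape[0]):
            if np.sum((cur[r] ^ 1) != centroid[r]) < np.sum(cur[r] != centroid[r]):
                row_flip[r] ^= True
                cur[r] ^= 1
                changed = True
        # Same greedy test for columns.
        for c in range(bits.shape[1]):
            if np.sum((cur[:, c] ^ 1) != centroid[:, c]) < np.sum(cur[:, c] != centroid[:, c]):
                col_flip[c] ^= True
                cur[:, c] ^= 1
                changed = True
        if not changed:
            break
    return row_flip, col_flip

# Toy example: a cluster member differs from the centroid in a few cells;
# the member stores only its flip vectors, while the centroid occupies the crossbar.
rng = np.random.default_rng(0)
centroid = rng.integers(0, 2, size=(8, 8))
noise = (rng.random((8, 8)) < 0.1).astype(int)
member = centroid ^ noise
rf, cf = greedy_flips(member, centroid)
approx = member ^ rf[:, None] ^ cf[None, :]
print("Residual Hamming distance to centroid:", int(np.sum(approx != centroid)))
```

The sketch only conveys the intuition that row/column flip vectors are a far cheaper per-matrix record than a full crossbar; the paper's actual clustering procedure, flip search, and calibration method are described in the full text.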