{"title":"Bit-Offsetter: A Bit-serial DNN Accelerator with Weight-offset MAC for Bit-wise Sparsity Exploitation","authors":"Siqi He, Hongyi Zhang, Mengjie Li, Haozhe Zhu, Chixiao Chen, Qi Liu, Xiaoyang Zeng","doi":"10.1109/AICAS57966.2023.10168618","DOIUrl":null,"url":null,"abstract":"With the rapid evolution of deep neural networks (DNNs), the massive computational burden brings about the difficulty of deploying DNN on edge devices. This situation gives rise to specialized hardware aiming at exploiting the sparsity of DNN parameters. Bit-serial architectures (BSAs) possess great performance potential by leveraging the abundant bit-wise sparsity. However, the distribution of effective bits of weights confines the performance of BSA designs. To improve the efficiency of BSA, we propose a weight-offset multiply-accumulation (MAC) scheme and an associated hardware design called Bit-offsetter in this paper. Weight-offsetting not only significantly boosts bit-wise sparsity but also brings out a more balanced distribution of essential bits. For Bit-offsetter, aside from leveraging the abundant bitwise sparsity induced by weight-offsetting, it’s also equipped with a load-balancing scheduler to reduce idle cycles and mitigate utilization degradation. According to our experiment on a series of DNN models, weight-offsetting can increase bit-wise sparsity for pre-trained weight up to 77.4% on average. The weight-offset MAC scheme associated with Bit-offsetter achieves 3.28×/2.94× speedup/energy efficiency over the baseline.","PeriodicalId":296649,"journal":{"name":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICAS57966.2023.10168618","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
With the rapid evolution of deep neural networks (DNNs), their massive computational burden makes deployment on edge devices difficult. This has given rise to specialized hardware that exploits the sparsity of DNN parameters. Bit-serial architectures (BSAs) offer great performance potential by leveraging abundant bit-wise sparsity. However, the distribution of effective weight bits limits the performance of BSA designs. To improve BSA efficiency, this paper proposes a weight-offset multiply-accumulate (MAC) scheme and an associated hardware design called Bit-offsetter. Weight-offsetting not only significantly boosts bit-wise sparsity but also yields a more balanced distribution of essential bits. Beyond leveraging the abundant bit-wise sparsity induced by weight-offsetting, Bit-offsetter is equipped with a load-balancing scheduler that reduces idle cycles and mitigates utilization degradation. In experiments on a series of DNN models, weight-offsetting increases the bit-wise sparsity of pre-trained weights to 77.4% on average. The weight-offset MAC scheme implemented in Bit-offsetter achieves 3.28×/2.94× speedup/energy efficiency over the baseline.
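The abstract gives no implementation details, so the following is only a minimal Python sketch of the general ideas it names, not the authors' design. It assumes (as an illustration) that weights are processed in sign-magnitude form, that a bit-serial processing element skips zero weight bits, and that "weight-offsetting" can be modeled as subtracting a shared offset from the weights so fewer essential (nonzero) bits remain, with the offset contribution restored by a single correction term. All function names and the offset choice are hypothetical.

```python
# Illustrative sketch only -- not the Bit-offsetter hardware or the paper's exact scheme.
# Assumptions: sign-magnitude weight handling; weight-offsetting modeled as
# subtracting a shared offset, then adding back offset * sum(activations).

def essential_bits(w: int) -> int:
    """Nonzero bits in the magnitude of w (the bits a bit-serial PE must process)."""
    return bin(abs(w)).count("1")

def bit_serial_mac(weights, activations) -> int:
    """Bit-serial MAC over sign-magnitude weights: each nonzero weight bit
    costs one shift-add; zero bits are skipped (bit-wise sparsity)."""
    acc = 0
    for w, a in zip(weights, activations):
        sign = -1 if w < 0 else 1
        mag = abs(w)
        k = 0
        while mag:
            if mag & 1:
                acc += sign * (a << k)   # one effectual shift-add per '1' bit
            mag >>= 1
            k += 1
    return acc

def offset_mac(weights, activations, offset: int) -> int:
    """Bit-serial MAC on offset weights (w - offset) plus a single correction
    term offset * sum(a): the result is unchanged, but far fewer essential
    bits are processed when the weights cluster near the offset."""
    shifted = [w - offset for w in weights]
    return bit_serial_mac(shifted, activations) + offset * sum(activations)

if __name__ == "__main__":
    weights = [96, 100, 98, 103]          # weights clustered away from zero
    acts = [3, -1, 2, 5]
    ref = sum(w * a for w, a in zip(weights, acts))
    offset = 100                          # illustrative offset choice
    assert bit_serial_mac(weights, acts) == ref
    assert offset_mac(weights, acts, offset) == ref
    before = sum(essential_bits(w) for w in weights)
    after = sum(essential_bits(w - offset) for w in weights)
    print(f"essential bits per dot product: {before} -> {after}")
```

In this toy example the essential-bit count of the dot product drops from 13 to 4, at the cost of one correction multiply per accumulation, which is amortized over the whole dot product. How the paper actually selects offsets, handles signs, and balances work across lanes is described in the full text, not here.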