BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks

ACM Transactions on Embedded Computing Systems · Published: 2023-09-09 · DOI: 10.1145/3609093 · Impact Factor 2.8 · CAS Tier 3 (Computer Science) · JCR Q2 (Computer Science, Hardware & Architecture)
Yunjie Pan, Jiecao Yu, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
{"title":"BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks","authors":"Yunjie Pan, Jiecao Yu, Andrew Lukefahr, Reetuparna Das, Scott Mahlke","doi":"10.1145/3609093","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks (CNNs) have demonstrated remarkable performance across a wide range of machine learning tasks. However, the high accuracy usually comes at the cost of substantial computation and energy consumption, making it difficult to be deployed on mobile and embedded devices. In CNNs, the compute-intensive convolutional layers are usually followed by a ReLU activation layer, which clamps negative outputs to zeros, resulting in large activation sparsity. By exploiting such sparsity in CNN models, we propose a software-hardware co-design BitSET, that aggressively saves energy during CNN inference. The bit-serial BitSET accelerator adopts a prediction-based bit-level early termination technique that terminates the ineffectual computation of negative outputs early. To assist the algorithm, we propose a novel weight encoding that allows more accurate predictions with fewer bits. BitSET leverages the bit-level computation reduction both in the predictive early termination algorithm and in the non-predictive, energy-efficient bit-serial architecture. Compared to UNPU, an energy-efficient bit-serial CNN accelerator, BitSET yields an average 1.5× speedup and 1.4× energy efficiency improvement with no accuracy loss due to a 48% reduction in bit-level computations. Relaxing the allowed accuracy loss to 1% increases the gains to an average of 1.6× speedup and 1.4× energy efficiency improvement.","PeriodicalId":50914,"journal":{"name":"ACM Transactions on Embedded Computing Systems","volume":null,"pages":null},"PeriodicalIF":2.8000,"publicationDate":"2023-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Embedded Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3609093","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

Convolutional Neural Networks (CNNs) have demonstrated remarkable performance across a wide range of machine learning tasks. However, this high accuracy usually comes at the cost of substantial computation and energy consumption, making CNNs difficult to deploy on mobile and embedded devices. In CNNs, the compute-intensive convolutional layers are usually followed by a ReLU activation layer, which clamps negative outputs to zero, resulting in large activation sparsity. By exploiting this sparsity, we propose BitSET, a software-hardware co-design that aggressively saves energy during CNN inference. The bit-serial BitSET accelerator adopts a prediction-based, bit-level early termination technique that cuts short the ineffectual computation of negative outputs. To assist the algorithm, we propose a novel weight encoding that allows more accurate predictions with fewer bits. BitSET leverages bit-level computation reduction both in the predictive early termination algorithm and in the non-predictive, energy-efficient bit-serial architecture. Compared to UNPU, an energy-efficient bit-serial CNN accelerator, BitSET yields an average 1.5× speedup and 1.4× energy efficiency improvement with no accuracy loss, owing to a 48% reduction in bit-level computations. Relaxing the allowed accuracy loss to 1% increases the gains to an average 1.6× speedup and 1.4× energy efficiency improvement.
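To make the idea of prediction-based bit-level early termination concrete, the sketch below shows one possible way such a scheme could work for a single ReLU-gated dot product: weight bit-planes are processed most-significant-bit first, and once the partial sum plus a bound on the remaining contribution is negative, the rest of the bit-serial computation is skipped because ReLU would zero the output anyway. The function name, the bit-plane decomposition, and the conservative sign test are illustrative assumptions for this sketch, not the paper's exact algorithm or weight encoding.

```python
import numpy as np

def bit_serial_dot_relu(x, w, total_bits=8, predict_bits=4):
    """Illustrative sketch (not the paper's algorithm): bit-serial dot
    product with early termination for outputs that ReLU would zero."""
    x = np.asarray(x, dtype=np.int64)
    w = np.asarray(w, dtype=np.int64)

    acc = 0
    for b in range(total_bits - 1, -1, -1):            # MSB-first bit-planes
        plane = (np.abs(w) >> b) & 1                    # current bit of |w|
        acc += int(np.sum(np.sign(w) * plane * x)) << b

        if total_bits - b == predict_bits:
            # Upper bound on what the remaining (lower) bit-planes can add.
            remaining = int(np.sum(np.abs(x))) * ((1 << b) - 1)
            if acc + remaining < 0:                     # provably negative
                return 0                                # ReLU clamps to zero
    return max(acc, 0)                                  # apply ReLU

# Example: a random activation/weight vector pair
rng = np.random.default_rng(0)
x = rng.integers(0, 128, size=64)
w = rng.integers(-128, 128, size=64)
print(bit_serial_dot_relu(x, w))
```

The termination test shown here is conservative and therefore lossless; the paper's prediction-based scheme can terminate more aggressively and relies on the proposed weight encoding to keep the sign predictions accurate with fewer bits.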
Source journal
ACM Transactions on Embedded Computing Systems (Engineering & Technology – Computer Science: Software Engineering)
CiteScore: 3.70
Self-citation rate: 0.00%
Articles published per year: 138
Review time: 6 months
Journal description: The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.
Latest articles in this journal
Optimizing Dilithium Implementation with AVX2/-512
Transient Fault Detection in Tensor Cores for Modern GPUs
High Performance and Predictable Shared Last-level Cache for Safety-Critical Systems
APB-tree: An Adaptive Pre-built Tree Indexing Scheme for NVM-based IoT Systems