SG-Float: Achieving Memory Access and Computing Power Reduction Using Self-Gating Float in CNNs

ACM Transactions on Embedded Computing Systems (IF 2.6; Region 3, Computer Science; JCR Q2, Computer Science, Hardware & Architecture). Pub Date: 2023-11-09. DOI: 10.1145/3624582

Jun-Shen Wu, Tsen-Wei Hsu, Ren-Shuo Liu

Citations: 0

Abstract

Convolutional neural networks (CNNs) are essential for advancing the field of artificial intelligence. However, because these networks are highly demanding in terms of memory and computation, implementing CNNs can be challenging. To make CNNs more accessible to energy-constrained devices, researchers are exploring new algorithmic techniques and hardware designs that can reduce memory and computation requirements. In this work, we present self-gating float (SG-Float), an algorithm-hardware co-design of a novel binary number format that can significantly reduce memory access and computing power requirements in CNNs. SG-Float is a self-gating format that uses the exponent to self-gate the mantissa to zero. It exploits two properties: the exponent determines the magnitude of a floating-point value, and CNNs tolerate small errors. SG-Float represents relatively small values using only the exponent, which increases the proportion of ineffective mantissas and correspondingly reduces the number of mantissa multiplications. To minimize the accuracy loss caused by the approximation error introduced by SG-Float, we propose a fine-tuning process to determine the exponent thresholds of SG-Float and reclaim the accuracy loss. We also develop a hardware optimization technique, called the SG-Float buffering strategy, to best match SG-Float with CNN accelerators and further reduce memory access. We apply the SG-Float buffering strategy to vector-vector multiplication processing elements (PEs), which NVDLA adopts, in TSMC 40 nm technology. Our evaluation results demonstrate that SG-Float can achieve up to 35% reduction in memory access power and up to 54% reduction in computing power compared with AdaptivFloat, a state-of-the-art format, with negligible power and area overhead. Additionally, we show that SG-Float can be combined with neural network pruning methods to further reduce memory access and mantissa multiplications in pruned CNN models. Overall, our work shows that SG-Float is a promising solution to the problem of CNN memory access and computing power.
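
To make the self-gating idea concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that a value whose exponent falls below a threshold is truncated to a signed power of two (mantissa fraction set to zero), and that a value counts as having an "ineffective" mantissa when it is zero or a pure power of two. The function names `self_gate` and `ineffective_fraction`, the threshold value, the truncation rule, and the sample weights are all hypothetical; the abstract does not specify the bit layout, rounding rule, or how thresholds are chosen.

```python
import math


def self_gate(x: float, exp_threshold: int) -> float:
    """Keep only the sign and exponent of x when its exponent is below exp_threshold.

    Assumption: the gated value is truncated to +/-2**exponent. The real
    SG-Float rounding rule and encoding are not given in the abstract.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)      # x = m * 2**e with 0.5 <= |m| < 1
    exponent = e - 1          # normalized form: x = (1 + f) * 2**exponent
    if exponent < exp_threshold:
        # The mantissa fraction f is gated to zero, leaving a pure power of two.
        return math.copysign(math.ldexp(1.0, exponent), x)
    return x


def ineffective_fraction(values) -> float:
    """Fraction of values that are zero or a pure power of two."""
    def ineffective(v: float) -> bool:
        return v == 0.0 or abs(math.frexp(v)[0]) == 0.5
    return sum(ineffective(v) for v in values) / len(values)


# Hypothetical weights and threshold, chosen only for illustration.
weights = [0.3, 0.75, -0.1, 1.5]
gated = [self_gate(w, exp_threshold=-1) for w in weights]
print(gated)                                                        # [0.25, 0.75, -0.0625, 1.5]
print(ineffective_fraction(weights), ineffective_fraction(gated))  # 0.0 0.5
```

In this toy example, gating raises the share of values without an effective mantissa from 0% to 50%. Under the assumptions above, a multiplication involving such a value needs only sign and exponent handling, which is the kind of saving the abstract attributes to the format.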

Source journal

ACM Transactions on Embedded Computing Systems (Engineering & Technology - Computer Science: Software Engineering)

CiteScore: 3.70
Self-citation rate: 0.00%
Articles published: 138
Review time: 6 months

Journal description: The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.