A Multiplier-Free RNS-Based CNN Accelerator Exploiting Bit-Level Sparsity

IF 5.1 · CAS Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · IEEE Transactions on Emerging Topics in Computing · Pub Date: 2023-08-10 · DOI: 10.1109/TETC.2023.3301590
Vasilis Sakellariou;Vassilis Paliouras;Ioannis Kouretas;Hani Saleh;Thanos Stouraitis
Citations: 0

Abstract

In this work, a Residue Number System (RNS)-based Convolutional Neural Network (CNN) accelerator utilizing a multiplier-free distributed-arithmetic Processing Element (PE) is proposed. A method for maximizing the utilization of the arithmetic hardware resources is presented. It increases the system's throughput by exploiting bit-level sparsity within the weight vectors. The proposed PE design takes advantage of the properties of RNS and Canonical Signed Digit (CSD) encoding to achieve higher energy efficiency and effective processing rate, without requiring any compression mechanism or introducing any approximation. An extensive design space exploration over various parameters (RNS base, PE micro-architecture, encoding) is conducted using analytical models as well as experimental results from CNN benchmarks, and the resulting trade-offs are analyzed. A complete end-to-end RNS accelerator is developed based on the proposed PE. The introduced accelerator is compared to traditional binary and RNS counterparts as well as to other state-of-the-art systems. Implementation results in a 22-nm process show that the proposed PE can lead to $1.85\times$ and $1.54\times$ more energy-efficient processing compared to binary and conventional RNS, respectively, with a $1.88\times$ maximum increase of effective throughput for the employed benchmarks. Compared to a state-of-the-art, all-digital, RNS-based system, the proposed accelerator is $8.87\times$ and $1.11\times$ more energy- and area-efficient, respectively.
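The two number representations the abstract combines can be illustrated with a short sketch. In RNS, an integer is represented by its residues modulo a set of pairwise-coprime moduli, so a wide multiply-accumulate splits into narrow independent channels; in CSD encoding, a weight is rewritten with digits in {-1, 0, +1} such that no two adjacent digits are nonzero, minimizing the nonzero-digit count, so each modular product reduces to one shift-and-add per nonzero digit. The sketch below is purely illustrative, not the paper's implementation: the moduli set {255, 256, 257} is a commonly used example base (the paper treats the RNS base as a design-space parameter), and the function names are the author's own.

```python
from functools import reduce

MODULI = (255, 256, 257)  # pairwise coprime; dynamic range = their product

def to_rns(x, moduli=MODULI):
    """Forward conversion: represent x by its residues per channel."""
    return tuple(x % m for m in moduli)

def from_rns(residues, moduli=MODULI):
    """Reverse conversion via the Chinese Remainder Theorem."""
    M = reduce(lambda a, b: a * b, moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m): modular inverse
    return total % M

def to_csd(w):
    """Canonical Signed Digit encoding (LSB first), digits in {-1, 0, +1},
    with no two adjacent nonzero digits."""
    digits = []
    while w != 0:
        if w & 1:
            d = 2 - (w & 3)  # +1 if w = 1 (mod 4), -1 if w = 3 (mod 4)
            w -= d
        else:
            d = 0
        digits.append(d)
        w >>= 1
    return digits

def csd_mult_mod(a, w, m):
    """Multiplier-free modular product a*w mod m: one shift-and-add
    per nonzero CSD digit of the weight w."""
    acc = 0
    for i, d in enumerate(to_csd(w)):
        if d:
            acc = (acc + d * ((a << i) % m)) % m
    return acc
```

For example, the weight 7 (binary `111`, three nonzero bits) becomes CSD `+1 0 0 -1`, i.e., 8 - 1, with only two nonzero digits and hence two shift-add operations per residue channel. This is the bit-level sparsity the PE exploits: fewer nonzero digits mean fewer additions, with no compression or approximation involved.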
Source Journal

IEEE Transactions on Emerging Topics in Computing — Computer Science (miscellaneous)
CiteScore: 12.10
Self-citation rate: 5.10%
Articles per year: 113
Journal Description: IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, synthetic and organic computing structures and systems, advanced analytics, social/occupational computing, location-based/client computer systems, morphic computer design, electronic game systems, and health-care IT.