DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training

Angelo Garofalo, Yvan Tortorella, Matteo Perotti, Luca Valente, Alessandro Nadalini, Luca Benini, Davide Rossi, Francesco Conti
IEEE Open Journal of the Solid-State Circuits Society, vol. 2, pp. 231-243, published 2022-09-27.
DOI: 10.1109/OJSSCS.2022.3210082
PDF: https://ieeexplore.ieee.org/iel7/8782712/9733783/09903915.pdf
Citations: 7

Abstract

On-chip deep neural network (DNN) inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy, and flexibility requirements. Heterogeneous clusters are promising solutions to meet the challenge, combining the flexibility of DSP-enhanced cores with the performance and energy boost of dedicated accelerators. We present DARKSIDE, a System-on-Chip with a heterogeneous cluster of eight RISC-V cores enhanced with 2-b to 32-b mixed-precision integer arithmetic. To boost the performance and efficiency on key compute-intensive DNN kernels, the cluster is enriched with three digital accelerators: 1) a specialized engine for low-data-reuse depthwise convolution kernels (up to 30 MAC/cycle); 2) a minimal-overhead datamover to marshal 1–32-b data on the fly; and 3) a 16-b floating-point tensor product engine (TPE) for tiled matrix-multiplication acceleration. DARKSIDE is implemented in 65-nm CMOS technology. The cluster achieves a peak integer performance of 65 GOPS and a peak efficiency of 835 GOPS/W when working on 2-b integer DNN kernels. When targeting floating-point tensor operations, the TPE provides up to 18.2 GFLOPS of performance or 300 GFLOPS/W of efficiency—enough to enable on-chip floating-point training at competitive speed coupled with ultralow-power quantized inference.
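The efficiency of the cluster's mixed-precision datapath comes from packing many narrow operands into each 32-bit word, so one SIMD MAC instruction processes several 2-b elements at once. The C sketch below is a functional illustration of that idea, not the DARKSIDE ISA or its actual intrinsics: it unpacks sixteen sign-extended 2-b weights from a single 32-bit word and accumulates them against 8-b activations, which is the operation a hardware 2-b SIMD MAC would perform in one sweep. All function names here are hypothetical.

```c
#include <stdint.h>

/* Sign-extend a 2-bit field: raw values 0..3 map to 0, 1, -2, -1. */
static int32_t sext2(uint32_t v)
{
    return (int32_t)(v << 30) >> 30;
}

/* Dot product of 16 signed 8-b activations with 16 weights packed
 * as 2-b fields (LSB-first) in one 32-bit word. A scalar model of
 * what a packed-SIMD MAC unit computes per instruction. */
int32_t dotp_w2_a8(uint32_t packed_w, const int8_t a[16])
{
    int32_t acc = 0;
    for (int i = 0; i < 16; i++) {
        int32_t w = sext2((packed_w >> (2 * i)) & 0x3u);
        acc += w * (int32_t)a[i];
    }
    return acc;
}
```

Because the sixteen multiply-accumulates above collapse into a single hardware instruction, a datapath of this style can sustain far more MAC/cycle at 2-b precision than at 32-b, which is the scaling behind the 2-b peak-efficiency figure quoted in the abstract.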