Latest publications: IEEE Journal on Emerging and Selected Topics in Circuits and Systems

HyDe: A Hybrid PCM/FeFET/SRAM Device-Search for Optimizing Area and Energy-Efficiencies in Analog IMC Platforms
IF 4.6 | CAS Tier 2 (Engineering & Technology) | Q1 Engineering | Pub Date: 2023-10-26 | DOI: 10.1109/JETCAS.2023.3327748
Abhiroop Bhattacharjee;Abhishek Moitra;Priyadarshini Panda
Today, there is a plethora of In-Memory Computing (IMC) devices, such as SRAMs, PCMs, and FeFETs, that emulate convolutions on crossbar arrays with high throughput. Each IMC device offers its own pros and cons for inference of Deep Neural Networks (DNNs) on crossbars in terms of area overhead, programming energy, and non-idealities. A design-space exploration is therefore imperative to derive a hybrid-device architecture optimized for accurate DNN inference under the impact of non-idealities from multiple devices, while maintaining competitive area- and energy-efficiencies. We propose a two-phase search framework (HyDe) that exploits the best of all worlds offered by multiple devices to determine an optimal hybrid-device architecture for a given DNN topology. Our hybrid models achieve up to 2.30–2.74× higher TOPS/mm² at 22–26% higher energy-efficiencies than baseline homogeneous models for a VGG16 DNN topology. We further propose a feasible implementation of the HyDe-derived hybrid-device architectures in the 2.5D design space using chiplets, to reduce design effort and cost in hardware fabrication involving multiple technology processes.
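The design-space exploration described here can be pictured with a toy per-layer device search: assign one device type per layer to minimize area (then energy) subject to an accuracy budget. All figures below (areas, energies, accuracy penalties) are invented for illustration; HyDe's actual cost model and its two-phase search differ from this brute-force sketch.

```python
from itertools import product

# Hypothetical per-device figures of merit (NOT from the paper): tile area
# (mm^2), programming energy (nJ), and accuracy lost to non-idealities.
DEVICES = {
    "SRAM":  {"area": 1.00, "energy": 0.2, "acc_penalty": 0.000},
    "PCM":   {"area": 0.25, "energy": 1.0, "acc_penalty": 0.004},
    "FeFET": {"area": 0.30, "energy": 0.6, "acc_penalty": 0.002},
}

def search(num_layers, max_acc_loss):
    """Exhaustively assign one device type per layer, keeping the
    lowest-(area, energy) configuration within the accuracy budget."""
    best = None
    for assign in product(DEVICES, repeat=num_layers):
        area = sum(DEVICES[d]["area"] for d in assign)
        energy = sum(DEVICES[d]["energy"] for d in assign)
        loss = sum(DEVICES[d]["acc_penalty"] for d in assign)
        if loss <= max_acc_loss and (best is None or (area, energy) < best[:2]):
            best = (area, energy, assign)
    return best

area, energy, assign = search(num_layers=4, max_acc_loss=0.01)
print(assign, round(area, 2), round(energy, 2))
```

With these toy numbers the budget forces a hybrid: the dense PCM tiles are rationed and the remaining layers fall to FeFET, which is the kind of trade-off the framework automates.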
Citations: 0
Multicore Spiking Neuromorphic Chip in 180-nm With ReRAM Synapses and Digital Neurons
IF 4.6 | CAS Tier 2 (Engineering & Technology) | Q1 Engineering | Pub Date: 2023-10-16 | DOI: 10.1109/JETCAS.2023.3325158
Hao Jiang;Jikai Lu;Chenggao Zhang;Shuangzhu Tang;Junjie An;Lingli Cheng;Jian Lu;Jinsong Wei;Keji Zhou;Xumeng Zhang;Tuo Shi;Qi Liu
Neuromorphic computing based on spiking neural networks (SNNs) exhibits great potential for reducing energy consumption in hardware systems. Resistive random-access memory (ReRAM) is regarded as a promising candidate for constructing neuromorphic hardware, owing to its high density, nonvolatility, and compute-in-memory capability. However, ReRAM-based neuromorphic chips are still in their infancy: they either cannot support multicore operation or offer only limited neuron configurability. To alleviate these problems, we propose a hybrid multicore SNN chip with 60K ReRAM synapses and 480 digital neurons in the 180 nm node, achieving a synaptic density of 20 Kbit/mm² per core. To improve the efficiency of inter-core communication, we adopt a network-on-chip architecture with a bit-character encoding strategy. In addition, an adaptive multiplier-less digital neuron is designed to support both Izhikevich and leaky integrate-and-fire models through register bit control, meeting different application scenarios. Finally, we evaluate the performance of our chip on MNIST dataset recognition tasks, achieving 97.65% accuracy. A minimum energy per synaptic operation (SOP) of 6.6 pJ is also obtained in the 180 nm node, outperforming TrueNorth's 26 pJ in 28 nm. These results show that our design has great potential for large-scale SNN implementations and may pave the way for designing highly efficient neuromorphic hardware with ReRAM technology.
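A "multiplier-less" digital neuron typically computes the membrane leak with a shift rather than a multiply. Below is a minimal sketch of such a leaky integrate-and-fire update; the constants (leak shift, threshold, reset) are assumptions for illustration, not the chip's actual register configuration.

```python
def lif_step(v, i_in, leak_shift=4, v_th=100, v_reset=0):
    """One multiplier-less LIF update: leak via arithmetic right-shift
    (equivalent to v * (1 - 2**-leak_shift)), integrate the input
    current, and fire when the membrane crosses threshold."""
    v = v - (v >> leak_shift) + i_in
    if v >= v_th:
        return v_reset, 1  # spike emitted, membrane reset
    return v, 0

# Drive the neuron with a constant input current and count spikes.
v, spikes = 0, 0
for _ in range(100):
    v, s = lif_step(v, i_in=10)
    spikes += s
print(spikes)
```

Changing `leak_shift` or `v_th` through registers is one way a single datapath can cover different neuron behaviors, in the spirit of the configurability the abstract describes.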
Citations: 0
C-DNN V2: Complementary Deep-Neural-Network Processor With Full-Adder/OR-Based Reduction Tree and Reconfigurable Spatial Weight Reuse
IF 4.6 | CAS Tier 2 (Engineering & Technology) | Q1 Engineering | Pub Date: 2023-10-04 | DOI: 10.1109/JETCAS.2023.3321771
Sangyeob Kim;Hoi-Jun Yoo
In this article, we propose the Complementary Deep-Neural-Network (C-DNN) processor V2, which optimizes the performance improvement obtainable from combining a CNN and an SNN. C-DNN V1 showcased the potential for achieving higher energy efficiency by combining CNN and SNN. However, it encountered five challenges that hindered the full realization of this potential: inefficiency of the clock-gating accumulator, imbalance in spike sparsity across different time-steps, redundant cache power stemming from temporal weight reuse, limited performance of the SNN core for dense spike trains, and nonoptimal operation resulting from tile-based workload division. To overcome these challenges and achieve enhanced energy efficiency through the CNN-SNN combination, C-DNN V2 is developed. It implements a Full-Adder/OR-based reduction tree, which reduces power consumption in the SNN core under high spike-sparsity conditions. Additionally, it efficiently manages spike-sparsity imbalances by integrating dense and sparse SNN cores simultaneously. The proposed reconfigurable spatial weight-reuse method decreases the number of redundant register files and their power consumption. The spike flipping and inhibition methods facilitate efficient processing of input data with high spike sparsity in the SNN core. Furthermore, fine-grained workload division and a high-sparsity-aware CNN core are introduced to ensure that each data item is processed in the domain with the highest energy efficiency. In conclusion, we propose C-DNN V2 as an optimal complementary DNN processor, delivering 76.9% accuracy for ImageNet classification with a state-of-the-art energy efficiency of 32.8 TOPS/W.
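A Full-Adder/OR-based reduction tree exploits the fact that, when spikes are sparse, the two inputs of a tree node are rarely 1 at the same time, so an OR gate can stand in for an adder with little counting error. A rough software model of that trade-off follows; the depth at which the real design switches from OR to full adders is not given in the abstract, so `or_levels` is a free parameter here.

```python
import random

def or_adder_reduce(bits, or_levels=1):
    """Reduce a list of spike bits to a count. The first `or_levels`
    tree levels use OR instead of addition (cheap but approximate:
    a pair of simultaneous 1s is undercounted as a single 1)."""
    vals = list(bits)
    level = 0
    while len(vals) > 1:
        pairs = list(zip(vals[::2], vals[1::2]))
        carry = [vals[-1]] if len(vals) % 2 else []
        if level < or_levels:
            vals = [a | b for a, b in pairs] + carry
        else:
            vals = [a + b for a, b in pairs] + carry
        level += 1
    return vals[0]

random.seed(0)
spikes = [1 if random.random() < 0.05 else 0 for _ in range(64)]  # ~5% dense
exact = sum(spikes)
approx = or_adder_reduce(spikes, or_levels=1)
print(exact, approx)  # the OR level can only undercount, never overcount
```

At high sparsity the two counts usually agree, which is why the hardware can afford cheap OR gates at the widest (and costliest) tree level.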
Citations: 0
A Low Thermal Sensitivity Subthreshold-Current to Pulse-Frequency Converter for Neuromorphic Chips
IF 4.6 | CAS Tier 2 (Engineering & Technology) | Q1 Engineering | Pub Date: 2023-10-02 | DOI: 10.1109/JETCAS.2023.3321105
Ben Varkey Benjamin;Richelle L. Smith;Kwabena A. Boahen
To convert a subthreshold current to a pulse frequency efficiently and predictably, we designed a silicon soma that conserves energy with current feedback and lessens thermal sensitivity with voltage feedback. When the input current charges a capacitor close to the inversion point of an inverter, its short-circuit current wastes energy. To shorten this period, existing designs accelerate the charging rate with positive feedback: Either a capacitive divider feeds back voltage or a current mirror feeds back current. Voltage feedback is less effective because it kicks in only at the inversion point. Current feedback is less predictable because its leakage current is exponentially sensitive to temperature variation. By quantifying this thermal sensitivity with an analytic model of the subthreshold MOS transistor, we successfully combined current feedback with voltage feedback to design a silicon soma 10-fold less sensitive to temperature than a previous current-feedback-only design that uses 7.6-fold more silicon area. This advance allowed a mixed-signal neuromorphic chip to be predictably programmed for the first time.
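The thermal sensitivity at issue follows from the textbook subthreshold relation I ≈ I0·exp(Vgs/(n·kT/q)) combined with an ideal converter f = I/(C·ΔV). The sketch below uses assumed device constants (I0, n, C, ΔV are illustrative, not the paper's measured values) purely to show the exponential temperature dependence that the authors' combined feedback mitigates.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
Q_E = 1.602177e-19   # elementary charge, C

def subthreshold_current(vgs, temp_k, i0=1e-12, n=1.5):
    """Illustrative subthreshold MOS current (i0 and n are assumed):
    I = I0 * exp(Vgs / (n * kT/q)), exponential in 1/T via kT/q."""
    ut = K_B * temp_k / Q_E  # thermal voltage kT/q
    return i0 * math.exp(vgs / (n * ut))

def pulse_frequency(i, c=1e-12, dv=0.5):
    """Ideal current-to-frequency conversion: f = I / (C * dV), the
    rate at which current I charges capacitance C across a dV swing."""
    return i / (c * dv)

f_27c = pulse_frequency(subthreshold_current(0.3, 300.0))
f_47c = pulse_frequency(subthreshold_current(0.3, 320.0))
print(f_47c / f_27c)  # a mere 20 K rise shifts the rate by tens of percent
```

With these constants, 20 K of warming changes the output frequency by roughly a third, which is why an unpredictable leakage path makes a pure current-feedback soma hard to program.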
Citations: 0
Tempo-CIM: A RRAM Compute-in-Memory Neuromorphic Accelerator With Area-Efficient LIF Neuron and Split-Train-Merged-Inference Algorithm for Edge AI Applications
IF 4.6 | CAS Tier 2 (Engineering & Technology) | Q1 Engineering | Pub Date: 2023-10-02 | DOI: 10.1109/JETCAS.2023.3321107
Jingwen Jiang;Keji Zhou;Jinhao Liang;Fengshi Tian;Chenyang Zhao;Jianguo Yang;Xiaoyong Xue;Xiaoyang Zeng
Spiking neural network (SNN)-based compute-in-memory (CIM) accelerators provide a prospective implementation for intelligent edge devices, with higher energy efficiency than artificial neural networks (ANNs) deployed on conventional Von Neumann architectures. However, the costly circuit implementation of biological neurons and the immature training algorithms of discrete-pulse networks hinder efficient hardware implementation and high recognition rates. In this work, we present a 40 nm RRAM CIM macro (Tempo-CIM) with charge-pump-based leaky integrate-and-fire (LIF) neurons and a split-train-merged-inference algorithm for efficient SNN acceleration with improved accuracy. Single-spike latency coding is employed to reduce the number of pulses in each time step. The voltage-type LIF neuron uses a charge-pump structure to achieve efficient accumulation, markedly reducing the required capacitance. The split-train-merged-inference algorithm dynamically adjusts the input of each neuron to alleviate the spike-stall problem. The macro measures 0.084 mm² in a 40 nm process, with an energy efficiency of 68.51 TOPS/W and an area efficiency of 0.1956 TOPS/mm² for 4-bit inputs and 8-bit weights.
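Single-spike latency coding replaces a rate code (many pulses per value) with one spike whose timing carries the value, which is how the design cuts the pulse count per time step. A minimal sketch, assuming a linear value-to-time mapping and a 16-step window (both assumptions, not the macro's documented scheme):

```python
def latency_encode(x, t_max=16):
    """Single-spike latency coding: each normalized input in [0, 1]
    emits exactly one spike, with stronger inputs firing earlier.
    Returns the spike time step in [0, t_max - 1]."""
    assert 0.0 <= x <= 1.0
    return min(int((1.0 - x) * t_max), t_max - 1)

# Stronger inputs fire earlier; every input costs exactly one spike.
times = [latency_encode(x) for x in (1.0, 0.75, 0.5, 0.0)]
print(times)
```

Since each neuron fires at most once per window, the downstream crossbar sees far fewer pulses than under rate coding, directly saving switching energy.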
Citations: 0
Unlocking Efficiency in BNNs: Global by Local Thresholding for Analog-Based HW Accelerators
IF 4.6 | CAS Tier 2 (Engineering & Technology) | Q1 Engineering | Pub Date: 2023-09-14 | DOI: 10.1109/JETCAS.2023.3315561
Mikail Yayla;Fabio Frustaci;Fanny Spagnolo;Jian-Jia Chen;Hussam Amrouch
For accelerating Binarized Neural Networks (BNNs), analog computing-based crossbar accelerators utilizing XNOR gates and additional interface circuits have been proposed. Such accelerators demand a large number of analog-to-digital converters (ADCs) and registers, resulting in expensive designs. To increase inference efficiency, the state of the art divides the interface circuit into an Analog Path (AP), utilizing (cheap) analog comparators, and a Digital Path (DP), utilizing (expensive) ADCs and registers. During BNN execution, one of the paths is selectively triggered. Ideally, since inference via the AP is more efficient, it should be triggered as often as possible. However, we reveal that, unless the number of weights is very small, the AP is rarely triggered. To overcome this, we propose a novel BNN inference scheme, called Local Thresholding Approximation (LTA), which approximates the global thresholdings in BNNs by local thresholdings. This enables use of the AP through most of the execution, which significantly increases interface-circuit efficiency. In our evaluations with two BNN architectures, LTA reduces area by 42× and 54×, energy by 2.7× and 4.2×, and latency by 3.8× and 1.15×, compared to state-of-the-art crossbar-based BNN accelerators.
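In a BNN layer, a binarized dot product is an XNOR followed by a popcount and a threshold. One plausible reading of local thresholding (the paper's exact LTA formulation may differ) is to replace the single global threshold over the full popcount with per-chunk local thresholds whose cheap binary decisions are then combined, so each chunk only needs a comparator rather than a full ADC readout:

```python
def xnor_popcount(w, x):
    """Binarized dot product: count positions where the weight bit and
    activation bit agree (XNOR over +/-1 values encoded as 0/1)."""
    return sum(1 for wi, xi in zip(w, x) if wi == xi)

def global_threshold(w, x, theta):
    """Standard BNN activation: binarize the full popcount once."""
    return 1 if xnor_popcount(w, x) >= theta else 0

def local_threshold(w, x, theta, k):
    """Hypothetical sketch in the spirit of LTA: split the vector into
    k chunks, threshold each chunk locally with theta/k, then combine
    the cheap binary decisions by majority vote."""
    n = len(w)
    step = n // k
    local = [
        1 if xnor_popcount(w[i:i + step], x[i:i + step]) >= theta / k else 0
        for i in range(0, n, step)
    ]
    return 1 if sum(local) * 2 >= len(local) else 0

w = [1, 0, 1, 1, 0, 0, 1, 0]
x = [1, 0, 1, 0, 0, 1, 1, 1]
print(global_threshold(w, x, theta=4), local_threshold(w, x, theta=4, k=2))
```

The local decisions are exactly the kind of small comparisons an analog comparator can make, which is how an approximation like this shifts work from the DP onto the AP.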
Citations: 0
IEEE Circuits and Systems Society
IF 4.6 | CAS Tier 2 (Engineering & Technology) | Q1 Engineering | Pub Date: 2023-09-13 | DOI: 10.1109/JETCAS.2023.3308285
Citations: 0
Corrections to “Digital Implementation of Radial Basis Function Neural Networks Based on Stochastic Computing” [Mar 23 257-269]
IF 4.6 | CAS Tier 2 (Engineering & Technology) | Q1 Engineering | Pub Date: 2023-09-13 | DOI: 10.1109/JETCAS.2023.3287741
Alejandro Morán Costoya;Luis Parrilla Roure;Miquel Roca;Joan Font-Rossello;Eugeni Isern;Vincent Canals
In the above article [1], according to our institution, the project-funding text (bottom left on the first page) requires a minor modification. The following text:
Citations: 0
IEEE Journal on Emerging and Selected Topics in Circuits and Systems Publication Information
IF 4.6 | CAS Tier 2 (Engineering & Technology) | Q1 Engineering | Pub Date: 2023-09-13 | DOI: 10.1109/JETCAS.2023.3308287
Citations: 0
IEEE Journal on Emerging and Selected Topics in Circuits and Systems Information for Authors
IF 4.6 | CAS Tier 2 (Engineering & Technology) | Q1 Engineering | Pub Date: 2023-09-13 | DOI: 10.1109/JETCAS.2023.3308283
Citations: 0