
Latest publications: Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design

Directed Acyclic Graph-based Neural Networks for Tunable Low-Power Computer Vision
Abhinav Goel, Caleb Tung, Nick Eliopoulos, Xiao Hu, G. Thiruvathukal, James C. Davis, Yung-Hsiang Lu
Processing visual data on mobile devices has many applications, e.g., emergency response and tracking. State-of-the-art computer vision techniques rely on large Deep Neural Networks (DNNs) that are usually too power-hungry to be deployed on resource-constrained edge devices. Many techniques improve the efficiency of DNNs by compromising accuracy. However, the accuracy and efficiency of these techniques cannot be adapted for diverse edge applications with different hardware constraints and accuracy requirements. This paper demonstrates that a recent, efficient tree-based DNN architecture, called the hierarchical DNN, can be converted into a Directed Acyclic Graph-based (DAG) architecture to provide tunable accuracy-efficiency tradeoff options. We propose a systematic method that identifies the connections that must be added to convert the tree to a DAG to improve accuracy. We conduct experiments on popular edge devices and show that increasing the connectivity of the DAG improves the accuracy to within 1% of existing high-accuracy techniques. Our approach requires 93% less memory, 43% less energy, and 49% fewer operations than the high-accuracy techniques, thus providing more accuracy-efficiency configurations.
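The tree-to-DAG conversion described above can be sketched in a few lines (an illustrative model, not the authors' algorithm — the node names and the acyclicity check are assumptions): cross-connections are added to a hierarchy of classifier nodes only when the result remains a valid DAG.

```python
# Minimal sketch: a hierarchy as an adjacency dict, plus a guard that accepts
# a new connection only if the graph stays acyclic (i.e., remains a DAG).

def is_acyclic(graph):
    """Kahn's algorithm: True if `graph` (node -> list of children) has no cycle."""
    indeg = {n: 0 for n in graph}
    for children in graph.values():
        for c in children:
            indeg[c] = indeg.get(c, 0) + 1
    ready = [n for n, d in indeg.items() if d == 0]
    seen = 0
    while ready:
        n = ready.pop()
        seen += 1
        for c in graph.get(n, []):
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return seen == len(indeg)

def add_connection(graph, src, dst):
    """Add src -> dst only if the graph remains a DAG; otherwise reject it."""
    graph.setdefault(src, []).append(dst)
    if not is_acyclic(graph):
        graph[src].remove(dst)   # this edge would create a cycle
        return False
    return True

# Start from a tree of classifier nodes, then raise its connectivity.
tree = {"root": ["animals", "vehicles"], "animals": ["cat"],
        "vehicles": ["car"], "cat": [], "car": []}
assert add_connection(tree, "animals", "car")   # cross-edge: tree becomes a DAG
assert not add_connection(tree, "car", "root")  # back-edge: rejected
```

In the paper's setting, each accepted cross-connection would correspond to an extra path that raises accuracy at some efficiency cost; this sketch only captures the structural constraint.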
DOI: 10.1145/3531437.3539723 · Published 2022-08-01
Citations: 2
Layerwise Disaggregated Evaluation of Spiking Neural Networks
Abinand Nallathambi, Sanchari Sen, A. Raghunathan, N. Chandrachoodan
Spiking Neural Networks (SNNs) have attracted considerable attention due to their suitability for processing temporal input streams, as well as the emergence of highly power-efficient neuromorphic hardware platforms. The computational cost of evaluating a Spiking Neural Network (SNN) is strongly correlated with the number of timesteps for which it is evaluated. To improve the computational efficiency of SNN evaluation, we propose layerwise disaggregated SNNs (LD-SNNs), wherein the number of timesteps is independently optimized for each layer of the network. In effect, LD-SNNs allow for a better allocation of computational effort across layers in a network, resulting in an improved tradeoff between accuracy and efficiency. We propose a methodology to design optimized LD-SNNs from any given SNN. Across four benchmark networks, LD-SNNs achieve a 1.67-3.84x reduction in synaptic updates and a 1.2-2.56x reduction in neurons evaluated. These improvements translate to 1.25-3.45x faster inference on four different hardware platforms, including two server-class platforms, a desktop platform, and an edge SoC.
DOI: 10.1145/3531437.3539708 · Published 2022-08-01
Citations: 1
Energy Efficient Cache Design with Piezoelectric FETs
Reena Elangovan, Ashish Ranjan, Niharika Thakuria, S. Gupta, A. Raghunathan
Piezoelectric FETs (PeFETs) are a promising class of ferroelectric devices that use the piezoelectric effect to modulate strain in the channel. They present several desirable properties for on-chip memory, such as non-volatility, high density, and low-power write capability. In this work, we present the first effort to design and evaluate cache architectures using PeFETs. Two key goals in cache design are to maximize capacity and minimize latency. Accordingly, we consider two different variants of PeFET bit-cells: a high-density variant (HD-PeFET) that does not use a separate access transistor, and a high-performance 1T-1PeFET variant (HP-PeFET) that sacrifices density for lower access latency. We note that at the application level, there exists significant heterogeneity in the sensitivity of applications to cache capacity and latency. To enable a better tradeoff between these conflicting design goals, we propose a hybrid PeFET cache comprising both HP-PeFET and HD-PeFET regions at the granularity of cache ways. We make the key observation that frequently reused blocks residing in the HD-PeFET region are detrimental to overall cache performance due to the higher access latency. Hence, we also propose a cache management policy to identify and migrate these blocks from the HD-PeFET region to the HP-PeFET region at runtime. We develop models of HD-PeFET and HP-PeFET caches using the CACTI framework and evaluate their benefits across a suite of PARSEC and SPLASH-2X benchmarks. We demonstrate 1.11x and 4.55x average improvements in performance and energy, respectively, using the proposed hybrid PeFET last-level cache against a baseline with a traditional SRAM cache at iso-area.
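The migration idea behind the hybrid cache can be sketched as follows (a toy model under assumed parameters — the latencies, the promotion threshold, and the fill policy are illustrative, not the paper's): blocks that keep getting hit while sitting in the slow, dense region are promoted to the fast region.

```python
# Toy hybrid cache: new blocks fill into the high-density (HD) region; blocks
# reused often enough there are migrated to the high-performance (HP) region.

HP_LATENCY, HD_LATENCY = 1, 3   # illustrative access latencies (cycles)
PROMOTE_THRESHOLD = 2           # HD hits before promotion (assumed)

class HybridCache:
    def __init__(self):
        self.region = {}        # block -> "HP" or "HD"
        self.reuse = {}         # block -> hits accumulated while in HD

    def access(self, block):
        region = self.region.setdefault(block, "HD")   # fill into HD
        if region == "HD":
            self.reuse[block] = self.reuse.get(block, 0) + 1
            if self.reuse[block] >= PROMOTE_THRESHOLD:
                self.region[block] = "HP"              # migrate hot block
            return HD_LATENCY
        return HP_LATENCY

cache = HybridCache()
latencies = [cache.access("A") for _ in range(4)]
print(latencies)   # early accesses pay HD latency; later ones hit HP
```

A real policy would also bound the HP region's capacity and demote cold blocks; the sketch only shows the promotion trigger.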
DOI: 10.1145/3531437.3539727 · Published 2022-08-01
Citations: 0
Canopy: A CNFET-based Process Variation Aware Systolic DNN Accelerator
Cheng Chu, Dawen Xu, Ying Wang, Fan Chen
Although systolic accelerators have become the dominant method for executing Deep Neural Networks (DNNs), their performance efficiency (quantified as Energy-Delay Product, or EDP) is limited by the capabilities of silicon Field-Effect Transistors (FETs). FETs constructed from Carbon Nanotubes (CNTs) have demonstrated >10x EDP benefits; however, the process variations inherent in carbon nanotube FET (CNFET) fabrication compromise those benefits, resulting in >40% performance degradation. In this work, we study the impact of CNT process variations and present Canopy, a process-variation-aware systolic DNN accelerator that leverages the spatial correlation in CNT variations. Canopy co-optimizes the architecture and dataflow to allow the computing engines in a systolic array to run at their best performance with non-uniform latency, minimizing the performance degradation incurred by CNT variations. Furthermore, we devise Canopy with dynamic reconfigurability such that the microarchitectural capability and its associated flexibility achieve an extra degree of adaptability with regard to the DNN topology and processing hyper-parameters (e.g., batch size). Experimental results show that Canopy improves performance by 5.85x (4.66x) and reduces energy by 34% (90%) when inferencing a single (a batch of) input compared to the baseline design under an iso-area comparison across seven DNN workloads.
DOI: 10.1145/3531437.3539703 · Published 2022-08-01
Citations: 0
Identifying Efficient Dataflows for Spiking Neural Networks
Deepika Sharma, Aayush Ankit, K. Roy
Deep feed-forward Spiking Neural Networks (SNNs) trained using appropriate learning algorithms have been shown to match the performance of state-of-the-art Artificial Neural Networks (ANNs). The inputs to an SNN layer are 1-bit spikes distributed over several timesteps. In addition, along with the standard artificial neural network (ANN) data structures, SNNs require one additional data structure: the membrane potential (Vmem) of each neuron, which is updated every timestep. Hence, the dataflow requirements for energy-efficient hardware implementations of SNNs can differ from those of standard ANNs. In this paper, we propose optimal dataflows for deep spiking neural network layers. To evaluate the energy and latency of different dataflows, we considered three hardware architectures with varying on-chip resources to represent a class of spatial accelerators. We developed a set of rules leading to optimum dataflows for SNNs that achieve more than 90% improvement in Energy-Delay Product (EDP) compared to the baseline for some workloads and architectures.
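EDP, the figure of merit used above, is simply energy multiplied by delay, so ranking candidate dataflows comes down to comparing products (the dataflow names and numbers below are made-up placeholders, not results from the paper):

```python
# Energy-Delay Product: lower is better. A dataflow that spends a bit more
# energy but finishes much faster can still win on EDP.

def edp(energy_nj, delay_us):
    return energy_nj * delay_us

candidates = {
    "weight-stationary": edp(energy_nj=120.0, delay_us=4.0),   # 480.0
    "output-stationary": edp(energy_nj=150.0, delay_us=2.5),   # 375.0
}
best = min(candidates, key=candidates.get)
print(best, candidates[best])
```

A dataflow search like the paper's would evaluate such costs per layer, under each architecture's on-chip resource constraints, rather than once per network.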
DOI: 10.1145/3531437.3539704 · Published 2022-08-01
Citations: 3
Predictive Model Attack for Embedded FPGA Logic Locking
Prattay Chowdhury, Chaitali Sathe, Benjamin Carrion Schaefer
With most VLSI design companies now being fabless, it is imperative to develop methods to protect their Intellectual Property (IP). One approach that has become very popular due to its relative simplicity and practicality is logic locking. One problem with traditional locking mechanisms is that the locking circuitry is built into the netlist that the VLSI design company delivers to the foundry, which then has access to the entire design, including the locking mechanism. This implies that the foundry could potentially tamper with this circuitry or reverse engineer it to obtain the locking key. One relatively new approach, coined logic locking through omission, or hardware redaction, maps a portion of the design to an embedded FPGA (eFPGA). The bitstream of the eFPGA now acts as the locking key. This new approach has been shown to be more secure, as the foundry has no access to the bitstream during the manufacturing stage. The obvious drawbacks are the increase in design complexity and the area and performance overheads associated with the eFPGA. In this work we propose, to the best of our knowledge, the first attack on this type of locking mechanism, substituting the exact logic mapped onto the eFPGA with a synthesizable predictive model that replicates its behavior. We show that this approach is applicable in the context of approximate computing, where hardware accelerators tolerate a certain degree of error at their outputs. Experimental results show that our proposed approach is very effective at finding suitable predictive models while simultaneously reducing the overall power consumption.
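A toy version of the attack idea can make it concrete (illustrative only — the paper's flow targets synthesizable hardware models, while here the hidden logic is a 2-bit adder and the "predictive model" is a linear fit learned from black-box queries):

```python
# Sketch: query the redacted exact logic as a black box to collect I/O pairs,
# then fit a simple predictive model that replicates its behavior.

def hidden_logic(a, b):
    return a + b                          # stands in for the eFPGA-mapped logic

# Collect training pairs by querying the black box.
samples = [((a, b), hidden_logic(a, b)) for a in range(4) for b in range(4)]

# Fit y ~ w0*a + w1*b + c with plain stochastic gradient descent.
w0 = w1 = c = 0.0
for _ in range(2000):
    for (a, b), y in samples:
        err = (w0 * a + w1 * b + c) - y
        w0 -= 0.01 * err * a
        w1 -= 0.01 * err * b
        c  -= 0.01 * err

predict = lambda a, b: round(w0 * a + w1 * b + c)
assert all(predict(a, b) == hidden_logic(a, b) for a in range(4) for b in range(4))
```

For logic that is not exactly learnable, the model only approximates the outputs, which is why the attack is framed in the approximate-computing setting where bounded output error is acceptable.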
DOI: 10.1145/3531437.3539728 · Published 2022-08-01
Citations: 5
A Bit-level Sparsity-aware SAR ADC with Direct Hybrid Encoding for Signed Expressions for AIoT Applications
Ruicong Chen, H. Kung, A. Chandrakasan, Hae-Seung Lee
In this work, we propose the first bit-level sparsity-aware SAR ADC with direct hybrid encoding for signed expressions (HESE) for AIoT applications. ADCs are typically a bottleneck in reducing the energy consumption of analog neural networks (ANNs). For a pre-trained Convolutional Neural Network (CNN) inference, a HESE SAR for an ANN can reduce the number of non-zero signed-digit terms to be output, and thus enables a reduction in energy along with the term quantization (TQ). The proposed SAR ADC directly produces the HESE signed-digit representation (SDR) using two thresholds per cycle for 2-bit look-ahead (LA). A prototype in 65nm shows that the HESE SAR provides sparsity encoding with a Walden FoM of 15.2 fJ/conv.-step at 45 MS/s. The core area is 0.072 mm².
DOI: 10.1145/3531437.3539700 · Published 2022-08-01
Citations: 0
Analysis of the Effect of Hot Carrier Injection in An Integrated Inductive Voltage Regulator
Shida Zhang, Nael Mizanur Rahman, Venkata Chaitanya Krishna Chekuri, Carlos Tokunaga, S. Mukhopadhyay
This paper presents a simulation-based study to evaluate the effect of Hot Carrier Injection (HCI) on the characteristics of an on-chip, digitally-controlled, switched inductor voltage regulator (IVR) architecture. Our methodology integrates device-level aging models, circuit simulations in SPICE, and control loop simulations in Simulink. We characterize the effect of HCI on individual components of an IVR, and their combined effect on the efficiency and transient performance. Our analysis using an IVR designed in 65nm CMOS shows that aging of the power stages has a smaller impact on performance compared to that of the control loop. Further, we perform a comparative analysis to show that, with a 1.8V supply, HCI leads to higher aging-induced degradation of IVR than Negative Bias Temperature Instability (NBTI). Finally, our simulation shows that parasitic inductance near IVR input aggravates NBTI and parasitic capacitance near IVR output aggravates HCI effects on IVR’s performance.
DOI: 10.1145/3531437.3539710 · Published 2022-08-01
Citations: 0
3D IC Tier Partitioning of Memory Macros: PPA vs. Thermal Tradeoffs
Lingjun Zhu, Nesara Eranna Bethur, Yi-Chen Lu, Youngsang Cho, Yunhyeok Im, S. Lim
Micro-bump and hybrid bonding technologies have enabled 3D ICs and provided remarkable performance gains, but the memory macro partitioning problem also becomes more complicated due to the limited 3D connection density. In this paper, we evaluate and quantify the impacts of various macro partitioning choices on performance and temperature in commercial-grade 3D ICs. In addition, we propose a set of partitioning guidelines and a quick constraint-graph-based approach to create floorplans for logic-on-memory 3D ICs. Experimental results show that the optimized macro partitioning can help improve the performance of logic-on-memory 3D ICs by up to 15%, at the cost of an 8°C temperature increase. Assuming air cooling, our simulation shows the 3D ICs are thermally sustainable with a 97°C maximum temperature.
DOI: 10.1145/3531437.3539724 (published 2022-08-01)
Citations: 0
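The abstract above mentions a quick constraint-graph-based approach to floorplanning. As a minimal sketch of the classic constraint-graph idea (not the paper's actual algorithm): an edge u→v in the horizontal constraint graph means "block u is left of block v", and each block's x-coordinate is the longest path from a virtual source. Block names and sizes below are made up for illustration:

```python
# Hedged sketch of constraint-graph placement: longest-path over an acyclic
# horizontal constraint graph yields non-overlapping x-coordinates.
from collections import defaultdict

def place_x(widths, left_of):
    """widths: {block: width}; left_of: list of (u, v) meaning u is left of v.
    Returns {block: x} via longest-path in topological order."""
    adj = defaultdict(list)
    indeg = {b: 0 for b in widths}
    for u, v in left_of:
        adj[u].append(v)
        indeg[v] += 1
    # Kahn's algorithm for a topological order of the constraint DAG
    order, queue = [], [b for b in widths if indeg[b] == 0]
    while queue:
        u = queue.pop()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Longest-path relaxation: a block sits at the right edge of the
    # widest chain of blocks constrained to its left.
    x = {b: 0 for b in widths}
    for u in order:
        for v in adj[u]:
            x[v] = max(x[v], x[u] + widths[u])
    return x

blocks = {"mem0": 4, "mem1": 3, "logic": 5}
coords = place_x(blocks, [("mem0", "logic"), ("mem1", "logic")])
# mem0 and mem1 both start at x=0; logic sits right of the wider one, at x=4
```

A real tier-partitioning flow would run this per tier, with a matching vertical constraint graph for y-coordinates.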
Hier-3D: A Hierarchical Physical Design Methodology for Face-to-Face-Bonded 3D ICs
Anthony Agnesina, Moritz Brunion, A. Ortiz, F. Catthoor, D. Milojevic, M. Komalan, Matheus A. Cavalcante, Samuel Riedel, L. Benini, S. Lim
Hierarchical very-large-scale integration (VLSI) flows are an understudied yet critical approach to achieving design closure at giga-scale complexity and gigahertz frequency targets. This paper proposes a novel hierarchical physical design flow that enables the construction of high-density, commercial-quality, two-tier, face-to-face-bonded hierarchical 3D ICs. We significantly reduce the associated manufacturing cost compared to existing 3D implementation flows and, for the first time, achieve cost competitiveness against the 2D reference in large modern designs. Experimental results on complex industrial and open manycore processors demonstrate, in two advanced nodes, that the proposed flow provides major power, performance, and area/cost (PPAC) improvements of 1.2 to 2.2× compared with 2D, with all metrics improved simultaneously, including power savings.
DOI: 10.1145/3531437.3539702 (published 2022-08-01)
Citations: 4
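The Hier-3D abstract reports combined PPAC improvements of 1.2 to 2.2× with every metric improved at once. One hedged way to roll per-metric ratios into a single figure of merit is a geometric mean; the formula and sample numbers below are an illustrative assumption, not the paper's methodology:

```python
# Back-of-the-envelope PPAC composite: each argument is baseline/implementation,
# so a value > 1 means the 3D flow wins on that metric. Numbers are made up.

def ppac_gain(power_ratio: float, perf_ratio: float, area_cost_ratio: float) -> float:
    """Geometric mean of the three ratios, weighting the metrics equally."""
    return (power_ratio * perf_ratio * area_cost_ratio) ** (1.0 / 3.0)

# e.g. 20% power savings, 15% higher performance, 25% lower silicon cost:
gain = ppac_gain(1.0 / 0.8, 1.15, 1.0 / 0.75)
```

Only when all three ratios exceed 1, as the paper claims for its flow, is the composite guaranteed to improve regardless of weighting.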