2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI): Latest Papers

Benchmarking of SoC-Level Hardware Vulnerabilities: A Complete Walkthrough

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238612

Shams Tarek, Hasan Al Shaikh, Sree Ranjani Rajendran, Farimah Farahmandi

Due to the increasing complexity of modern system-on-chips (SoCs) and the diversity of the attack surface, popular SoC verification approaches used in industry and academia for detecting security-critical vulnerabilities confront several challenges. Although novel SoC security verification techniques are being proposed to overcome these challenges, qualitative and quantitative critical comparisons among them are becoming increasingly difficult due to the lack of suitable, well-validated SoC-level hardware vulnerability benchmarks that can be used to evaluate the efficacy of these security verification techniques/tools on a level playing field. In this paper, we offer a comprehensive database of SoC vulnerabilities, with a particular emphasis on emerging hardware threats that may be exploited from the software layer by attackers to violate the security requirements of the system. In this regard, 32 register transfer level (RTL) hardware vulnerability benchmarks based on three distinct RISC-V-based ISA implementations have been established and made open-source to stimulate standardized research efforts in the community. In addition, we provide a comprehensive taxonomy of the benchmarks, complete with security implications and classifications. We also offer a discussion on exploitation strategies that attackers may employ, a set of security properties associated with each vulnerability in order to detect them formally, and the difficulties encountered by typical security verification methods when attempting to detect them.

Citations: 2

iTPM: Exploring PUF-based Keyless TPM for Security-by-Design of Smart Electronics

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238586

Vishnu Bathalapalli, S. Mohanty, E. Kougianos, Vasanth Iyer, Bibhudutta Rout

The growing scope of smart electronics and its expanding worldwide market have made cybersecurity an important challenge. The Security-by-Design (SbD) principle, an emerging cybersecurity area, focuses on building security- and privacy-enabled primitives at the design stage of an electronic system. This paper proposes a novel Physical Unclonable Function (PUF) based Trusted Platform Module (TPM) as an SbD primitive. The proposed SbD primitive works by performing secure verification of the PUF key using the TPM's encryption and decryption engine. The securely verified PUF key is then bound to the TPM using Platform Configuration Registers (PCRs). PCRs in the TPM facilitate a secure boot process and effective access control to the TPM's non-volatile memory through an enhanced authorization policy. By binding the PUF to PCRs in the TPM, a novel PUF-based access control policy can be defined, bringing in a new security ecosystem for the emerging Internet-of-Everything era. The proposed SbD approach has been experimentally validated by successfully integrating various PUF topologies with a hardware TPM.
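
The binding step described above relies on how a PCR accumulates measurements. The following is a minimal sketch of the standard TPM extend rule (new PCR = hash of old PCR concatenated with the measurement digest), assuming a SHA-256 bank and a PCR that resets to all zeros. The PUF key value and the function names `pcr_extend` and `policy_satisfied` are hypothetical and do not come from the paper, which may bind the key differently.

```python
import hashlib
import hmac

# Assumption: SHA-256 PCR bank, reset value of all zeros.
PCR_RESET = bytes(32)


def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """Extend a PCR: new = SHA-256(old PCR || SHA-256(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()


def policy_satisfied(current_pcr: bytes, expected_pcr: bytes) -> bool:
    """Model a PCR-based policy check with a constant-time comparison."""
    return hmac.compare_digest(current_pcr, expected_pcr)


# Hypothetical PUF-derived key, used only for illustration.
enrolled_puf_key = bytes.fromhex("a5" * 16)
expected_pcr = pcr_extend(PCR_RESET, enrolled_puf_key)

# A device that reproduces the enrolled key reaches the expected PCR value;
# a device with a different PUF response does not.
genuine_pcr = pcr_extend(PCR_RESET, enrolled_puf_key)
cloned_pcr = pcr_extend(PCR_RESET, bytes.fromhex("5a" * 16))

print(policy_satisfied(genuine_pcr, expected_pcr))
print(policy_satisfied(cloned_pcr, expected_pcr))
```

Because extend is one-way and order-dependent, a PCR value can only be reproduced by replaying the same measurements, which is what makes it usable as an access-control condition.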

Citations: 0

L-BANCS: A Multi-Phase Tile Design for Nanomagnetic Logic

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238640

R. E. Formigoni, Ricardo S. Ferreira, O. P. V. Neto, J. Nacif

Complementary Metal Oxide Semiconductor (CMOS) technology is the industry standard for chip fabrication. Currently, CMOS faces ever-increasing thermal, power, and miniaturization challenges. As a result, researchers are putting effort into novel alternative technologies to handle these issues, such as nanomagnetic logic (NML), which uses nanomagnets to perform binary logic. This paper presents a novel L-shaped clocking scheme to synchronize NML circuits. Our proposal is scalable, simple to use, and reduces the number of constraints for the placement and routing algorithms used in circuit generation. In addition, the L-shaped clocking scheme introduces tiles with a multi-phase design, which allows for a reduced area overhead at the cost of latency, solves feedback path issues, and introduces a model to work with modern NML features. Our results demonstrate a small latency trade-off for a considerable area reduction. Finally, we validate our work with layouts in ToPoliNano.

Citations: 0

Using Lyapunov Exponents and Entropy to Estimate Sensitivity to Process Variability

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238486

E. A. Ramos, Ricardo Reis

The technology scaling of transistors makes them more susceptible to faults, such as those due to radiation effects and process variability. Faults related to process variability can cause circuits to operate outside their specification ranges. In most cases, simulations are used to analyze such effects, but simulations have high computational costs. This work applies mathematical chaos theory, through the Lyapunov exponents and the entropy of a circuit, to analytically estimate the effects caused by the variability of the manufacturing process. The resulting method can estimate the variability of power, delay, and power-delay product (PDP) with an accuracy equivalent to simulation-based methods, but on average three hundred times faster.

Citations: 0

Machine Learning and Polynomial Chaos models for Accurate Prediction of SET Pulse Current

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238585

Vishu Saxena, Yash Jain, Sparsh Mittal

This research investigates how heavy-ion irradiation affects the single event transient (SET) response of a 14nm silicon-on-insulator (SOI) FinFET. Researchers generally use a TCAD tool (e.g., Sentaurus TCAD) to develop a SET pulse current model. However, TCAD simulations are time-consuming, which prohibits efficient design-space exploration. We propose efficient models for predicting SET pulse current with high accuracy. We use (1) polynomial chaos (PC) based models, (2) ML regression techniques, and (3) models based on artificial neural networks and 1D convolutional neural networks (1D-CNNs). The strike of a heavy ion leads to transient behavior that is very different from the normal behavior. Hence, for all the above predictors, we also evaluate the corresponding piecewise predictors. While TCAD tools take 4 hours for each simulation on a high-end computer, our proposed models have much lower latency (e.g., a few seconds). This allows designers to explore a larger design space. Our proposed piecewise 1D-CNN model achieves a state-of-the-art MSE of $2.15 \times 10^{-6}$ mA$^2$. Overall, our study provides insights into how PC and ML-based regression models can be used to enhance the efficiency of SET analysis in circuit design.
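
The motivation for piecewise predictors can be illustrated with a small sketch. It is not the paper's model: it assumes a synthetic double-exponential pulse as a stand-in for TCAD data, uses plain least-squares polynomials instead of polynomial chaos or neural networks, and splits the waveform at its peak. All parameter values and names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def double_exp_pulse(t, i0=1.0, tau_rise=5.0, tau_fall=50.0):
    """Synthetic stand-in for a SET current pulse (arbitrary units)."""
    return i0 * (np.exp(-t / tau_fall) - np.exp(-t / tau_rise))


def mse(y_true, y_pred):
    """Mean squared error between two waveforms."""
    return float(np.mean((y_true - y_pred) ** 2))


t = np.linspace(0.0, 300.0, 601)
y = double_exp_pulse(t)
degree = 5

# One polynomial fitted to the whole waveform.
global_fit = Polynomial.fit(t, y, degree)
mse_global = mse(y, global_fit(t))

# Two polynomials of the same degree: one before the peak, one after it.
split = int(np.argmax(y))
rise_fit = Polynomial.fit(t[:split], y[:split], degree)
fall_fit = Polynomial.fit(t[split:], y[split:], degree)
y_piecewise = np.concatenate([rise_fit(t[:split]), fall_fit(t[split:])])
mse_piecewise = mse(y, y_piecewise)

print(f"global MSE:    {mse_global:.3e}")
print(f"piecewise MSE: {mse_piecewise:.3e}")
```

With equal polynomial degree, each least-squares piece can only match or improve on the global fit over its own segment, so the piecewise error is never larger. The same reasoning applies when the sharp strike region and the slower recovery are handled by separate learned predictors.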

Citations: 0

Reverse Engineering of RTL Controllers from Look-Up Table Netlists

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238540

Sundarakumar Muthukumaran, Aparajithan Nathamuni Venkatesan, Kishore Pula, Ram Venkat Narayanan, Ranga Vemuri, John Emmert

Verification of FPGA-based designs and comprehension of legacy designs can be aided by reverse engineering flattened Look-up Table (LUT) level netlists into high-level RTL representations. We propose a tool flow to extract Finite State Controllers by identifying control registers and progressively improving the accuracy of register classification. A control unit consists of one or more Finite State Machines (FSMs) which manage the execution of datapath units. The proposed tool flow has two phases. Phase 1 extracts the potential state/control registers. Phase 2 identifies the exact list of state/control registers and groups FSMs. The main goal of the proposed work is to improve the accuracy of control register identification. Three types of controllers are used for experimental evaluation: standalone FSM designs with no datapath units, datapaths with a single FSM, and datapaths with multiple FSMs. Accuracy is observed to be 73% to 100% in controllers with multiple FSMs, and 100% in controllers with a single FSM and in standalone FSM designs. The average accuracy of control register detection over all the real-world designs considered is 98%.
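
A rough idea of what a first-pass candidate extraction could look like is sketched below. It uses a common heuristic, flagging registers that lie on a feedback loop in the register dependency graph, and is an assumption for illustration only, not the authors' Phase 1 algorithm. The register names and the `reg_deps` structure are hypothetical.

```python
from collections import defaultdict


def registers_in_feedback(reg_deps):
    """Return registers whose next value depends, directly or through other
    registers, on their own current value.

    reg_deps maps each register to the set of signals in its fan-in cone
    (other registers or primary inputs) after tracing back through the LUTs.
    """
    # Build forward edges: source signal -> registers it drives.
    succ = defaultdict(set)
    for dst, srcs in reg_deps.items():
        for src in srcs:
            succ[src].add(dst)

    def reaches_itself(start):
        stack, seen = list(succ[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(succ[node])
        return False

    return {reg for reg in reg_deps if reaches_itself(reg)}


# Hypothetical flattened design: a 2-bit FSM, a small pipeline, an accumulator.
reg_deps = {
    "state0": {"state0", "state1", "in_valid"},
    "state1": {"state0"},
    "data_a": {"data_in", "state0"},
    "data_b": {"data_a"},
    "acc": {"acc", "data_b"},
}

candidates = registers_in_feedback(reg_deps)
print(sorted(candidates))
```

In this toy design the accumulator `acc` is flagged alongside the two state bits because it also feeds back on itself. That kind of false positive is why a second, more precise classification pass is needed after candidate extraction.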

Citations: 0

IoMT Synthetic Cardiac Arrest Dataset for eHealth with AI-based Validation

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238552

Joydeb Dutta, Deepak Puthal

In the present era, data plays a crucial role across various disciplines, serving as the foundation for exploration and advancements. However, in the domain of eHealth, a readily available dataset for training AI models to predict cardiac arrest using the internet of medical things (IoMT) is lacking. To bridge this gap, this research article addresses the need for a synthesized dataset that can be utilized by researchers in the eHealth field to evaluate the effectiveness of their AI/ML models. The article presents a synthesized IoMT dataset specifically designed for cardiac arrest prediction, incorporating valid ranges of IoMT-based medical features sourced from peer-reviewed journals and articles. This study offers the capability to generate synthetic datasets of varying sizes, catering to the specific requirements of researchers focused on cardiac arrest prediction for individual subjects (patients). The availability of such a dataset will contribute to the advancement of AI-driven research in the eHealth domain.
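
Range-based synthetic data generation of the kind described above can be sketched in a few lines. The feature names and numeric ranges below are placeholders chosen for illustration, not the ranges used in the paper, and uniform sampling is an assumption; the actual generator may use other distributions or add labels.

```python
import random

# Placeholder ranges for illustration only; NOT taken from the paper.
FEATURE_RANGES = {
    "heart_rate_bpm": (40.0, 180.0),
    "spo2_percent": (80.0, 100.0),
    "body_temp_c": (35.0, 40.0),
}


def generate_records(n, ranges=FEATURE_RANGES, seed=0):
    """Draw n synthetic records, sampling each feature uniformly in its range."""
    rng = random.Random(seed)  # seeded so a dataset can be regenerated exactly
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(n)
    ]


records = generate_records(1000)
print(len(records), sorted(records[0]))
```

Passing a different `n` yields datasets of varying sizes, and fixing `seed` keeps experiments reproducible across runs.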

Citations: 1

Fe-GCN: A 3D FeFET Memory Based PIM Accelerator for Graph Convolutional Networks

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238622

Hongtao Zhong, Yu Zhu, Longfei Luo, Taixin Li, Chen Wang, Yixin Xu, Tian Wang, Yao Yu, N. Vijaykrishnan, Yongpan Liu, Liang Shi, Huazhong Yang, Xueqing Li

Graph convolutional network (GCN) has emerged as a powerful model for many graph-related tasks. In conventional von Neumann architectures, massive data movement and irregular memory access in GCN computation severely degrade the performance and computation efficiency. For GCN acceleration, processing-in-memory (PIM) is promising by reducing the data movement. However, with the emergence of large GCN computation tasks, existing 2D PIM GCN accelerators face the challenge of storing all the necessary data on chip due to the limited PIM memory capacity, resulting in unwanted external memory access and degradation of performance and energy efficiency. This paper presents Fe-GCN, a 3D PIM GCN accelerator with high memory density based on the ferroelectric field-effect transistor (FeFET) memory. Besides, to mitigate the impact of the increased latency of the 3D memory structure, several software-hardware co-optimizations are proposed. Furthermore, an edge merging technique is also proposed to increase the memory utilization for the 3D GCN mapping and computing. Experimental results show that Fe-GCN achieves on average 2,647x, 58x, 18x, and 35x speedup and 26,708x, 1,246x, 25x, and 57x energy efficiency improvement over CPU, GPU, the state-of-the-art accelerators based on RRAM PIM and ASIC, respectively.
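
For readers unfamiliar with the workload being accelerated, here is a minimal NumPy sketch of one GCN layer in the widely used normalized form, ReLU(D^{-1/2}(A + I)D^{-1/2} X W). It is a software reference only and assumes this particular GCN variant; it says nothing about how Fe-GCN maps the computation onto FeFET arrays. The example graph and weights are made up.

```python
import numpy as np


def gcn_layer(adj, features, weights):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    combined = features @ weights                # combination: dense, regular
    aggregated = a_norm @ combined               # aggregation: sparse, irregular
    return np.maximum(aggregated, 0.0)


# Hypothetical 3-node path graph with 2 input and 2 output features.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
features = np.array([[1.0, 2.0],
                     [0.5, -1.0],
                     [3.0, 0.0]])
weights = np.array([[0.2, -0.4],
                    [0.7, 0.1]])

out = gcn_layer(adj, features, weights)
print(out.shape)
```

The aggregation step follows the graph's adjacency structure, so its memory accesses depend on the input graph. This is the irregular, data-movement-heavy part that processing-in-memory designs aim to keep on chip.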

Citations: 0

Versatile Signal Distribution Networks for Scalable Placement and Routing of Field-coupled Nanocomputing Technologies

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238604

Marcel Walter, B. Hien, R. Wille

Field-coupled Nanocomputing (FCN) is a promising beyond-CMOS technology that leverages physical field repulsion instead of electrical current flow to transmit information and perform computations, potentially leading to energy dissipation below the Landauer Limit and clock frequencies in the terahertz regime. Despite recent progress in the experimental realization of FCN using Silicon Dangling Bonds (SiDBs), the physical design of FCN circuits remains a challenging task because its design constraints differ from those of CMOS technologies. In this paper, we present three core contributions to the FCN physical design problem, building on top of the fastest heuristic algorithm in the FCN literature, ortho. Via special routing structures called Signal Distribution Networks (SDNs), we 1) reduce area overhead, wire costs, and the number of wire-crossings in routing solutions by approximately 25%, 10%, and 17%, respectively; 2) allow the use of Majority gates in order to quantify their routing costs, which turn out to be immense; and 3) enable the automatic placement and routing of sequential logic for the first time in the literature. Our approach can potentially pave the way for the practical implementation of the FCN technology and its advancement as a viable green alternative to conventional computing technologies.
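
To give the Landauer Limit mentioned above a concrete scale, the sketch below evaluates the bound k_B · T · ln 2 for erasing one bit. The temperature of 300 K is an assumption chosen for illustration; the abstract does not state an operating temperature.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23    # exact SI definition
ELECTRON_VOLT_J = 1.602176634e-19   # exact SI definition


def landauer_limit_joules(temperature_k):
    """Minimum energy dissipated when erasing one bit at a given temperature."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


room_temp_limit = landauer_limit_joules(300.0)  # assumed temperature
print(f"{room_temp_limit:.3e} J per erased bit")
print(f"{room_temp_limit / ELECTRON_VOLT_J * 1e3:.1f} meV per erased bit")
```

At 300 K this gives about 2.87e-21 J, or roughly 17.9 meV, per erased bit, which is the per-operation energy scale that the phrase "below the Landauer Limit" refers to.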

Citations: 0

A 3 TOPS/W RISC-V Parallel Cluster for Inference of Fine-Grain Mixed-Precision Quantized Neural Networks

2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2023-06-20 DOI: 10.1109/ISVLSI59464.2023.10238679

Alessandro Nadalini, Georg Rutishauser, A. Burrello, Nazareno Bruschi, Angelo Garofalo, L. Benini, Francesco Conti, D. Rossi

The emerging trend of deploying complex algorithms, such as Deep Neural Networks (DNNs), increasingly poses strict memory and energy efficiency requirements on Internet-of-Things (IoT) end-nodes. Mixed-precision quantization has been proposed as a technique to minimize a DNN's memory footprint and maximize its execution efficiency, with negligible end-to-end precision degradation. In this work, we present a novel hardware and software stack for energy-efficient inference of mixed-precision Quantized Neural Networks (QNNs). We introduce Flex-V, a processor based on the RISC-V Instruction Set Architecture (ISA) that features fused Mac&Load mixed-precision dot product instructions; to avoid the exponential growth of the encoding space due to mixed-precision variants, we encode formats into the Control-Status Registers (CSRs). The Flex-V core is integrated into a tightly-coupled cluster of eight processors; in addition, we provide a full framework for the end-to-end deployment of DNNs including a compiler, optimized libraries, and a memory-aware deployment flow. Our results show up to 91.5 MAC/cycle and 3.26 TOPS/W on the cluster, implemented in a commercial 22nm FDX technology, with up to $8.5\times$ speed-up, and an area overhead of only 5.6% with respect to the baseline. To demonstrate the capabilities of the architecture, we benchmark it with end-to-end real-life QNNs, improving performance by $2\times$ to $2.5\times$ with respect to existing solutions using fully flexible programmable processors.
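
What a mixed-precision dot product computes can be shown with a small software model. The sketch assumes 8-bit signed activations and 4-bit signed weights packed into 32-bit words, with the bit widths passed as parameters, loosely mirroring the idea of keeping formats outside the instruction encoding. It is not a model of the Flex-V instructions; the function names and lane layout are hypothetical.

```python
def pack(values, bits):
    """Pack signed integers of width `bits` into 32-bit words (lane 0 = LSBs)."""
    per_word = 32 // bits
    mask = (1 << bits) - 1
    words = []
    for i in range(0, len(values), per_word):
        word = 0
        for lane, value in enumerate(values[i:i + per_word]):
            word |= (value & mask) << (lane * bits)
        words.append(word)
    return words


def unpack(words, bits, count):
    """Recover `count` sign-extended integers from packed 32-bit words."""
    per_word = 32 // bits
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    out = []
    for word in words:
        for lane in range(per_word):
            raw = (word >> (lane * bits)) & mask
            out.append(raw - (1 << bits) if raw & sign_bit else raw)
    return out[:count]


def mixed_dot(act_words, wgt_words, count, act_bits=8, wgt_bits=4):
    """Dot product of packed activations and weights with different widths."""
    acts = unpack(act_words, act_bits, count)
    wgts = unpack(wgt_words, wgt_bits, count)
    return sum(a * w for a, w in zip(acts, wgts))


activations = [12, -7, 100, -128, 5, 0, 33, -1]   # fit in 8 bits
weights = [3, -8, 7, -1, 0, 2, -4, 5]             # fit in 4 bits

act_words = pack(activations, 8)   # 8 values -> 2 words
wgt_words = pack(weights, 4)       # 8 values -> 1 word
print(mixed_dot(act_words, wgt_words, len(weights)))
```

Storing the eight weights at 4 bits takes one 32-bit word instead of two at 8 bits, which is the memory-footprint saving the abstract refers to. Dedicated hardware avoids the unpacking overhead that this software model pays.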

随着深度神经网络(dnn)等复杂算法的部署，对物联网(IoT)终端节点的内存和能效要求越来越高。混合精度量化作为一种最小化深度神经网络内存占用和最大化其执行效率的技术，可以忽略端到端精度退化。在这项工作中，我们提出了一种新的硬件和软件堆栈，用于混合精度量化神经网络(QNNs)的节能推理。我们介绍Flex-V，一个基于RISC-V指令集架构(ISA)的处理器，它融合了mac和load混合精度点积指令;为了避免由于混合精度变量导致的编码空间的指数增长，我们将格式编码到控制状态寄存器(CSRs)中。Flex-V核心集成到一个紧密耦合的8个处理器集群中;此外，我们还为dnn的端到端部署提供了一个完整的框架，包括编译器、优化库和内存感知部署流。我们的研究结果显示，在商用22nm FDX技术中实现的集群上，高达91.5 MAC/cycle和3.26 TOPS/W，加速高达8.5倍，面积开销仅为5.6%。为了展示该架构的能力，我们使用端到端的实际qnn对其进行基准测试，相对于使用完全灵活的可编程处理器的现有解决方案，性能提高了2倍至2.5倍。


Citations: 2
