
ACM Transactions on Design Automation of Electronic Systems: Latest Publications

A Single Bitline Highly Stable, Low Power With High Speed Half-Select Disturb Free 11T SRAM Cell
IF 1.4 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-06-19 | DOI: 10.1145/3653675
Lokesh Soni, Neeta Pandey

This paper presents a half-select disturb-free 11T (HF11T) static random access memory (SRAM) cell with low power, improved stability, and high speed. The proposed SRAM cell works well with bit-interleaving design, which enhances soft-error immunity. The proposed HF11T cell is compared with other cutting-edge designs: a single-ended half-select-free 11T (SEHF11T), a shared-pass-gate 11T (SPG11T), a data-dependent stack PMOS switching 10T (DSPS10T), a single-ended half-select robust 12T (HSR12T), and an 11T SRAM cell. It exhibits 4.85×/9.19× lower read delay (T_RA) and write delay (T_WA), respectively, than the other considered SRAM cells. It achieves 1.07×/1.02× better read and write stability, respectively, than the considered SRAM cells. It shows maximum reductions of 1.68×/4.58×/94.72×/9×/145× in leakage power, read power, write power consumption, read power-delay product (PDP), and write PDP, respectively, relative to the considered SRAM cells. In addition, the proposed HF11T cell achieves a 10.14× higher Ion/Ioff ratio than the other compared cells. These improvements come with a trade-off: 1.13× more T_RA compared to SPG11T. Simulations are performed in Cadence Virtuoso with 45 nm CMOS technology at a supply voltage (VDD) of 0.6 V.
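
As a quick illustration of the figures of merit compared above, here is a minimal sketch that computes a power-delay product (PDP) improvement ratio; the power and delay numbers are hypothetical placeholders, not measurements from the paper.

```python
# PDP comparison with hypothetical numbers (not the paper's data).
def pdp(power: float, delay: float) -> float:
    """PDP = power x delay: energy per operation, in joules."""
    return power * delay

# Hypothetical read power (W) and read delay (s) at VDD = 0.6 V.
baseline = {"power": 4.2e-6, "delay": 9.2e-10}
proposed = {"power": 1.1e-6, "delay": 1.9e-10}

ratio = pdp(**baseline) / pdp(**proposed)
print(f"read-PDP improvement: {ratio:.2f}x")
```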

{"title":"A Single Bitline Highly Stable, Low Power With High Speed Half-Select Disturb Free 11T SRAM Cell","authors":"Lokesh Soni, Neeta Pandey","doi":"10.1145/3653675","DOIUrl":"https://doi.org/10.1145/3653675","url":null,"abstract":"<p>A half-select disturb-free 11T (HF11T) static random access memory (SRAM) cell with low power, better stability and high speed is presented in this paper. The proposed SRAM cell works well with bit-interleaving design, which enhances soft-error immunity. A comparison of the proposed HF11T cell with other cutting-edge designs such as single-ended HS free 11T (SEHF11T), a shared-pass-gate 11T (SPG11T), data-dependent stack PMOS switching 10T (DSPS10T), a single-ended half-selected robust 12T (HSR12T), and 11T SRAM cells has been made. It exhibits 4.85 × /9.19 × less read delay (<i>T<sub>RA</sub></i>) and write delay (<i>T<sub>WA</sub></i>), respectively as compared to other considered SRAM cells. It achieves 1.07 × /1.02 × better read and write stability, respectively than the considered SRAM cells. It shows maximum reduction of 1.68 × /4.58 × /94.72 × /9 × /145 × leakage power, read power, write power consumption, read power delay product (PDP) and write PDP respectively, than the considered SRAM cells. In addition, the proposed HF11T cell achieves 10.14 × higher <i>I<sub>on</sub></i>/<i>I<sub>off</sub></i> ratio than the other compared cells. These improvements come with a trade-off, resulting in 1.13 × more <i>T<sub>RA</sub></i> compared to SPG11T. The simulation is performed with Cadence Virtuoso 45nm CMOS technology at supply voltage (<i>V<sub>DD</sub></i>) of 0.6 V.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"59 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Advancing Hyperdimensional Computing Based on Trainable Encoding and Adaptive Training for Efficient and Accurate Learning
IF 1.4 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-06-04 | DOI: 10.1145/3665891
Jiseung Kim, Hyunsei Lee, Mohsen Imani, Yeseong Kim

Hyperdimensional computing (HDC) is a computing paradigm inspired by the mechanisms of human memory, characterizing data through high-dimensional vector representations, known as hypervectors. Recent advancements in HDC have explored its potential as a learning model, leveraging its straightforward arithmetic and high efficiency. The traditional HDC frameworks are hampered by two primary static elements: randomly generated encoders and fixed learning rates. These static components significantly limit model adaptability and accuracy. The static, randomly generated encoders, while ensuring high-dimensional representation, fail to adapt to evolving data relationships, thereby constraining the model’s ability to accurately capture and learn from complex patterns. Similarly, the fixed nature of the learning rate does not account for the varying needs of the training process over time, hindering efficient convergence and optimal performance. This paper introduces TrainableHD, a novel HDC framework that enables dynamic training of the randomly generated encoder depending on the feedback of the learning data, thereby addressing the static nature of conventional HDC encoders. TrainableHD also enhances the training performance by incorporating adaptive optimizer algorithms in learning the hypervectors. We further refine TrainableHD with effective quantization to enhance efficiency, allowing the execution of the inference phase in low-precision accelerators. Our evaluations demonstrate that TrainableHD significantly improves HDC accuracy by up to 27.99% (averaging 7.02%) without additional computational costs during inference, achieving a performance level comparable to state-of-the-art deep learning models. Furthermore, TrainableHD is optimized for execution speed and energy efficiency. Compared to deep learning on a low-power GPU platform like NVIDIA Jetson Xavier, TrainableHD is 56.4 times faster and 73 times more energy efficient. This efficiency is further augmented through the use of Encoder Interval Training (EIT) and adaptive optimizer algorithms, enhancing the training process without compromising the model’s accuracy.
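
To make the encoder-training idea concrete, here is a minimal sketch of HDC classification with a trainable random-projection encoder in plain NumPy. The dimensions, tanh encoding, and update rule are illustrative assumptions; the actual TrainableHD design (EIT scheduling, adaptive optimizers, quantization) is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
D, F, C = 4096, 64, 3              # hypervector dim, feature dim, class count

encoder = rng.normal(size=(F, D))  # conventionally fixed and random; trainable here
class_hvs = np.zeros((C, D))       # one class hypervector per label

def encode(x):
    # tanh keeps the encoding differentiable, so feedback can reach the encoder
    return np.tanh(x @ encoder)

def similarity(h, models):
    return models @ h / (np.linalg.norm(models, axis=1) * np.linalg.norm(h) + 1e-9)

def train_step(x, y, lr_model=0.05, lr_enc=1e-3):
    h = encode(x)
    pred = int(np.argmax(similarity(h, class_hvs)))
    if pred != y:
        # model update: reinforce the true class, penalize the confused one
        class_hvs[y] += lr_model * h
        class_hvs[pred] -= lr_model * h
        # encoder update from the same feedback (crude gradient through tanh)
        direction = class_hvs[y] - class_hvs[pred]
        encoder += lr_enc * np.outer(x, direction * (1.0 - h ** 2))
    return pred == y

x, y = rng.normal(size=F), 1       # one hypothetical training sample
train_step(x, y)
```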

{"title":"Advancing Hyperdimensional Computing Based on Trainable Encoding and Adaptive Training for Efficient and Accurate Learning","authors":"Jiseung Kim, Hyunsei Lee, Mohsen Imani, Yeseong Kim","doi":"10.1145/3665891","DOIUrl":"https://doi.org/10.1145/3665891","url":null,"abstract":"<p>Hyperdimensional computing (HDC) is a computing paradigm inspired by the mechanisms of human memory, characterizing data through high-dimensional vector representations, known as hypervectors. Recent advancements in HDC have explored its potential as a learning model, leveraging its straightforward arithmetic and high efficiency. The traditional HDC frameworks are hampered by two primary static elements: randomly generated encoders and fixed learning rates. These static components significantly limit model adaptability and accuracy. The static, randomly generated encoders, while ensuring high-dimensional representation, fail to adapt to evolving data relationships, thereby constraining the model’s ability to accurately capture and learn from complex patterns. Similarly, the fixed nature of the learning rate does not account for the varying needs of the training process over time, hindering efficient convergence and optimal performance. This paper introduces (mathsf {TrainableHD} ), a novel HDC framework that enables dynamic training of the randomly generated encoder depending on the feedback of the learning data, thereby addressing the static nature of conventional HDC encoders. (mathsf {TrainableHD} ) also enhances the training performance by incorporating adaptive optimizer algorithms in learning the hypervectors. We further refine (mathsf {TrainableHD} ) with effective quantization to enhance efficiency, allowing the execution of the inference phase in low-precision accelerators. Our evaluations demonstrate that (mathsf {TrainableHD} ) significantly improves HDC accuracy by up to 27.99% (averaging 7.02%) without additional computational costs during inference, achieving a performance level comparable to state-of-the-art deep learning models. Furthermore, (mathsf {TrainableHD} ) is optimized for execution speed and energy efficiency. Compared to deep learning on a low-power GPU platform like NVIDIA Jetson Xavier, (mathsf {TrainableHD} ) is 56.4 times faster and 73 times more energy efficient. This efficiency is further augmented through the use of Encoder Interval Training (EIT) and adaptive optimizer algorithms, enhancing the training process without compromising the model’s accuracy.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"21 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141253460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators
IF 1.4 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-05-11 | DOI: 10.1145/3664652
Hadi Esmaeilzadeh, Soroush Ghodrati, Andrew Kahng, Joon Kyung Kim, Sean Kinzer, Sayak Kundu, Rohan Mahapatra, Susmita Dey Manasi, Sachin Sapatnekar, Zhiang Wang, Ziqing Zeng

Parameterizable machine learning (ML) accelerators are the product of recent breakthroughs in ML. To fully enable their design space exploration (DSE), we propose a physical-design-driven, learning-based prediction framework for hardware-accelerated deep neural network (DNN) and non-DNN ML algorithms. It adopts a unified approach that combines power, performance, and area (PPA) analysis with frontend performance simulation, thereby achieving a realistic estimation of both backend PPA and system metrics such as runtime and energy. In addition, our framework includes a fully automated DSE technique, which optimizes backend and system metrics through an automated search of architectural and backend parameters. Experimental studies show that our approach consistently predicts backend PPA and system metrics with an average prediction error of 7% or less for the ASIC implementation of two deep learning accelerator platforms, VTA and VeriGOOD-ML, in both a commercial 12 nm process and a research-oriented 45 nm process.
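
A minimal sketch of the learning-based prediction idea: train a regressor that maps architectural/backend knob settings to a backend metric. The feature set, synthetic data, and choice of gradient boosting are illustrative assumptions, not the paper's actual model or dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# 6 hypothetical design knobs (e.g., PE-array size, buffer sizes, target clock),
# labeled with a synthetic stand-in for post-layout power.
X = rng.uniform(size=(500, 6))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)

rel_err = np.abs(model.predict(X_te) - y_te) / np.abs(y_te)
print(f"mean relative prediction error: {rel_err.mean():.1%}")
```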

{"title":"An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators","authors":"Hadi Esmaeilzadeh, Soroush Ghodrati, Andrew Kahng, Joon Kyung Kim, Sean Kinzer, Sayak Kundu, Rohan Mahapatra, Susmita Dey Manasi, Sachin Sapatnekar, Zhiang Wang, Ziqing Zeng","doi":"10.1145/3664652","DOIUrl":"https://doi.org/10.1145/3664652","url":null,"abstract":"<p>Parameterizable machine learning (ML) accelerators are the product of recent breakthroughs in ML. To fully enable their design space exploration (DSE), we propose a physical-design-driven, learning-based prediction framework for hardware-accelerated deep neural network (DNN) and non-DNN ML algorithms. It adopts a unified approach that combines power, performance, and area (PPA) analysis with frontend performance simulation, thereby achieving a realistic estimation of both backend PPA and system metrics such as runtime and energy. In addition, our framework includes a fully automated DSE technique, which optimizes backend and system metrics through an automated search of architectural and backend parameters. Experimental studies show that our approach consistently predicts backend PPA and system metrics with an average 7% or less prediction error for the ASIC implementation of two deep learning accelerator platforms, VTA and VeriGOOD-ML, in both a commercial 12 nm process and a research-oriented 45 nm process.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"2674 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140929019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Data Pruning-enabled High Performance and Reliable Graph Neural Network Training on ReRAM-based Processing-in-Memory Accelerators
IF 1.4 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-05-03 | DOI: 10.1145/3656171
Chukwufumnanya Ogbogu, Biresh K. Joardar, Krishnendu Chakrabarty, Jana Doppa, Partha Pratim Pande

Graph Neural Networks (GNNs) have achieved remarkable accuracy in cognitive tasks such as predictive analytics on graph-structured data. Hence, they have become very popular in diverse real-world applications. However, GNN training with large real-world graph datasets in edge-computing scenarios is both memory- and compute-intensive. Traditional computing platforms such as CPUs and GPUs do not provide the energy efficiency and low latency required in edge intelligence applications due to their limited memory bandwidth. Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architectures have been proposed as suitable candidates for accelerating AI applications at the edge, including GNN training. However, ReRAM-based PIM architectures suffer from low reliability due to their limited endurance, and from low performance when used for GNN training in real-world scenarios with large graphs. In this work, we propose a learning-for-data-pruning framework, which leverages a trained Binary Graph Classifier (BGC) to reduce the size of the input data graph by pruning subgraphs early in the training process, thereby accelerating GNN training on ReRAM-based architectures. The proposed lightweight BGC model reduces the amount of redundant information in the input graph(s) to speed up the overall training process, improves the reliability of the ReRAM-based PIM accelerator, and reduces the overall training cost. This enables fast, energy-efficient, and reliable GNN training on ReRAM-based architectures. Our experimental results demonstrate that, using this learning-for-data-pruning framework, we can accelerate GNN training and improve the reliability of ReRAM-based PIM architectures by up to 1.6×, and reduce the overall training cost by 100× compared to state-of-the-art data pruning techniques.
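
The pruning idea can be sketched as a filter driven by a binary classifier score. The subgraph representation and the stand-in scoring function below are hypothetical; the paper's BGC is a trained graph classifier operating on graph structure.

```python
# Classifier-guided data pruning: keep only subgraphs the scorer deems useful.
def prune_subgraphs(subgraphs, bgc_score, keep_threshold=0.5):
    """Drop subgraphs whose predicted training utility is below threshold."""
    return [g for g in subgraphs if bgc_score(g) >= keep_threshold]

# Hypothetical usage: average-degree heuristic as a stand-in for the trained BGC.
subgraphs = [{"nodes": 120, "edges": 800}, {"nodes": 90, "edges": 95}]
toy_score = lambda g: min(1.0, g["edges"] / (4.0 * g["nodes"]))
kept = prune_subgraphs(subgraphs, toy_score)
print(f"kept {len(kept)} of {len(subgraphs)} subgraphs")
```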

{"title":"Data Pruning-enabled High Performance and Reliable Graph Neural Network Training on ReRAM-based Processing-in-Memory Accelerators","authors":"Chukwufumnanya Ogbogu, Biresh K. Joardar, Krishnendu Chakrabarty, Jana Doppa, Partha Pratim Pande","doi":"10.1145/3656171","DOIUrl":"https://doi.org/10.1145/3656171","url":null,"abstract":"<p>Graph Neural Networks (GNNs) have achieved remarkable accuracy in cognitive tasks such as predictive analytics on graph-structured data. Hence, they have become very popular in diverse real-world applications. However, GNN training with large real-world graph datasets in edge-computing scenarios is both memory- and compute-intensive. Traditional computing platforms such as CPUs and GPUs do not provide the energy efficiency and low latency required in edge intelligence applications due to their limited memory bandwidth. Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architectures have been proposed as suitable candidates for accelerating AI applications at the edge, including GNN training. However, ReRAM-based PIM architectures suffer from low reliability due to their limited endurance, and low performance when they are used for GNN training in real-world scenarios with large graphs. In this work, we propose a learning-for-data-pruning framework, which leverages a trained Binary Graph Classifier (BGC) to reduce the size of the input data graph by pruning subgraphs early in the training process to accelerate the GNN training process on ReRAM-based architectures. The proposed light-weight BGC model reduces the amount of redundant information in input graph(s) to speed up the overall training process, improves the reliability of the ReRAM-based PIM accelerator, and reduces the overall training cost. This enables fast, energy-efficient, and reliable GNN training on ReRAM-based architectures. Our experimental results demonstrate that using this learning for data pruning framework, we can accelerate GNN training and improve the reliability of ReRAM-based PIM architectures by up to 1.6 ×, and reduce the overall training cost by 100 × compared to state-of-the-art data pruning techniques.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"6 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140827622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HLS-IRT: Hardware Trojan Insertion through Modification of Intermediate Representation During High-Level Synthesis
IF 1.4 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-05-03 | DOI: 10.1145/3663477
Rijoy Mukherjee, Archisman Ghosh, Rajat Subhra Chakraborty

Modern integrated circuit (IC) design relies on proprietary computer-aided design (CAD) software and the integration of third-party hardware intellectual property (IP) cores. Subsequently, the design is fabricated in untrustworthy offshore foundries, which raises concerns regarding security and reliability. Hardware Trojans (HTs) are difficult-to-detect malicious modifications to ICs that constitute a major threat; if undetected prior to deployment, they can lead to catastrophic functional failures or the unauthorized leakage of confidential information. Apart from the risks posed by rogue human agents, recent studies have shown that high-level synthesis (HLS) CAD software can serve as a potent attack vector for inserting HTs. In this paper, we introduce a novel automated attack vector, which we term “HLS-IRT”, that inserts an HT into the register-transfer level (RTL) description of circuits generated during an HLS-based IC design flow by directly modifying the compiler-generated intermediate representation (IR) corresponding to the design. We demonstrate the attack using a design and implementation flow based on the open-source Bambu HLS software and Xilinx FPGAs, on several hardware accelerators spanning different application domains. Our results show that the resulting HTs are surreptitious and effective, while incurring minimal design overhead. We also propose a novel detection scheme for HLS-IRT, since existing techniques are found to be inadequate for detecting the proposed HTs.

{"title":"HLS-IRT: Hardware Trojan Insertion through Modification of Intermediate Representation During High-Level Synthesis","authors":"Rijoy Mukherjee, Archisman Ghosh, Rajat Subhra Chakraborty","doi":"10.1145/3663477","DOIUrl":"https://doi.org/10.1145/3663477","url":null,"abstract":"<p>Modern integrated circuit (IC) design incorporates the usage of proprietary computer-aided design (CAD) software and integration of third-party hardware intellectual property (IP) cores. Subsequently, the fabrication process for the design takes place in untrustworthy offshore foundries that raises concerns regarding security and reliability. Hardware Trojans (HTs) are difficult to detect malicious modifications to IC that constitute a major threat, which if undetected prior to deployment, can lead to catastrophic functional failures or the unauthorized leakage of confidential information. Apart from the risks posed by rogue human agents, recent studies have shown that high-level synthesis (HLS) CAD software can serve as a potent attack vector for inserting Hardware Trojans (HTs). In this paper, we introduce a novel automated attack vector, which we term “HLS-IRT”, by inserting HT in the register transfer logic (RTL) description of circuits generated during a HLS based IC design flow, by directly modifying the compiler-generated intermediate representation (IR) corresponding to the design. We demonstrate the attack using a design and implementation flow based on the open-source <i>Bambu</i> HLS software and <i>Xilinx</i> FPGA, on several hardware accelerators spanning different application domains. Our results show that the resulting HTs are surreptitious and effective, while incurring minimal design overhead. We also propose a novel detection scheme for HLS-IRT, since existing techniques are found to be inadequate to detect the proposed HTs.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"45 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140827826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DeepOTF: Learning Equations-constrained Prediction for Electromagnetic Behavior
IF 1.4 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-05-01 | DOI: 10.1145/3663476
Peng Xu, Siyuan Xu, Tinghuan Chen, Guojin Chen, Tsung-Yi Ho, Bei Yu

High-quality passive devices are becoming increasingly important for the development of mobile devices and telecommunications, but obtaining such devices through simulation and analysis of electromagnetic (EM) behavior is time-consuming. To address this challenge, artificial neural network (ANN) models have emerged as an effective tool for modeling EM behavior, with NeuroTF being a representative example. However, these models are limited by the specific form of the transfer function, leading to discontinuity issues and high sensitivities. Moreover, previous methods have overlooked the physical relationship between distributed parameters, resulting in unacceptable numeric errors in the conversion results. To overcome these limitations, we propose two different neural network architectures: DeepOTF and ComplexTF. DeepOTF is a data-driven deep operator network for automatically learning feasible transfer functions for different geometric parameters. ComplexTF utilizes complex-valued neural networks to fit feasible transfer functions for different geometric parameters in the complex domain while maintaining causality and passivity. Our approach also employs an equations-constrained learning scheme to ensure the strict consistency of predictions and a dynamic weighting strategy to balance optimization objectives. The experimental results demonstrate that our framework outperforms baseline methods, achieving up to 1700× higher accuracy.
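
As a toy version of transfer-function fitting for EM behavior, the sketch below fits a pole-residue model H(s) = sum_k r_k/(s - p_k) + d to sampled frequency data by complex least squares, assuming the poles are already known; practical flows (and the neural approaches above) must also learn or relocate the poles. The pole/residue values are invented for illustration.

```python
import numpy as np

freqs = np.linspace(1e9, 10e9, 200)        # 1-10 GHz sample grid
s = 2j * np.pi * freqs

# Hypothetical "measured" response from a known pole-residue model plus offset.
poles = np.array([-1e9 + 5e9j, -1e9 - 5e9j])
residues_true = np.array([2e9 + 1e9j, 2e9 - 1e9j])
H = (residues_true / (s[:, None] - poles)).sum(axis=1) + 0.1

# Linear least-squares fit of residues and offset, with the poles held fixed.
A = np.hstack([1.0 / (s[:, None] - poles), np.ones((len(s), 1))])
coeffs, *_ = np.linalg.lstsq(A, H, rcond=None)
residues_fit, d_fit = coeffs[:-1], coeffs[-1]

print("max |fit - data|:", np.abs(A @ coeffs - H).max())
```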

{"title":"DeepOTF: Learning Equations-constrained Prediction for Electromagnetic Behavior","authors":"Peng Xu, Siyuan XU, Tinghuan Chen, Guojin Chen, Tsungyi Ho, Bei Yu","doi":"10.1145/3663476","DOIUrl":"https://doi.org/10.1145/3663476","url":null,"abstract":"<p>High-quality passive devices are becoming increasingly important for the development of mobile devices and telecommunications, but obtaining such devices through simulation and analysis of electromagnetic (EM) behavior is time-consuming. To address this challenge, artificial neural network (ANN) models have emerged as an effective tool for modeling EM behavior, with NeuroTF being a representative example. However, these models are limited by the specific form of the transfer function, leading to discontinuity issues and high sensitivities. Moreover, previous methods have overlooked the physical relationship between distributed parameters, resulting in unacceptable numeric errors in the conversion results. To overcome these limitations, we propose two different neural network architectures: DeepOTF and ComplexTF. DeepOTF is a data-driven deep operator network for automatically learning feasible transfer functions for different geometric parameters. ComplexTF utilizes complex-valued neural networks to fit feasible transfer functions for different geometric parameters in the complex domain while maintaining causality and passivity. Our approach also employs an Equations-constraint Learning scheme to ensure the strict consistency of predictions and a dynamic weighting strategy to balance optimization objectives. The experimental results demonstrate that our framework shows superior performance than baseline methods, achieving up to 1700 × higher accuracy. </p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"53 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140827670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Semi-Permanent Stuck-At Fault injection attacks on Elephant and GIFT lightweight ciphers
IF 1.4 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-04-29 | DOI: 10.1145/3662734
Priyanka Joshi, Bodhisatwa Mazumdar

Fault attacks pose a potent threat to modern cryptographic implementations, particularly those used in physically approachable embedded devices in IoT environments. Information security in such resource-constrained devices is ensured using lightweight ciphers, where combinational circuit implementations of the SBox are preferable to look-up tables (LUTs), as they are more efficient in terms of area, power, and memory requirements. Most existing fault analysis techniques focus on fault injection in memory cells and registers. Recently, a novel fault model and analysis technique, namely Semi-Permanent Stuck-At (SPSA) fault analysis, has been proposed to evaluate the security of ciphers whose Substitution-layer elements (SBoxes) are implemented as combinational circuits. In this work, we propose optimized techniques to recover the key with a minimum number of ciphertexts in such implementations of lightweight ciphers. Based on the proposed techniques, we present a key recovery attack on the NIST lightweight cryptography (NIST-LWC) standardization process finalist, Elephant AEAD. The proposed key recovery attack is validated on two versions of the Elephant cipher. The proposed fault analysis approach recovered the secret key within 85–240 ciphertexts, calculated over 1000 attack instances. To the best of our knowledge, this is the first work on fault analysis attacks on the Elephant scheme. Furthermore, an optimized combinational circuit implementation of the Spongent SBox (the SBox used in the Elephant cipher) is proposed, with a smaller gate count than the optimized implementation reported in the literature. The proposed fault analysis techniques are validated on the primary and optimized versions of the Spongent SBox through Verilog simulations. Further, we pinpoint SPSA hotspots in the lightweight GIFT cipher SBox architecture. We observe that the GIFT SBox exhibits resilience to the proposed SPSA fault analysis technique under the single-fault adversarial model. However, eight SPSA fault patterns reduce the nonlinearity of the SBox to zero, rendering it vulnerable to linear cryptanalysis. In conclusion, SPSA faults may adversely affect the cryptographic properties of an SBox, thereby leading to trivial key recovery. The GIFT cipher is used as an example to highlight two aspects: i) its SBox construction is resilient to the proposed SPSA analysis, motivating the characterization of such constructions for SPSA resilience, and ii) an SBox that is resilient to the proposed SPSA analysis may still exhibit vulnerabilities to other classical analysis techniques when subjected to SPSA faults. Our work reports new vulnerabilities in fault analysis in the combinational circuit implementations of cryptographic protocols.
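
The nonlinearity collapse described for the GIFT SBox can be reproduced in miniature with a Walsh-spectrum computation. The sketch below uses the public 4-bit GIFT SBox and models a fault as an output bit stuck at 0, which is a simplification: the paper's SPSA faults target internal nodes of a combinational implementation, not output wires.

```python
GIFT_SBOX = [0x1, 0xA, 0x4, 0xC, 0x6, 0xF, 0x3, 0x9,
             0x2, 0xD, 0xB, 0x7, 0x5, 0x0, 0x8, 0xE]

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def nonlinearity(sbox, n=4):
    """Distance of the worst nonzero component function to the affine functions."""
    nl = 1 << (n - 1)
    for b in range(1, 1 << n):                        # nonzero output mask
        w_max = max(abs(sum(1 - 2 * (parity(b & sbox[x]) ^ parity(a & x))
                            for x in range(1 << n)))  # Walsh coefficient W(a, b)
                    for a in range(1 << n))
        nl = min(nl, (1 << (n - 1)) - w_max // 2)
    return nl

faulty = [y & 0b0111 for y in GIFT_SBOX]              # output bit 3 forced to 0
print(nonlinearity(GIFT_SBOX))                        # fault-free nonlinearity
print(nonlinearity(faulty))                           # 0: linear cryptanalysis applies
```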

{"title":"Semi-Permanent Stuck-At Fault injection attacks on Elephant and GIFT lightweight ciphers","authors":"Priyanka Joshi, Bodhisatwa Mazumdar","doi":"10.1145/3662734","DOIUrl":"https://doi.org/10.1145/3662734","url":null,"abstract":"<p>Fault attacks pose a potent threat to modern cryptographic implementations, particularly those used in physically approachable embedded devices in IoT environments. Information security in such resource-constrained devices is ensured using lightweight ciphers, where combinational circuit implementations of SBox are preferable over look-up tables (LUT) as they are more efficient regarding area, power, and memory requirements. Most existing fault analysis techniques focus on fault injection in memory cells and registers. Recently, a novel fault model and analysis technique, namely <i>Semi-Permanent Stuck-At</i> (SPSA) fault analysis, has been proposed to evaluate the security of ciphers with combinational circuit implementation of <i>Substitution layer</i> elements, SBox. In this work, we propose optimized techniques to recover the key in a minimum number of ciphertexts in such implementations of lightweight ciphers. Based on the proposed techniques, a key recovery attack on the NIST lightweight cryptography (NIST-LWC) standardization process finalist, <monospace>Elephant</monospace> AEAD, has been proposed. The proposed key recovery attack is validated on two versions of <monospace>Elephant</monospace> cipher. The proposed fault analysis approach recovered the secret key within 85 − 240 ciphertexts, calculated over 1000 attack instances. To the best of our knowledge, this is the first work on fault analysis attacks on the <monospace>Elephant</monospace> scheme. Furthermore, an optimized combinational circuit implementation of <i>Spongent</i> SBox (SBox used in <monospace>Elephant</monospace> cipher) is proposed, having a smaller gate count than the optimized implementation reported in the literature. The proposed fault analysis techniques are validated on primary and optimized versions of <i>Spongent</i> SBox through Verilog simulations. Further, we pinpoint SPSA hotspots in the lightweight <monospace>GIFT</monospace> cipher SBox architecture. We observe that <monospace>GIFT</monospace> SBox exhibits resilience towards the proposed SPSA fault analysis technique under the single fault adversarial model. However, <i>eight</i> SPSA fault patterns reduce the nonlinearity of the SBox to zero, rendering it vulnerable to linear cryptanalysis. Conclusively, SPSA faults may adversely affect the cryptographic properties of an SBox, thereby leading to trivial key recovery. The <monospace>GIFT</monospace> cipher is used as an example to focus on two aspects: i) its SBox construction is resilient to the proposed SPSA analysis and therefore characterizing such constructions for SPSA resilience and, ii) an SBox even though resilient to the proposed SPSA analysis, may exhibit vulnerabilities towards other classical analysis techniques when subjected to SPSA faults. 
Our work reports new vulnerabilities in fault analysis in the combinational circuit implementations of cryptographic protocols.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140810874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
gem5-NVDLA: A Simulation Framework for Compiling, Scheduling and Architecture Evaluation on AI System-on-Chips
IF 1.4 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-04-29 | DOI: 10.1145/3661997
Chengtao Lai, Wei Zhang

Recent years have seen an increasing trend toward designing AI accelerators together with the rest of the system, including CPUs and the memory hierarchy. This trend calls for high-quality simulators or analytical models that enable this kind of co-exploration. Currently, the majority of such exploration is supported by AI-accelerator analytical models. Yet such models usually overlook the non-trivial impact of shared-resource congestion, non-ideal hardware utilization, and non-zero CPU scheduler overhead, which can only be modeled by cycle-level simulators. However, most simulators with full-stack toolchains are proprietary to corporations, and the few open-source simulators suffer from either weak compilers or a limited modeling space. Our framework resolves these issues by proposing a compilation and simulation flow that runs arbitrary Caffe neural network models on the NVIDIA Deep Learning Accelerator (NVDLA) with gem5, a cycle-level simulator, and by adding building blocks including scratchpad allocation, multi-accelerator scheduling, tensor-level prefetching mechanisms, and a DMA-aided embedded buffer to map workloads to multiple NVDLAs. The proposed framework has been tested and verified on a set of convolutional neural networks, showcasing its capability to model complex buffer-management strategies, scheduling policies, and hardware architectures. As a case study of this framework, we demonstrate the importance of adopting different buffering strategies for activation and weight tensors in AI accelerators to achieve remarkable speedup.
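
To illustrate one of the added building blocks, here is a minimal sketch of greedy multi-accelerator scheduling for independent jobs (e.g., batched inferences), assuming per-job latency estimates. The framework's actual scheduler inside gem5 handles layer dependencies, prefetching, and buffer management, none of which are modeled here; the layer names and costs are invented.

```python
import heapq

def schedule(jobs, n_accels):
    """Assign each job to the accelerator that frees up earliest."""
    ready = [(0.0, a) for a in range(n_accels)]     # (free-at time, accel id)
    heapq.heapify(ready)
    plan = []
    for name, latency in jobs:
        free_at, accel = heapq.heappop(ready)
        plan.append((name, accel, free_at, free_at + latency))
        heapq.heappush(ready, (free_at + latency, accel))
    return plan

jobs = [("infer0", 1.2), ("infer1", 0.8), ("infer2", 0.3)]  # hypothetical costs (ms)
for name, accel, start, end in schedule(jobs, n_accels=2):
    print(f"{name}: NVDLA {accel}, {start:.1f} -> {end:.1f} ms")
```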

{"title":"gem5-NVDLA: A Simulation Framework for Compiling, Scheduling and Architecture Evaluation on AI System-on-Chips","authors":"Chengtao Lai, Wei Zhang","doi":"10.1145/3661997","DOIUrl":"https://doi.org/10.1145/3661997","url":null,"abstract":"<p>Recent years have seen an increasing trend in designing AI accelerators together with the rest of the system, including CPUs and memory hierarchy. This trend calls for high-quality simulators or analytical models that enable such kind of co-exploration. Currently, the majority of such exploration is supported by AI accelerator analytical models. But such models usually overlook the non-trivial impact of congestion of shared resources, non-ideal hardware utilization and non-zero CPU scheduler overhead, which could only be modeled by cycle-level simulators. However, most simulators with full-stack toolchains are proprietary to corporations, and the few open-source simulators are suffering from either weak compilers or limited space of modeling. This framework resolves these issues by proposing a compilation and simulation flow to run arbitrary Caffe neural network models on the NVIDIA Deep Learning Accelerator (NVDLA) with gem5, a cycle-level simulator, and by adding more building blocks including scratchpad allocation, multi-accelerator scheduling, tensor-level prefetching mechanisms and a DMA-aided embedded buffer to map workload to multiple NVDLAs. The proposed framework has been tested and verified on a set of convolution neural networks, showcasing the capability of modeling complex buffer management strategies, scheduling policies and hardware architectures. As a case study of this framework, we demonstrate the importance of adopting different buffering strategies for activation and weight tensors in AI accelerators to acquire remarkable speedup.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"31 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140810870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhanced Watermarking for Paper-Based Digital Microfluidic Biochips
IF 1.4 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-04-29 | DOI: 10.1145/3661309
Jian-De Li, Sying-Jyan Wang, Katherine Shu-Min Li, Tsung-Yi Ho

Paper-based digital microfluidic biochip (PB-DMFB) technology provides a promising solution for many biochemical applications. However, the PB-DMFB manufacturing process may suffer from potential security threats; for example, a Trojan insertion attack may affect the functionality of PB-DMFBs. To ensure the correct functionality of PB-DMFBs, we propose a watermarking scheme that hides information in the PB-DMFB layout, allowing users to check design integrity and authenticate the source of the PB-DMFB design. As a result, the proposed method serves as a countermeasure against Trojan insertion attacks in addition to providing proof of authorship.
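
A purely hypothetical sketch of the watermarking idea: hide one watermark bit in each of a set of binary, functionally equivalent layout or routing choices. The choice encoding below is invented for illustration and does not reproduce the paper's actual embedding or authentication scheme.

```python
def embed(bits, choices):
    """choices[i] is a pair of functionally equivalent options; the bit picks one."""
    return [pair[bit] for bit, pair in zip(bits, choices)]

def extract(layout, choices):
    """Recover the watermark by seeing which option each decision used."""
    return [pair.index(decision) for decision, pair in zip(layout, choices)]

# Hypothetical equivalent routing decisions for two droplet operations.
choices = [("route-A", "route-B"), ("reservoir-left", "reservoir-right")]
watermark = [1, 0]
layout = embed(watermark, choices)
assert extract(layout, choices) == watermark
```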

{"title":"Enhanced Watermarking for Paper-Based Digital Microfluidic Biochips","authors":"Jian-De Li, Sying-Jyan Wang, Katherine Shu-Min Li, Tsung-Yi Ho","doi":"10.1145/3661309","DOIUrl":"https://doi.org/10.1145/3661309","url":null,"abstract":"<p>Paper-based digital microfluidic biochip (PB-DMFB) technology provides a promising solution to many biochemical applications. However, the PB-DMFB manufacturing process may suffer from potential security threats. For example, Trojan insertion attack may affect the functionality of PB-DMFBs. To ensure the correct functionality of PB-DMFBs, we propose a watermarking scheme to hide information in the PB-DMFB layout, which allows users to check design integrity and authenticate the source of the PB-DMFB design. As a result, the proposed method serves as a countermeasure against Trojan insertion attacks in addition to proof of authorship.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140811286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhanced Compiler Technology for Software-based Hardware Fault Detection
IF 1.4 | CAS Tier 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-04-22 | DOI: 10.1145/3660524
Davide Baroffio, Federico Reghenzani, William Fornaciari

Software-Implemented Hardware Fault Tolerance (SIHFT) is a modern approach for tackling random hardware faults in dependable systems using solely software solutions. This work extends an automatic compiler-based SIHFT hardening tool called ASPIS, enhancing it with novel protection mechanisms and overhead-reduction techniques, and provides an extensive analysis of its compliance with the non-trivial workload of the open-source real-time operating system FreeRTOS. A thorough experimental fault-injection campaign on an STM32 board shows that the system achieves remarkably high tolerance to single-event upsets, and a comparison of the implemented SIHFT mechanisms summarises the trade-off between the overhead introduced and the detection capabilities of the various solutions.
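
The core idea that tools like ASPIS automate, duplicating computations and checking them for consistency before results are consumed, can be sketched by hand as follows. This is a generic EDDI-style simplification written at the source level; ASPIS applies such transformations automatically at the compiler level rather than by hand.

```python
class FaultDetected(Exception):
    """Raised when redundant computations disagree."""

def hardened_add(a, b):
    r1 = a + b        # primary computation
    r2 = a + b        # duplicated computation (ideally on shadow copies of a, b)
    if r1 != r2:      # consistency check before the result is consumed
        raise FaultDetected("mismatch between redundant results")
    return r1

print(hardened_add(2, 3))   # 5; a bit-flip corrupting r1 or r2 would be detected
```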

{"title":"Enhanced Compiler Technology for Software-based Hardware Fault Detection","authors":"Davide Baroffio, Federico Reghenzani, William Fornaciari","doi":"10.1145/3660524","DOIUrl":"https://doi.org/10.1145/3660524","url":null,"abstract":"<p>Software-Implemented Hardware Fault Tolerance (SIHFT) is a modern approach for tackling random hardware faults of dependable systems employing solely software solutions. This work extends an automatic compiler-based SIHFT hardening tool called ASPIS, enhancing it with novel protection mechanisms and overhead-reduction techniques, also providing an extensive analysis of its compliance with the non-trivial workload of the open-source Real-Time Operating System FreeRTOS. A thorough experimental fault-injection campaign on an STM32 board shows how the system achieves remarkably high tolerance to single-event upsets and a comparison between the SIHFT mechanisms implemented summarises the trade-off between the overhead introduced and the detection capabilities of the various solutions.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"211 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140634581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0