2023 IEEE European Test Symposium (ETS): Latest Publications

Online Reliability Evaluation Design: Select Reliable CRPs for Arbiter PUF and Its Variants
Pub Date : 2023-05-22 DOI: 10.1109/ETS56758.2023.10174198
Chaofang Ma, Jianan Mu, Jing Ye, Shuai Chen, Yuan Cao, Huawei Li, Xiaowei Li
A Physical Unclonable Function (PUF) is a hardware security primitive with broad application prospects. Variants of the arbiter PUF have been proposed to resist modeling attacks, but their low reliability limits their applications. To address this, this paper proposes an Online Reliability Evaluation (ORE) design for the arbiter PUF and its variants, together with a corresponding machine learning method for selecting reliable Challenge Response Pairs (CRPs) for applications. Based on the ORE design, a small number of CRPs and their reliability levels are collected during the enrollment phase. These are then used to train reliability models that predict the responses and reliability levels of other challenges. Since the ORE design does not change the security structures of the arbiter PUF and its variants, PUF designs equipped with it retain their resistance to modeling attacks. Compared to previous work that tests each CRP 100,000 times, our design saves time in the enrollment phase, since each CRP is tested only three times to train the reliability models. The proposed design is implemented in a 40nm process. Experimental results on real chips show that all the CRPs selected by our reliability models are indeed reliable for applications, verifying the effectiveness of our method.
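The enrollment idea in this abstract, reading each CRP only three times, can be illustrated with a minimal Python sketch. It assumes the standard additive linear delay model of an arbiter PUF with Gaussian measurement noise, and it uses agreement across three reads as a stand-in for a reliability level. The paper's ORE circuit and its machine learning models are not reproduced here, and every name and parameter (`ArbiterPUF`, `enroll`, `noise_sigma`) is hypothetical.

```python
import random


class ArbiterPUF:
    """Additive linear delay model of an n-stage arbiter PUF (simulation only)."""

    def __init__(self, n_stages, noise_sigma, rng):
        self.rng = rng
        self.noise_sigma = noise_sigma  # assumed measurement noise, not from the paper
        # One weight per parity feature, plus a bias term (n_stages + 1 in total).
        self.weights = [rng.gauss(0.0, 1.0) for _ in range(n_stages + 1)]

    @staticmethod
    def features(challenge):
        # Parity transform: phi_i = product over j >= i of (1 - 2*c_j), phi_n = 1.
        phi = [1] * (len(challenge) + 1)
        for i in range(len(challenge) - 1, -1, -1):
            phi[i] = phi[i + 1] * (1 - 2 * challenge[i])
        return phi

    def delay_difference(self, challenge):
        phi = self.features(challenge)
        return sum(w * p for w, p in zip(self.weights, phi))

    def respond(self, challenge):
        # Noise can flip the response when the delay difference is small.
        noisy = self.delay_difference(challenge) + self.rng.gauss(0.0, self.noise_sigma)
        return 1 if noisy > 0 else 0


def enroll(puf, challenges, reads=3):
    """Read each challenge `reads` times; keep the CRPs whose reads all agree."""
    stable = []
    for c in challenges:
        responses = [puf.respond(c) for _ in range(reads)]
        if len(set(responses)) == 1:
            stable.append((c, responses[0]))
    return stable


rng = random.Random(0)
puf = ArbiterPUF(n_stages=32, noise_sigma=0.3, rng=rng)
challenges = [[rng.randint(0, 1) for _ in range(32)] for _ in range(200)]
stable_crps = enroll(puf, challenges)
print(f"{len(stable_crps)} of {len(challenges)} CRPs agreed on all three reads")
```

Agreement over three reads is only a coarse filter, because a CRP with a small delay difference can still agree three times by chance. The approach in the abstract additionally collects reliability levels through the ORE design and trains models on them, which this sketch does not attempt.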
Citations: 0
Harvesting Wasted Clock Cycles for Efficient Online Testing
Pub Date : 2023-05-22 DOI: 10.1109/ETS56758.2023.10173955
Eslam Yassien, Yongjia Xu, Hui Jiang, Thach Nguyen, Jennifer Dworak, T. Manikas, Kundan Nepal
Mission-critical systems often require some testing to occur while the system is running. In many cases, this involves taking parts of the system off-line temporarily to apply the tests. However, hazards that occur during regular processor execution require the addition of stall cycles to maintain program correctness. These stall cycles generally perform no other function. In this paper, we focus on testing the ALU during those stall cycles to identify new errors or defects that arise during program execution due to aging and increased temperature that may slow down the circuitry or cause permanent defects. We investigate the time to detection of a fault (both stuck-at and transition) that may have caused silent data corruption. In addition, we identify the relationship between the programs running and the list of functional faults and how this impacts the test set length. Finally, we discuss area and performance impacts for the physical implementation of the approach.
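The idea of applying ALU tests during stall cycles and measuring time to detection can be sketched with a toy cycle-level model in Python. Everything here is an assumption made for illustration: the pipeline trace, the 8-bit adder standing in for the ALU, the stuck-at-0 output bit, and the test patterns. It does not reflect the paper's processor, fault list, or test set.

```python
def golden_add(a, b):
    """Fault-free 8-bit adder used as the reference ALU operation."""
    return (a + b) & 0xFF


def faulty_add(a, b, stuck_bit=3):
    """Same adder with output bit `stuck_bit` stuck at 0 (hypothetical defect)."""
    return golden_add(a, b) & ~(1 << stuck_bit) & 0xFF


def cycles_to_detection(trace, patterns, alu):
    """Apply one test pattern per stall cycle; return the cycle index of the
    first mismatch against the golden model, or None if nothing is detected."""
    next_pattern = 0
    for cycle, kind in enumerate(trace):
        if kind != "stall" or next_pattern >= len(patterns):
            continue  # normal execution, or the test set is exhausted
        a, b = patterns[next_pattern]
        next_pattern += 1
        if alu(a, b) != golden_add(a, b):
            return cycle
    return None


# Hypothetical pipeline trace: a stall on every third cycle.
trace = ["exec", "exec", "stall"] * 4
patterns = [(0x00, 0x00), (0x01, 0x02), (0x04, 0x04), (0xFF, 0x01)]

print(cycles_to_detection(trace, patterns, faulty_add))
print(cycles_to_detection(trace, patterns, golden_add))
```

In this toy setting the time to detection depends on both how often the program stalls and where the detecting pattern sits in the test set, which is the kind of interaction between running programs and test length that the abstract refers to.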
Citations: 0
Global Tuning for System Performance Optimization of RF MIMO Radars
Pub Date : 2023-05-22 DOI: 10.1109/ETS56758.2023.10174159
Ferhat Can Ataman, Muslum Emir Avci, Chethan Kumar Y. B., S. Ozev
RF systems, including RF MIMO RADARs, are increasingly integrated with digital systems in fine-geometry processes. Due to the prevalent use of RF MIMO RADARs in automotive and other safety-critical applications, in-field testing and tuning of these systems are needed to meet performance and safety targets. The fundamental performance targets of an RF MIMO system include the signal-to-noise ratio at the end of the receiver chain, matching characteristics between different signal paths, gain, noise figure, and linearity of the RF front end. In a RADAR device, matching between signal paths affects the angular resolution of the system. The gain and noise figure of the receiver control the maximum distance and the smallest object that the system can detect. In this work, we present a global tuning algorithm for RF MIMO RADARs to meet critical system performance targets while minimizing power consumption. The efficacy of the method is demonstrated with extensive simulations and hardware experiments.
Citations: 0
Automating the Generation of Functional Stress Inducing Stimuli for Burn-In Testing
Pub Date : 2023-05-22 DOI: 10.1109/ETS56758.2023.10174232
N. I. Deligiannis, Tobias Faller, Chenghan Zhou, R. Cantoro, B. Becker, M. Reorda
In the domain of high reliability applications, Burn-In testing (BI) is always present since it is one of the prime countermeasures against the infant mortality phenomenon. Traditional static BI testing proves to be inefficient for modern circuit designs. As the devices’ feature size scales down and their structural and architectural complexity increases, so does the complexity and cost of the BI test. Different BI methods are employed by the industry where stimuli are also applied to the devices under test (DUTs) in order to effectively stress and stimulate all nets of the design. One known industry practice resorts to Design for Testability (DfT) infrastructures (e.g., scan) and is based on the application of test vectors at low frequency to excite the DUT as much as possible with the goal of switching each net of the design at least once. In this paper we consider the case where the layout of the circuit is known and propose two novel methods able to automatically produce functional stimuli to switch pairs of neighboring nodes (i.e., nodes that are placed within a specified distance in the DUT) in short periods of time. This solution has been shown to be able to trigger some latent defects in a circuit better than other methods. As a case study, we target functional units within a RISC-V processor (RI5CY). We show that the functional stimuli generated by the exact method described in the paper are able to achieve optimal results (i.e., the maximum functional switching of neighboring pairs), thus maximizing the chance that their at-speed application can activate weak points in the circuit.
Citations: 1
Spotlight: An Impairing Packet Transmission Attack Targeting Specific Node in NoC-based TCMP
Pub Date : 2023-05-22 DOI: 10.1109/ETS56758.2023.10174197
Jiaoyan Yao, Ying Zhang, Yifeng Hua, Yuanxiang Li, Jizhong Yang, Xin Chen
As the communication infrastructure utilized by Tiled Chip Multicore Processors (TCMP), Network-on-Chip (NoC) has been subject to serious security vulnerabilities due to hardware Trojans (HTs) concealed in potentially insecure 3PIPs. To satisfy the need for secure NoCs, it is vital to model potential attacks and analyze their impacts on NoC performance. This paper proposes a novel and covert HT model called Spotlight, which targets a specific victim node in an XY-routing NoC to optimize the attacking effect. By inserting Trojans into special nodes and modifying the arbiters of the input ports within the switch allocator of the router, packets flowing to the victim node are treated unfairly, causing considerable latency. As a result, the HT effectively degrades the transmission of packets while having only a subtle impact on other NoC performance metrics. The proposed HT is inserted into Garnet 2.0 of the Gem5 simulator for performance evaluation. Experimental results indicate that the Spotlight attack increased the average delay of target packets by 12.16 cycles. Compared to some DoS attacks, the proposed Trojan affected packet transmission with fewer packets, causing minimal fluctuations in NoC metrics such as average latency. The area and power overheads are only 0.94% and 0.11%, respectively.
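Since the attack relies on deterministic XY routing, a short Python sketch of that routing rule may help: a packet first travels along the X dimension, then along Y. The sketch also counts how many sources route through each router on the way to a chosen victim. The mesh size, coordinates, and function names are hypothetical, and the Trojan itself and the paper's node selection are not modeled.

```python
def xy_route(src, dst):
    """Return the routers visited from src to dst under XY routing:
    move along the X dimension first, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def routers_seeing_victim_traffic(mesh_size, victim):
    """For every router, count how many sources route through it to reach the victim."""
    counts = {}
    for sx in range(mesh_size):
        for sy in range(mesh_size):
            if (sx, sy) == victim:
                continue
            # Exclude the last hop, which is the victim router itself.
            for hop in xy_route((sx, sy), victim)[:-1]:
                counts[hop] = counts.get(hop, 0) + 1
    return counts


print(xy_route((0, 0), (2, 1)))
counts = routers_seeing_victim_traffic(4, victim=(2, 1))
print(sorted(counts.items(), key=lambda kv: -kv[1])[:3])
```

In this toy 4x4 mesh, routers in the victim's column aggregate most of the victim-bound traffic, which illustrates why the choice of infected nodes matters when routing is deterministic.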
Citations: 0
Automating Greybox System-Level Test Generation
Pub Date : 2023-05-22 DOI: 10.1109/ETS56758.2023.10173985
Denis Schwachhofer, M. Betka, Steffen Becker, S. Wagner, M. Sauer, I. Polian
System-Level Test (SLT) emerged as an additional test step to detect manufacturing defects not caught by traditional testing. For SLT, the Device Under Test (DUT) is embedded into an environment that emulates the end-user application as closely as possible and runs workloads composed of existing off-the-shelf software. We present an automatic greybox SLT program generation method to find code snippets that control the DUT’s extra-functional properties, to achieve better characterization, or to improve the coverage of emerging defect types. In contrast to ATPG or formal methods, our method does not require structural information and relies solely on simulation results or hardware measurements to guide the generation. We show that our method outperforms hand-crafted snippets on a RISC-V super-scalar processor and look into possible reasons why the snippets perform the way they do.
Citations: 1
Understanding and Improving GPUs' Reliability Combining Beam Experiments with Fault Simulation
Pub Date : 2023-05-22 DOI: 10.1109/ETS56758.2023.10174206
F. Santos, L. Carro, P. Rech
Graphics Processing Units (GPUs) are being employed in High Performance Computing (HPC) and safety-critical applications, such as autonomous vehicles. This market shift led to significant improvements in the programming frameworks and performance evaluation tools, and to concerns about their reliability. GPU reliability evaluation is extremely challenging due to the parallel nature and high complexity of GPU architectures. We conducted the first cross-layer GPU reliability evaluation to unveil (and mitigate) GPU vulnerabilities. The proposed evaluation is achieved by comparing and combining extensive high-energy neutron beam experiments, massive fault simulation campaigns at both Register-Transfer Level (RTL) and software levels, and application profiling. Based on this extensive and detailed analysis, a novel methodology to accurately estimate the FIT rate of GPU applications is proposed. Moreover, by employing the knowledge obtained from the cross-layer reliability evaluation, two novel hardening solutions for HPC and safety-critical applications are proposed: (1) Reduced Precision Duplication With Comparison (RP-DWC), which executes a redundant copy in reduced precision. RP-DWC delivers excellent fault coverage, up to 86%, with minimal execution time and energy consumption overheads (13% and 24%, respectively). (2) Dedicated software solutions for hardening Convolutional Neural Networks (CNNs) that can correct up to 98% of the CNN errors.
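The principle behind duplication with comparison at reduced precision can be sketched in a few lines of Python: run the computation once in full precision and once in lower precision, then flag a fault when the two results differ by more than the expected precision gap. This is only an illustration under stated assumptions. Python floats (binary64) play the full-precision copy, binary32 is emulated by rounding through `struct`, and the tolerance and the injected bit flip are hypothetical. The GPU implementation evaluated in the paper is not reproduced.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_f64(a, b):
    """Full-precision copy of the computation."""
    return sum(x * y for x, y in zip(a, b))


def dot_f32(a, b):
    """Redundant copy: the same dot product with every step rounded to binary32."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f32(acc + to_f32(to_f32(x) * to_f32(y)))
    return acc


def rp_dwc(a, b, inject_fault=False, rel_tol=1e-4):
    """Return (result, fault_detected) by comparing the two copies."""
    full = dot_f64(a, b)
    if inject_fault:
        # Hypothetical fault: flip one exponent bit of the full-precision result.
        bits = struct.unpack("<Q", struct.pack("<d", full))[0]
        full = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << 58)))[0]
    reduced = dot_f32(a, b)
    scale = max(abs(full), abs(reduced), 1e-30)
    return full, abs(full - reduced) > rel_tol * scale


a = [1.5, -2.25, 3.0, 0.1]
b = [4.0, 0.5, -1.0, 10.0]
print(rp_dwc(a, b))
print(rp_dwc(a, b, inject_fault=True))
```

The tolerance has to absorb the legitimate difference between the two precisions, so a comparison of this kind cannot flag corruptions smaller than that gap.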
Citations: 0
Beyond Neural Networks, Exploring the Future of Computing Hardware
Pub Date : 2023-05-22 DOI: 10.1109/ets56758.2023.10174029
Citations: 0
Learning-Based Characterization Models for Quality Assurance of Emerging Memory Technologies
Pub Date : 2023-05-22 DOI: 10.1109/ETS56758.2023.10174202
Xhesila Xhafa, P. Girard, A. Virazel
The shrinking of technology nodes has led to high-density memories containing large numbers of transistors, which are prone to defects and reliability issues. Their test is generally based on well-known March algorithms targeting Functional Fault Models (FFMs). This Ph.D. thesis aims to introduce a novel approach for advanced and emerging memory testing that relies on the Cell-Aware (CA) methodology to further improve the yield of Systems on Chip (SoCs).
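For readers unfamiliar with March algorithms, the following Python sketch runs the classic March C- sequence on a simulated bit-oriented memory with one injected stuck-at-0 cell. The memory model, its size, and the fault injection are hypothetical and chosen for illustration. The cell-aware approach proposed in the thesis is not shown.

```python
class Memory:
    """Bit-oriented memory model with an optional stuck-at fault on one cell."""

    def __init__(self, size, stuck_addr=None, stuck_value=0):
        self.cells = [0] * size
        self.stuck_addr = stuck_addr
        self.stuck_value = stuck_value

    def write(self, addr, value):
        self.cells[addr] = self.stuck_value if addr == self.stuck_addr else value

    def read(self, addr):
        return self.stuck_value if addr == self.stuck_addr else self.cells[addr]


# March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
# Elements with "any" address order are run in ascending order here.
MARCH_C_MINUS = [
    ("up", [("w", 0)]),
    ("up", [("r", 0), ("w", 1)]),
    ("up", [("r", 1), ("w", 0)]),
    ("down", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
    ("up", [("r", 0)]),
]


def run_march(memory, size, elements=MARCH_C_MINUS):
    """Return a list of (element_index, address) where a read mismatched."""
    failures = []
    for index, (order, ops) in enumerate(elements):
        addresses = range(size) if order == "up" else range(size - 1, -1, -1)
        for addr in addresses:
            for op, value in ops:
                if op == "w":
                    memory.write(addr, value)
                elif memory.read(addr) != value:
                    failures.append((index, addr))
    return failures


print(run_march(Memory(8), 8))
print(run_march(Memory(8, stuck_addr=5, stuck_value=0), 8))
```

The stuck-at-0 cell is caught by the march elements that expect to read back a 1, and the failing address points directly at the faulty cell.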
Citations: 0
High-coverage analog IP block test generation methodology using low-cost signal generation and output response analysis
Pub Date : 2023-05-22 DOI: 10.1109/ETS56758.2023.10173963
Jhon Gomez, Nektar Xama, Anthony Coyette, Ronny Vanhooren, Wim Dobbelaere, G. Gielen
Today, testing of AMS circuits needs to improve quality towards ppb test escape levels as well as decrease the test development time to reduce the IC lead time. A defect-oriented solution can improve quality by focusing on structural tests that can detect defects more efficiently than traditional functional tests, while test reuse can decrease test development time on ICs built with reusable IP blocks. A defect-oriented built-in self-test (BIST) approach integrates both solutions. This paper proposes a test development methodology for analog IP blocks based on such a defect-oriented BIST framework. The methodology allows for achieving the target defect coverage at the lowest possible cost. Co-designing the IP with the DfT structures allows accounting for any non-idealities that the DfT may add to the IP. The cost of the test structures is limited by using low-cost signal generation and a new output response analyzer (ORA). The proposed methodology is demonstrated on two case studies. The results show that coverages higher than 90% are possible using a simple digital pulse signal and an ORA with only 4 bits of accuracy, while coverages higher than 95% are possible with 6 bits, offering a good trade-off between coverage and cost.
Citations: 0