
ACM Transactions on Design Automation of Electronic Systems: Latest Publications

Systemization of Knowledge: Robust Deep Learning using Hardware-software co-design in Centralized and Federated Settings
IF 1.4 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-08-23 · DOI: 10.1145/3616868
Ruisi Zhang, Shehzeen Samarah Hussain, Huili Chen, Mojan Javaheripi, F. Koushanfar
Deep learning (DL) models are enabling a significant paradigm shift in a diverse range of fields, including natural language processing, computer vision, and the design and automation of complex integrated circuits. While deep models, and optimizations based on them such as Deep Reinforcement Learning (RL), demonstrate superior performance and a great capability for automated representation learning, earlier works have revealed the vulnerability of DL models to various attacks, including adversarial samples, model poisoning, and fault injection. On the one hand, these security threats can divert the behavior of a DL model and lead to incorrect decisions in critical tasks. On the other hand, the susceptibility of DL models to potential attacks might thwart trustworthy technology transfer as well as reliable DL deployment. In this work, we investigate the existing defense techniques that protect DL models against the above-mentioned security threats. In particular, we review end-to-end defense schemes for robust deep learning in both centralized and federated learning settings. Our comprehensive taxonomy and horizontal comparisons reveal that defense strategies developed using DL/software/hardware co-design outperform their DL/software-only counterparts, and show how they can achieve efficient, latency-optimized defenses for real-world applications. We believe our systemization of knowledge sheds light on the promising performance of hardware-software co-design for DL security and can guide the development of future defenses.
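The adversarial-sample threat surveyed above can be illustrated with a minimal sketch (not from the paper; model, weights, and the fast-gradient-sign perturbation budget are all hypothetical): on a toy linear classifier, a small signed perturbation of the input suffices to flip the decision.

```python
import numpy as np

# Toy linear classifier: predict class 1 when w . x > 0.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.2, 0.4])   # clean input, correctly classified as 1

def classify(w, x):
    return int(w @ x > 0)

# FGSM-style perturbation: for a linear model the gradient of the score
# w.r.t. x is simply w, so we step against sign(w) with budget eps.
eps = 0.5
x_adv = x - eps * np.sign(w)

clean = classify(w, x)       # clean decision
attacked = classify(w, x_adv)  # decision on the perturbed input
```

Defenses against such samples (adversarial training, input sanitization, hardware-accelerated detection) are exactly what the survey's taxonomy organizes.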
Citations: 0
TMDS: A Temperature-aware Makespan Minimizing DAG Scheduler for Heterogeneous Distributed Systems
IF 1.4 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-08-19 · DOI: 10.1145/3616869
Debabrata Senapati, Kousik Rajesh, C. Karfa, A. Sarkar
To meet application-specific performance demands, recent embedded platforms often employ intricate micro-architectural designs and very small feature sizes, leading to complex chips with multi-million gates. Such ultra-high gate densities often make these chips susceptible to inappropriate surges in core temperature. Temperature surges above a specific threshold may throttle processor performance, increase cooling costs, and reduce processor life expectancy. This work proposes a generic temperature management strategy that can easily be employed to adapt existing state-of-the-art task graph schedulers so that the schedules they generate never violate stipulated thermal bounds. The overall temperature-aware task graph scheduling problem is first formally modeled as a constraint optimization formulation, whose solution is shown to be prohibitively expensive in terms of computational overhead. Based on insights obtained from the formal model, a new fast and efficient heuristic algorithm called TMDS has been designed. Experimental evaluation over diverse test-case scenarios shows that TMDS delivers lower schedule lengths than the temperature-aware versions of four prominent makespan-minimizing algorithms, namely HEFT, PEFT, PPTS, and PSLS. Additionally, a case study of an adaptive cruise controller in automotive systems is included to exhibit the applicability of TMDS in real-world settings.
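The core idea of a thermally bounded list scheduler can be sketched as follows. This is a deliberately simplified toy (not TMDS itself): the linear heat model, the threshold, and the cool-down rule are all hypothetical, and tasks are assumed to arrive already in precedence order.

```python
# Greedy earliest-finish-time scheduler that rejects any assignment whose
# projected core temperature would exceed a stipulated thermal bound.
THRESHOLD = 80.0       # deg C, stipulated thermal bound (hypothetical)
AMBIENT = 40.0         # deg C
HEAT_PER_UNIT = 2.0    # temperature rise per unit of execution time

def schedule(tasks, n_procs):
    """tasks: list of execution times, already in precedence order."""
    finish = [0.0] * n_procs
    temp = [AMBIENT] * n_procs
    placement = []
    for wcet in tasks:
        # Only processors that stay within the thermal bound are candidates.
        feasible = [p for p in range(n_procs)
                    if temp[p] + HEAT_PER_UNIT * wcet <= THRESHOLD]
        if not feasible:          # insert a cooling (idle) slot everywhere
            temp = [AMBIENT] * n_procs
            feasible = list(range(n_procs))
        p = min(feasible, key=lambda q: finish[q])  # earliest-finish choice
        finish[p] += wcet
        temp[p] += HEAT_PER_UNIT * wcet
        placement.append(p)
    return placement, max(finish)

placement, makespan = schedule([4, 6, 2, 8, 3], n_procs=2)
```

TMDS makes the analogous feasibility check inside a far more refined makespan-minimizing heuristic; the sketch only shows where the thermal bound enters the scheduling loop.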
Citations: 0
A High-Performance Masking Design Approach for Saber against High-order Side-channel Attack
IF 1.4 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-08-03 · DOI: 10.1145/3611670
Yajing Chang, Yingjian Yan, Chunsheng Zhu, Yanjiang Liu
Post-quantum cryptography (PQC) has become the most promising class of cryptographic schemes against the threat that quantum computing poses to conventional public-key cryptography. Saber, a finalist in the third round of the PQC standardization procedure, presents an appealing option for embedded systems due to its high encryption efficiency and accessibility. However, side-channel attacks (SCAs) can easily reveal confidential information by analyzing a device's physical manifestations, and several works demonstrate that Saber is vulnerable to SCAs. In this work, a ciphertext comparison method for masking designs, based on the bitslicing technique and a zero-test, is proposed; it balances the trade-off between the performance and the security of comparing two arrays. A mathematical description of the proposed ciphertext comparison method is provided, and its correctness and security metrics are analyzed under the PINI notion. Moreover, a high-order masking approach based on the state of the art, including the hash functions, centered binomial sampling, masking conversions, and the proposed ciphertext comparison, is presented, using the bitslicing technique to improve throughput. As a proof of concept, the proposed implementation of Saber targets the ARM Cortex-M4. The performance results show that the run-time overhead factors of 1st-, 2nd-, and 3rd-order masking are 3.01x, 5.58x, and 8.68x, and the dynamic memory used is 17.4 kB, 24.0 kB, and 30.2 kB, respectively. The SCA-resilience evaluation shows that the first-order Test Vector Leakage Assessment (TVLA) fails to reveal the secret key with 100,000 traces.
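The zero-test idea behind the ciphertext comparison can be sketched in a few lines. This is a generic, unmasked constant-time comparison, not the paper's masked, bitsliced Saber implementation: differences across all words are folded into one accumulator, and only that single accumulator is tested, so control flow never depends on where two arrays differ.

```python
def ct_equal(a, b):
    """Constant-time array equality via a single final zero-test."""
    assert len(a) == len(b)
    acc = 0
    for x, y in zip(a, b):   # fixed trip count, no data-dependent early exit
        acc |= x ^ y         # any differing bit sticks in the accumulator
    return acc == 0          # the one zero-test

same = ct_equal([1, 2, 3], [1, 2, 3])
diff = ct_equal([1, 2, 3], [1, 9, 3])
```

A naive `a == b` that exits at the first mismatch leaks the mismatch position through timing; the masked scheme in the paper additionally keeps `acc` in shared form so intermediate values never appear in the clear.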
Citations: 0
Enhanced PATRON: Fault Injection and Power-aware FSM Encoding Through Linear Programming
IF 1.4 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-08-03 · DOI: 10.1145/3611669
Muhtadi Choudhury, Minyan Gao, Avinash L. Varna, Elad Peer, Domenic Forte
Since finite state machines (FSMs) regulate the control flow in circuits, a computing system's security might be breached by attacking the FSM. Physical attacks are especially worrisome because they can bypass software countermeasures. For example, an attacker can gain illegal access to the sensitive states of an FSM through fault injection, leading to privilege escalation and/or information leakage. Laser fault injection (LFI) provides one of the most effective attack vectors by enabling adversaries to precisely overturn single flip-flop states. Although conventional error correction/detection methodologies have been employed to improve FSM resiliency, their substantial overhead makes them unattractive to circuit designers. In our prior work, a novel decision-diagram-based FSM encoding scheme called PATRON was proposed to resist LFI according to attack parameters, e.g., the number of simultaneous faults. Although PATRON bested traditional encodings while keeping overhead to a minimum, it produced numerous candidate FSM designs, requiring exhaustive manual effort to select the optimum one. In this article, we select the optimum candidate automatically by enhancing PATRON with linear programming (LP). First, we exploit the proportionality between dynamic power dissipation and switching activity in digital CMOS circuits: our LP objective minimizes the number of FSM bit switches per transition, yielding comparatively lower switching activity and hence lower total power consumption. Second, additional LP constraints, incorporating the original PATRON rules, systematically enforce bidirectionality on at least two state elements per FSM transition. This bestows protection against different types of fault injection, which we capture with a new unidirectional metric. Enhanced PATRON (EP) achieves superior security at lower average power consumption compared to PATRON, error coding, and traditional FSM encoding on five popular benchmarks.
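The two ingredients of the LP, the switching-activity objective and a bidirectionality-style constraint, can be checked for any candidate encoding with a small scoring routine. This sketch is hypothetical (the state names, code words, and the loose reading of bidirectionality as "each transition toggles bits in both directions" are ours, not the paper's exact formulation):

```python
def popcount(x):
    return bin(x).count("1")

def score(encoding, transitions, width):
    """Total bit switches (power proxy) plus a bidirectionality check."""
    mask = (1 << width) - 1
    total_switches = 0
    bidirectional = True
    for src, dst in transitions:
        a, b = encoding[src], encoding[dst]
        total_switches += popcount(a ^ b)       # bits flipped this transition
        rises = (~a) & b & mask                 # bits going 0 -> 1
        falls = a & (~b) & mask                 # bits going 1 -> 0
        bidirectional &= rises != 0 and falls != 0
    return total_switches, bidirectional

# Hypothetical 4-bit encoding of a 3-state FSM.
enc = {"IDLE": 0b0011, "RUN": 0b0101, "DONE": 0b1010}
trans = [("IDLE", "RUN"), ("RUN", "DONE"), ("DONE", "IDLE")]
switches, ok = score(enc, trans, width=4)
```

EP folds both quantities into one LP, minimizing the switch count subject to the PATRON rules and the bidirectionality constraints, instead of scoring candidates one by one.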
Citations: 0
Modified Decoupled Sense Amplifier with Improved Sensing Speed for Low-Voltage Differential SRAM
IF 1.4 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-08-02 · DOI: 10.1145/3611672
Ayush, P. Mittal, Rajesh Rohilla
A modified decoupled sense amplifier (MDSA) and a modified decoupled sense amplifier with an NMOS foot-switch (MDSANF) are proposed for improved sensing in differential SRAM under low-voltage operation at the 22 nm technology node. Both the MDSA and the MDSANF offer notable read-delay improvements over conventional voltage and current sense amplifiers. At an operating voltage of 0.8 V, the MDSA exhibited delay reductions of 28.6%, 41.79%, 37.74%, and 30.94% compared to the modified clamped sense amplifier (MCSA), double-tail sense amplifier (DTSA), modified hybrid sense amplifier (MHSA), and conventional latch-type sense amplifier (LSA), respectively. Similarly, the MDSANF demonstrated delay reductions of 26.13%, 39.78%, 35.58%, and 28.55% over the MCSA, DTSA, MHSA, and LSA, respectively. To validate the performance, the MDSA and MDSANF are evaluated using the variation in delay and power consumption across supply voltages, process corners, input differential bit-line voltage (ΔVBL), bit-line capacitance (CBL), and the sizing of the decoupling transistors. Monte Carlo simulations were conducted to analyse the impact of threshold-voltage variations on transistor mismatch, which leads to an increased occurrence of read failures and a decline in SRAM yield. A performance analysis of various voltage and current sense amplifiers is presented along with the MDSA and MDSANF. Area is an important consideration when selecting a sensing scheme; accordingly, layouts of the MDSA and MDSANF were drawn conforming to the design rules, with an estimated area of 0.297 μm2 for the MDSA, whereas the MDSANF occupies 0.5192 μm2.
Citations: 0
QuanDA: GPU Accelerated Quantitative Deep Neural Network Analysis
IF 1.4 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-08-01 · DOI: 10.1145/3611671
Mahum Naseer, Osman Hasan, Muhammad Shafique
Over the past years, numerous studies have demonstrated the vulnerability of deep neural networks (DNNs) to small input noise, which can prevent correct classification. This motivated the formal analysis of DNNs to ensure they delineate acceptable behavior. However, when a DNN's behavior is unacceptable for the desired application, these qualitative approaches are ill-equipped to determine the precise degree to which the DNN behaves unacceptably. Towards this, we propose a novel quantitative DNN analysis framework, QuanDA, which not only checks whether the DNN delineates certain behavior, but also provides an estimate of the probability that the DNN delineates this particular behavior. Unlike the few available quantitative DNN analysis frameworks, QuanDA makes no implicit assumptions about the probability distribution of the hidden nodes, which enables the framework to propagate close-to-real probability distributions of the hidden-node values to each succeeding DNN layer. Furthermore, our framework leverages CUDA to parallelize the analysis, enabling a high-speed GPU implementation for fast analysis. The applicability of the framework is demonstrated on the ACAS Xu benchmark, providing reachability probability estimates for all network nodes. Moreover, this paper also discusses potential applications of QuanDA for the analysis of DNN safety properties.
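What "quantitative" means here can be illustrated with a crude sampling baseline (explicitly not QuanDA's method, which propagates distributions analytically; the tiny network and its weights below are hypothetical): instead of a yes/no answer, we estimate the probability that a property of the network's output holds under an input distribution.

```python
import random

random.seed(0)  # reproducible sampling run

def relu(v):
    return [max(0.0, x) for x in v]

def net(x):
    """Hypothetical 2-2-1 ReLU network; computes 2*|x0 - x1| - 1."""
    h = relu([x[0] - x[1], x[1] - x[0]])
    return 2.0 * h[0] + 2.0 * h[1] - 1.0

# Property: net(x) > 0, i.e. the inputs differ by more than 0.5.
# Estimate its probability under x ~ Uniform([-1, 1]^2) by Monte Carlo.
N = 10_000
hits = sum(net([random.uniform(-1, 1), random.uniform(-1, 1)]) > 0
           for _ in range(N))
p_hat = hits / N   # estimated probability the property holds
```

For this toy the exact answer is computable (0.5625), so the sample estimate can be checked; QuanDA's contribution is obtaining such probabilities for real networks without assuming a distribution for the hidden nodes.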
Citations: 0
Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation
IF 1.4 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-07-25 · DOI: 10.1145/3611673
Stylianos I. Venieris, J. Fernández-Marqués, Nicholas D. Lane
The unprecedented accuracy of convolutional neural networks (CNNs) across a broad range of AI tasks has led to their widespread deployment in mobile and embedded settings. In pursuit of high-performance and energy-efficient inference, significant research effort has been invested in the design of FPGA-based CNN accelerators. In this context, single computation engines constitute a popular design approach that enables the deployment of diverse models without the overhead of fabric reconfiguration. Nevertheless, this flexibility often comes with significantly degraded performance on memory-bound layers and resource under-utilisation, due to the suboptimal mapping of certain layers onto the engine's fixed configuration. In this work, we investigate the implications for CNN engine design of a class of models that introduce a pre-convolution stage to decompress the weights at run time; we refer to these approaches as on-the-fly. This paper presents unzipFPGA, a novel CNN inference system that counteracts the limitations of existing CNN engines. The proposed framework comprises a novel CNN hardware architecture with a weights generator module that enables on-chip, on-the-fly generation of weights, alleviating the negative impact of limited bandwidth on memory-bound layers. We further enhance unzipFPGA with an automated hardware-aware methodology that tailors the weights generation mechanism to the target CNN-device pair, leading to an improved accuracy-performance balance. Finally, we introduce an input-selective processing element (PE) design that balances the load between PEs in suboptimally mapped layers. Quantitative evaluation shows that the proposed framework yields hardware designs that achieve an average of 2.57× higher performance efficiency than highly optimised GPU designs under the same power constraints, and up to 3.94× higher performance density than a diverse range of state-of-the-art FPGA-based CNN accelerators.
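The on-the-fly principle, ship a compact representation across the memory interface and regenerate full weights next to the compute, can be sketched with a toy codebook scheme. This 2-bit shared-codebook format is hypothetical and is not unzipFPGA's actual weights generator; it only shows why regenerating weights on-chip relieves the memory-bound layers.

```python
codebook = [-0.5, -0.1, 0.1, 0.5]   # 4 shared weight values -> 2-bit indices

def compress(weights):
    """Quantize each weight to its nearest codebook entry; pack 4 per byte."""
    idx = [min(range(4), key=lambda i: abs(codebook[i] - w)) for w in weights]
    packed = bytearray()
    for j in range(0, len(idx), 4):
        byte = 0
        for k, v in enumerate(idx[j:j + 4]):
            byte |= v << (2 * k)
        packed.append(byte)
    return bytes(packed)

def generate_weights(packed, n):
    """'On-the-fly' stage: expand indices back to full weights at compute time."""
    return [codebook[(packed[j // 4] >> (2 * (j % 4))) & 0b11] for j in range(n)]

w = [0.48, -0.09, 0.12, -0.51, 0.1, 0.5, -0.1, -0.5]
packed = compress(w)              # 2 bytes cross the memory interface
restored = generate_weights(packed, len(w))
# versus 8 float32 weights = 32 bytes: a 16x reduction in off-chip traffic.
```

In hardware, the expansion sits in a weights generator module beside the PEs, so the off-chip interface carries only the packed form while the compute datapath still sees full-precision operands.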
Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation
Stylianos I. Venieris, J. Fernández-Marqués, Nicholas D. Lane
DOI: 10.1145/3611673
Citations: 0
A General Layout Pattern Clustering Using Geometric Matching Based Clip Relocation and Lower-Bound Aided Optimization
IF 1.4 CAS Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2023-07-24 DOI: 10.1145/3610293
Xu He, Yao Wang, Zhiyong Fu, Yipei Wang, Yang Guo
With the continuous shrinking of feature size, detection of lithography hotspots has become one of the major concerns in Design-for-Manufacturability (DFM) for semiconductor processing. Hotspot detection, along with other DFM measures, trades turn-around time for IC manufacturing yield, so a simplified yet wide-coverage pattern definition is key to the problem. Layout pattern clustering methods, which group geometrically similar layout clips into clusters, have been widely proposed to identify layout patterns efficiently. To minimize the number of clusters for subsequent DFM processing, in this paper we propose a geometric-matching-based clip relocation technique that increases the opportunity for pattern clustering. In particular, we formulate the lower bound of the cluster count as a maximum-clique problem, and we prove that the clustering problem can be solved very efficiently from the result of the maximum clique. Compared with state-of-the-art approaches on the ICCAD 2016 Contest benchmarks, the proposed method achieves the optimal solution for all benchmarks with very competitive run-time. To evaluate scalability, the ICCAD 2016 Contest benchmarks are extended and evaluated; experimental results on the extended benchmarks demonstrate that our method reduces the cluster count by 16.59% on average while running 74.11% faster on large-scale benchmarks than previous works.
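The lower-bound idea can be illustrated with a toy conflict graph: if an edge joins two clips that can never share a cluster, any set of mutually conflicting clips must land in distinct clusters, so the maximum clique size lower-bounds the cluster count. The brute-force search below is only a sketch of that reasoning under an assumed pairwise-conflict relation, not the paper's optimized formulation.

```python
from itertools import combinations

def max_clique_size(n, conflict_edges):
    # Brute-force maximum clique on the conflict graph; workable only
    # for toy instances, but enough to show why the clique size is a
    # valid lower bound on the number of clusters.
    adj = {i: set() for i in range(n)}
    for a, b in conflict_edges:
        adj[a].add(b)
        adj[b].add(a)
    for size in range(n, 1, -1):
        for group in combinations(range(n), size):
            if all(b in adj[a] for a, b in combinations(group, 2)):
                return size
    return 1 if n else 0

# Toy instance: clips 0-4; an edge means "geometrically incompatible".
conflicts = [(0, 1), (0, 2), (1, 2), (3, 4)]
lb = max_clique_size(5, conflicts)
print(lb)  # 3 -> at least three clusters are needed
```

Clips 0, 1, and 2 are pairwise incompatible, so no clustering with fewer than three clusters can exist, regardless of how the remaining clips are relocated.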
Citations: 0
SoC Protocol Implementation Verification Using Instruction-Level Abstraction (ILA) Specifications
IF 1.4 CAS Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2023-07-24 DOI: 10.1145/3610292
Huaixi Lu, Yue Xing, Aarti Gupta, S. Malik
In modern systems-on-chip (SoCs), several hardware protocols are used for communication and interaction among different modules. These protocols are complex and must be implemented correctly for the SoC to operate correctly; protocol verification has therefore received significant attention. However, this verification is often limited to checking high-level properties on a protocol specification or an implementation. Verifying these properties directly on an implementation faces scalability challenges due to its size and design complexity. Further, even after some high-level properties are verified, there is no guarantee that an implementation fully complies with a given specification, even if the same properties have also been checked on the specification. We address these challenges and gaps by adding a layer of component specifications, one for each component in the protocol implementation, and by specifying and verifying the interactions at the interfaces between each pair of communicating components. We use the recently proposed formal model termed the Instruction-Level Abstraction (ILA) as a component specification, which includes an interface specification for the interactions in composing different components. The use of ILA models as component specifications allows us to decompose the complete verification task into two sub-tasks: checking that the composition of ILAs is sequentially equivalent to a verified formal protocol specification, and checking that the protocol implementation is a refinement of the ILA composition. The latter check requires that each component implementation is a refinement of its ILA specification, and it includes interface checks guaranteeing that components interact with each other as specified. We have applied the proposed ILA-based methodology for protocol verification to several third-party design case studies, including an AXI on-chip communication protocol, an off-chip communication protocol, and a cache coherence protocol. For each system, we successfully detected bugs in the implementation and show that full formal verification can be completed with reasonable time and effort.
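In the same spirit, a toy bounded check can drive a specification state machine and a candidate implementation in lockstep over every input sequence up to a fixed depth and compare their visible outputs. This is only a sketch of the verification obligation under assumed machine encodings; the paper's methodology relies on formal sequential equivalence and refinement checking, not bounded simulation.

```python
from itertools import product

def bounded_refinement_check(spec_step, impl_step, spec_init, impl_init,
                             inputs, depth):
    # Exhaustively simulate both machines on every input sequence up
    # to `depth`; a mismatch in visible outputs is a counterexample.
    for seq in product(inputs, repeat=depth):
        s, i = spec_init, impl_init
        for x in seq:
            s, s_out = spec_step(s, x)
            i, i_out = impl_step(i, x)
            if s_out != i_out:
                return False, seq
    return True, None

# Spec: a mod-4 counter that outputs its value after each step.
def spec_step(state, x):
    state = (state + 1) % 4 if x == "inc" else state
    return state, state

# Impl: same behaviour, but holds the counter in two binary flops.
def impl_step(state, x):
    lo, hi = state
    if x == "inc":
        lo, hi = 1 - lo, hi ^ lo  # carry propagates from lo into hi
    return (lo, hi), hi * 2 + lo

ok, cex = bounded_refinement_check(spec_step, impl_step, 0, (0, 0),
                                   ["inc", "nop"], depth=6)
print(ok)  # True
```

Swapping the XOR in `impl_step` for an OR immediately yields a counterexample sequence, which is the kind of implementation bug such checks surface.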
Citations: 0
A Compact TRNG design for FPGA based on the Metastability of RO-Driven Shift Registers
IF 1.4 CAS Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2023-07-21 DOI: 10.1145/3610295
Qingsong Peng, Jingchang Bian, Zhengfeng Huang, Senling Wang, Aibin Yan
True random number generators (TRNGs), as an important component of security systems, have received considerable research attention. Previous work has produced a large number of TRNG solutions, but these still fail to achieve a good trade-off across the various performance metrics. This paper presents a shift-register metastability-based TRNG implemented with compact reference units and comparison units. By forcing the D flip-flops in the shift registers into the metastable state, it addresses the problem that conventional metastability entropy sources consume excessive hardware resources, and a new method of metastable randomness extraction is used to reduce the bias of the metastable output. The proposed TRNG is implemented on Xilinx Spartan-6 and Virtex-6 FPGAs, where it generates random sequences that pass the NIST SP800-22 and NIST SP800-90B tests and shows excellent robustness to voltage and temperature variations. The TRNG consumes only 3 slices of the FPGA yet delivers a high throughput of 25 Mbit/s. Compared with state-of-the-art FPGA-compatible TRNGs, the proposed TRNG achieves the highest figure of merit (FOM), significantly outperforming previous work in terms of hardware resources, throughput, and operating frequency trade-offs.
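The abstract does not detail the new extraction method, so the sketch below substitutes the classic von Neumann extractor to illustrate how pairing raw bits can strip the bias from a metastable source; the bias model and all names here are illustrative assumptions, not the paper's circuit.

```python
import random

def biased_metastable_bits(rng, n, p_one=0.7):
    # Stand-in for raw bits sampled from a metastable flip-flop whose
    # resolution is biased toward one (p_one is an assumed value).
    return [1 if rng.random() < p_one else 0 for _ in range(n)]

def von_neumann_extract(bits):
    # Classic von Neumann debiasing over non-overlapping pairs:
    # emit 0 for '01', 1 for '10', discard '00' and '11'. A textbook
    # extractor, not the bias-reduction scheme proposed in the paper.
    out = []
    for a, b in zip(bits[::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out

rng = random.Random(42)
raw = biased_metastable_bits(rng, 20000)
clean = von_neumann_extract(raw)
raw_bias = sum(raw) / len(raw)      # ~0.7
clean_bias = sum(clean) / len(clean)  # ~0.5
print(abs(clean_bias - 0.5) < 0.05)  # True
```

Von Neumann extraction yields unbiased output at the cost of discarding most pairs, which is why dedicated extraction schemes such as the paper's target a better throughput trade-off.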
Citations: 0