
2023 24th International Symposium on Quality Electronic Design (ISQED): Latest Papers

Cache Register Sharing Structure for Channel-level Near-memory Processing in NAND Flash Memory
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129383
Hyunwoo Kim, Hyundong Lee, Jongbeom Kim, Yunjeong Go, Seungwon Baek, Jae-Seok Song, Junhyeon Kim, Minyoung Jung, Hyodong Kim, Seong-Jae Kim, Taigon Song
The vast amount of data used for artificial intelligence causes a bottleneck between the processor and memory. To tackle this issue, processing-in-memory (PIM), a technology that embeds a processing unit in the memory, has been proposed. However, SRAM- and DRAM-based PIM suffers from a lack of capacity. Thus, we propose a NAND flash PIM scheme that shares the cache register. Compared to the conventional memory system, our scheme reduces the read latency and operation time by 22.8% and 43.7%, respectively. The power-performance-area (PPA) was reduced by 17.2% by reducing the number of cycles. Our NAND PIM specializes in tasks requiring high-performance computing.
Citations: 0
A True Random Number Generator for Probabilistic Computing using Stochastic Magnetic Actuated Random Transducer Devices
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129319
Ankit Shukla, L. Heller, Md Golam Morshed, L. Rehm, Avik W. Ghosh, A. Kent, S. Rakheja
Magnetic tunnel junctions (MTJs), which are the fundamental building blocks of spintronic devices, have been used to build true random number generators (TRNGs) with different trade-offs between throughput, power, and area requirements. MTJs with high-barrier magnets (HBMs) have been used to generate random bitstreams with ≲ 200 Mb/s throughput and pJ/bit energy consumption. A high temperature sensitivity, however, adversely affects their performance as a TRNG. Superparamagnetic MTJs employing low-barrier magnets (LBMs) have also been used for TRNG operation. Although LBM-based MTJs can operate at low energy, they suffer from slow dynamics, sensitivity to process variations, and low fabrication yield. In this paper, we model a TRNG based on medium-barrier magnets (MBMs) with perpendicular magnetic anisotropy. The proposed MBM-based TRNG is driven with short voltage pulses to induce ballistic, yet stochastic, magnetization switching. We show that the proposed TRNG can operate at frequencies of about 500 MHz while consuming less than 100 fJ/bit of energy. In the short-pulse ballistic limit, the switching probability of our device shows robustness to variations in temperature and material parameters relative to LBMs and HBMs. Our results suggest that MBM-based MTJs are suitable candidates for building fast and energy-efficient TRNG hardware units for probabilistic computing.
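As a rough illustration of the figures quoted in this abstract, the sketch below models each voltage pulse as an independent Bernoulli trial and computes the average power implied by the stated bit rate and energy per bit. The function names, the switching probability of 0.5, and the assumption of one bit per pulse are illustrative choices, not details taken from the paper.

```python
import random


def trng_bits(n_pulses, p_switch=0.5, seed=None):
    """Model each pulse as a Bernoulli trial: 1 if the magnet switches, else 0.

    p_switch is an assumed switching probability, not a value from the paper.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < p_switch else 0 for _ in range(n_pulses)]


def average_power_watts(bit_rate_hz, energy_per_bit_j):
    """Average power = bits per second * joules per bit."""
    return bit_rate_hz * energy_per_bit_j


bits = trng_bits(10_000, p_switch=0.5, seed=1)
ones_fraction = sum(bits) / len(bits)

# Upper bound from the abstract: about 500 MHz at less than 100 fJ/bit,
# assuming one random bit per pulse.
power_bound = average_power_watts(500e6, 100e-15)
print(f"fraction of ones: {ones_fraction:.3f}")
print(f"power bound: {power_bound * 1e6:.0f} uW")
```

Under these assumptions the energy figure corresponds to an average power of at most about 50 µW at 500 MHz.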
Citations: 1
MAAS: Hiding Trojans in Approximate Circuits
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129286
Qazi Arbab Ahmed, M. Awais, M. Platzner
Automated frameworks for approximate accelerator synthesis employ an iterative search-based approach to generate approximate instances of hardware. While offering distinct savings in hardware area and power consumption, approximate circuits are potentially at risk of being infected with hardware Trojans, mainly because the approximation is typically provided by third-party approximate accelerator synthesis frameworks that use component libraries to perform substitutions during the design space exploration phase. In this paper, we propose a threat model that discusses the potential for hardware Trojan insertion during approximate accelerator synthesis. Moreover, we present MAAS, a framework that exploits a search-based approximate accelerator synthesis technique to demonstrate the applicability of our threat model by hiding Trojans in approximate circuits. The experimental results show that the approximate circuits generated by MAAS and infected with hardware Trojans are slightly larger than the approximate designs and are hard to identify via conventional area and power measurement techniques. To the best of our knowledge, this is the first effort to demonstrate hardware Trojan insertion in the third-party approximate accelerator synthesis flow via library component substitution.
Citations: 0
High-Throughput Hardware Implementation for Haraka in SPHINCS+
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129310
Yueqin Dai, Yifeng Song, Jing Tian, Zhongfeng Wang
SPHINCS+, a hash-based signature scheme, has stood out as one of the four winners in the post-quantum cryptography (PQC) competition hosted by the U.S. National Institute of Standards and Technology (NIST). However, its slow signing speed forms a bottleneck for applications. Therefore, a short-input hash function named Haraka is recommended as the third instantiation in SPHINCS+ due to its advantage in processing speed. In this work, we propose four hardware architecture schemes for Haraka in SPHINCS+, denoted as Case I to Case IV. Several optimization methods are combined and applied in different cases to trade off area against throughput for different application scenarios. We code our designs in SystemVerilog and synthesize them with TSMC 28-nm CMOS technology. The experimental results show that Case IV achieves the best throughput and the best area efficiency, about 81.92 Gbps and 1.26 Mbps/GE, respectively, which also significantly outperforms the state-of-the-art implementation of Haraka and the advanced hardware implementation of the SHA-3 hash function.
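The two Case IV figures can be related with a short back-of-the-envelope calculation: dividing throughput by throughput-per-gate-equivalent gives the implied area. This is only an arithmetic sketch from the rounded numbers in the abstract; the area reported in the paper itself may differ slightly, and the function name is illustrative.

```python
def implied_area_ge(throughput_bps, efficiency_bps_per_ge):
    """Area in gate equivalents implied by throughput / (throughput per GE)."""
    return throughput_bps / efficiency_bps_per_ge


# Figures quoted in the abstract for Case IV (rounded values).
throughput_bps = 81.92e9          # 81.92 Gbps
efficiency_bps_per_ge = 1.26e6    # 1.26 Mbps per gate equivalent

area_ge = implied_area_ge(throughput_bps, efficiency_bps_per_ge)
print(f"implied area: {area_ge / 1e3:.1f} kGE")  # about 65 kGE
```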
Citations: 2
Split-Slope Chaotic Map Providing High Entropy Across Wide Range
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129295
P. Paul, Maisha Sadia, Anur Dhungel, Parker Hardy, Md. Sakib Hasan
This paper presents a novel one-dimensional discrete-time chaotic map. Significantly improved chaotic behavior, compared to previously published one-dimensional maps, is achieved in the proposed design by virtue of this nonlinear map's stiffer transfer characteristics. The novelty of the work comes from the proposed methodology of splitting the upward and downward sloping mechanisms to gain a stiffer slope in the unimodal nonlinear circuit. The design methodology is presented with the help of a stability analysis of fixed points, which is generally applicable to a wide variety of nonlinear circuits. The chaotic complexity of the proposed circuit is analyzed with the bifurcation plot, correlation coefficient, and Lyapunov exponent. The results are compared with reported works to demonstrate a significant improvement. Along with high chaotic complexity, this split-slope chaotic map provides a wide chaotic range covering 100% of the overall region of operation. The high chaotic complexity across a wide chaotic range is achieved with a remarkably low transistor-count circuit, which is suitable for many hardware-security applications in resource-constrained devices, including random number generation and chaotic logic circuits.
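The Lyapunov exponent mentioned in this abstract can be estimated for any one-dimensional map by averaging log|f'(x)| along an orbit; a positive value indicates chaos. The abstract does not give the split-slope map's equation, so the sketch below uses the textbook logistic map as a stand-in, and all names and parameter values are illustrative assumptions.

```python
import math


def logistic(x, r=4.0):
    """Stand-in unimodal map (NOT the paper's split-slope map)."""
    return r * x * (1.0 - x)


def logistic_derivative(x, r=4.0):
    """Analytic derivative of the logistic map."""
    return r * (1.0 - 2.0 * x)


def lyapunov_exponent(f, df, x0, n_iter=100_000, n_transient=1_000):
    """Average log|f'(x)| along the orbit, after discarding a transient."""
    x = x0
    for _ in range(n_transient):
        x = f(x)
    total = 0.0
    for _ in range(n_iter):
        # Guard against log(0) if the orbit lands exactly on the critical point.
        total += math.log(max(abs(df(x)), 1e-300))
        x = f(x)
    return total / n_iter


lam = lyapunov_exponent(logistic, logistic_derivative, x0=0.2)
print(f"estimated Lyapunov exponent: {lam:.3f}")
```

For the logistic map at r = 4 the exponent is known analytically to be ln 2, which gives a sanity check on the estimator before applying it to another map.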
Citations: 0
Low Overhead System-Level Obfuscation through Hardware Resource Sharing
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129342
Daniel Xing, Michael Zuzak, A. Srivastava
Logic locking techniques have been proposed to protect chip designs from malicious reverse engineering and overproduction. Stripped functionality logic locking (SFLL) has gained substantial traction as a state-of-the-art method, exhibiting strong resilience against a wide variety of attacks. However, secure instances of SFLL-based locking tend to have high power and area overheads, particularly in their restore units. This work presents a novel architectural approach to restore unit configuration for SFLL-like logic locking methods that treats restore units as an overhead-constrained shareable resource. We describe, from a graph-theoretic perspective, how resource contention caused by sharing restore units imposes constraints on the underlying locking scheme, and we propose both a 0-1 ILP and a heuristic clustering algorithm for finding resource-constrained shared locking configurations that satisfy these constraints. We evaluate our sharing method on SFLL-flex and find that our ILP and heuristic methods achieve 55% and 31% reductions, respectively, in the power used by locked datapaths synthesized from MediaBench benchmarks, while maintaining the same security and functionality as datapaths locked with conventional gate-level techniques.
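One way to picture the graph-theoretic view of restore-unit sharing is as a conflict graph in which two locked operations that contend for a unit cannot be assigned the same one. The sketch below uses a generic greedy graph-coloring heuristic under that assumed conflict model; it is not the paper's ILP or clustering algorithm, and the operation names and `max_units` budget are hypothetical.

```python
def greedy_share(conflicts, max_units):
    """Assign each locked operation to a restore unit so that no two
    conflicting operations share one (greedy coloring, highest degree first).

    conflicts: dict mapping an operation to the set of operations it
               cannot share a unit with (assumed symmetric).
    """
    assignment = {}
    order = sorted(conflicts, key=lambda n: (-len(conflicts[n]), n))
    for node in order:
        used = {assignment[m] for m in conflicts[node] if m in assignment}
        unit = next((u for u in range(max_units) if u not in used), None)
        if unit is None:
            raise ValueError(f"no free restore unit for {node!r}")
        assignment[node] = unit
    return assignment


# Hypothetical datapath: operations that contend for a unit are neighbours.
conflicts = {
    "mul0": {"mul1"},
    "mul1": {"mul0", "add0"},
    "add0": {"mul1"},
    "add1": set(),
}
plan = greedy_share(conflicts, max_units=2)
print(plan)
```

Here four locked operations fit onto two shared units, which is the kind of overhead saving the abstract attributes to sharing; an exact formulation such as a 0-1 ILP would search for the best assignment rather than accept the first feasible one.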
Citations: 1
A Novel Method Against Hardware Trojans in Approximate Circuits
Pub Date : 2023-04-05 DOI: 10.1109/isqed57927.2023.10129367
Yuqin Dou, Chongyan Gu, Chenghua Wang, Weiqiang Liu
Approximate computing is a promising computing paradigm that trades off power consumption and performance for error-tolerant applications. Approximate computing has been widely studied, for example in arithmetic circuits and accelerators. However, recent research has shown security vulnerabilities in approximate circuits. Hardware Trojans are one of the major threats to hardware circuits and have not been fully studied for approximate computing. Majority voting (MV) based on vendor diversity has been proposed as an effective technique to mask and/or detect hardware Trojans when assembling trusted chips from untrustworthy components. However, the randomness and diversity of inherent errors in approximate circuits can invalidate the MV technique. In this paper, for the first time, the authors present the challenges that arise when multiple vendors provide IPs for approximate circuits (IPac). Experiments demonstrate that the MV strategy is not applicable when trusted chips are assembled with IPacs. A comparison-based technique is proposed to thwart hardware Trojan attacks on approximate circuits. The experimental results show that the proposed method is highly effective at detecting hardware Trojans in approximate circuits.
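The reason majority voting breaks down for approximate IPs can be seen in a toy example: exact IPs from honest vendors agree bit for bit, so a single Trojan-infected output is outvoted, whereas approximate IPs may legitimately return different values and no strict majority exists. The output values below are made up for illustration and do not come from the paper.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of vendors, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


# Exact IPs: two honest vendors agree, so the Trojan output (13) is masked.
exact_outputs = [42, 42, 13]

# Approximate IPs: honest vendors differ within their error bounds (41, 43),
# so the Trojan output (13) can no longer be outvoted.
approx_outputs = [41, 43, 13]

masked = majority_vote(exact_outputs)
undecided = majority_vote(approx_outputs)
print(masked, undecided)  # 42 None
```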
Citations: 0
Image Quantization Tradeoffs in a YOLO-based FPGA Accelerator Framework
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129324
Richard Yarnell, M. Hossain, R. Demara
Until recently, FPGA-based acceleration of convolutional neural networks (CNNs) has remained an open-ended research problem. Herein, we evaluate one new method for rapidly implementing CNNs using industry-standard frameworks within Xilinx UltraScale+ FPGA devices. Within this workflow, referred to as Framework for Accelerating YOLO-Based ML on Edge-devices (FAYME), a TensorFlow model of the You Only Look Once version 4 (YOLOv4) object detection algorithm is realized using Xilinx’s Vitis AI toolchain. We test various levels of model bit-quantization and evaluate performance while simultaneously analyzing the utilization of available memory and processing elements. We also implement a ResNet-50 model to provide additional comparisons. In this paper, we present our YOLO model, which achieves a mAP of 0.581, and our ResNet model, which achieves a Top-5 accuracy of 0.950. Furthermore, we demonstrate that these results are possible while utilizing less than 25% of the throughput offered by a single hardware accelerator in an UltraScale+ FPGA.
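The trade-off behind model bit-quantization can be illustrated with a generic symmetric uniform quantizer: fewer bits mean a coarser step and a larger reconstruction error. This sketch is a plain-Python illustration and does not use or reproduce the Vitis AI quantizer; the weight values and function names are made up.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed integer codes of width `bits`."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values) or 1.0
    scale = peak / qmax
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def max_abs_error(values, bits):
    """Worst-case reconstruction error after a quantize/dequantize round trip."""
    codes, scale = quantize(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))


weights = [0.91, -0.42, 0.07, -0.88, 0.33]  # illustrative values only
err8 = max_abs_error(weights, 8)
err4 = max_abs_error(weights, 4)
print(f"8-bit max error: {err8:.4f}, 4-bit max error: {err4:.4f}")
```

The error is bounded by half a quantization step, so each bit removed roughly doubles it; whether that loss is acceptable depends on the accuracy metric of the deployed model.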
Citations: 0
Deep Image Segmentation for Defect Detection in Photo-lithography Fabrication
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129372
O. Paul, Sakib Abrar, Richard Mu, Riadul Islam, Manar D. Samad
Surface acoustic wave (SAW) sensors with increasingly unique and refined design patterns are often developed using lithographic fabrication processes. Emerging applications of SAW sensors often require novel materials, which may present uncharted fabrication outcomes. The fidelity of SAW sensor performance is often correlated with the ability to restrict the presence of defects after fabrication. Therefore, it is critical to have effective means of detecting defects within the SAW sensor. However, labor-intensive manual labeling is often required because surface features must be identified and classified precisely to increase confidence in model accuracy. One approach to automating defect detection is to leverage effective machine learning techniques to analyze and quantify defects within the SAW sensor. In this paper, we propose a machine learning approach using a deep convolutional autoencoder to segment surface features semantically. The proposed deep image autoencoder takes a grayscale input image and generates a color image segmenting the defect region in red, metallic interdigital transducing (IDT) fingers in green, and the substrate region in blue. Experimental results demonstrate promising segmentation scores in locating the defects and regions of interest for a novel SAW sensor variant. The proposed method can automate the process of localizing and measuring post-fabrication defects at the pixel level that may be missed by error-prone visual inspection.
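The color coding described in this abstract can be sketched as a mapping from per-pixel class labels to RGB values, together with a per-class overlap score. The class indices, the tiny label maps, and the choice of intersection-over-union as the score are assumptions for illustration; the abstract does not specify which segmentation scores the authors report.

```python
# Assumed class indices; the colors follow the abstract's description.
DEFECT, IDT, SUBSTRATE = 0, 1, 2
COLORS = {DEFECT: (255, 0, 0), IDT: (0, 255, 0), SUBSTRATE: (0, 0, 255)}


def colorize(label_map):
    """Turn a 2D map of class labels into a 2D map of RGB tuples."""
    return [[COLORS[label] for label in row] for row in label_map]


def iou(pred, truth, cls):
    """Intersection-over-union for one class (an assumed metric)."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += (p == cls) and (t == cls)
            union += (p == cls) or (t == cls)
    return inter / union if union else 1.0


# Made-up 2x3 label maps: ground truth and a prediction with one wrong pixel.
truth = [[SUBSTRATE, IDT, SUBSTRATE],
         [SUBSTRATE, DEFECT, SUBSTRATE]]
pred = [[SUBSTRATE, IDT, SUBSTRATE],
        [DEFECT, DEFECT, SUBSTRATE]]

rgb = colorize(pred)
defect_iou = iou(pred, truth, DEFECT)
print(rgb[1][0], defect_iou)  # (255, 0, 0) 0.5
```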
Citations: 0
HD2FPGA: Automated Framework for Accelerating Hyperdimensional Computing on FPGAs
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129332
Tinaqi Zhang, Sahand Salamat, Behnam Khaleghi, Justin Morris, Baris Aksanli, T. Simunic
Building a highly efficient FPGA accelerator for Hyperdimensional (HD) computing is tedious work that requires Register Transfer Level (RTL) programming and verification. An inexperienced designer might waste significant time finding the best resource allocation scheme to achieve the target performance under resource constraints, especially for edge applications. HD computing is a novel computational paradigm that emulates brain functionality in performing cognitive tasks. The underlying computations of HD involve a substantial number of element-wise operations (e.g., additions and multiplications) on ultra-wide hypervectors (HVs), which can be effectively parallelized and pipelined. Although different HD applications might vary in terms of the number of input features and output classes (labels), they generally follow the same computation flow. In this paper, we propose HD2FPGA, an automated tool that generates fast and highly efficient FPGA-based accelerators for HD classification and clustering. HD2FPGA eliminates the arduous task of hand-crafted design of hardware accelerators by leveraging a template of optimized processing elements to automatically generate an FPGA implementation as a function of application specifications and user constraints. For HD classification, HD2FPGA provides on average a 1.5× (up to 2.5×) speedup compared to the state-of-the-art FPGA-based accelerator, and a 36.6× speedup with 5.4× higher energy efficiency compared to the GPU-based one. For HD clustering, HD2FPGA is 2.2× faster than the GPU framework.
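The element-wise operations on hypervectors mentioned in the abstract can be sketched in a few lines of software. The following is a minimal illustration of a commonly used HD classification flow (bind by element-wise multiplication, bundle by element-wise addition, classify by dot-product similarity), not the HD2FPGA design itself. The dimensionality `D`, the ID/level encoding, the helper names, and the toy samples are all assumptions chosen for illustration.

```python
import random

D = 2048  # hypervector dimensionality (assumed; real designs vary)


def random_hv(rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(D)]


def bind(a, b):
    """Binding: element-wise multiplication."""
    return [x * y for x, y in zip(a, b)]


def bundle(hvs):
    """Bundling: element-wise addition across a list of hypervectors."""
    return [sum(col) for col in zip(*hvs)]


def encode(features, id_hvs, level_hvs):
    """Bind each feature's ID with its quantised level, then bundle."""
    return bundle([bind(id_hvs[i], level_hvs[v]) for i, v in enumerate(features)])


def similarity(a, b):
    """Dot product as the similarity measure."""
    return sum(x * y for x, y in zip(a, b))


def train(samples, id_hvs, level_hvs):
    """Accumulate one prototype hypervector per class label."""
    protos = {}
    for features, label in samples:
        hv = encode(features, id_hvs, level_hvs)
        if label in protos:
            protos[label] = [p + h for p, h in zip(protos[label], hv)]
        else:
            protos[label] = hv
    return protos


def classify(features, protos, id_hvs, level_hvs):
    """Return the label whose prototype is most similar to the query."""
    query = encode(features, id_hvs, level_hvs)
    return max(protos, key=lambda label: similarity(query, protos[label]))


rng = random.Random(0)
id_hvs = [random_hv(rng) for _ in range(4)]     # one ID per input feature
level_hvs = [random_hv(rng) for _ in range(2)]  # two quantisation levels: 0, 1

samples = [([0, 0, 0, 0], "low"), ([1, 1, 1, 1], "high")]  # toy data
protos = train(samples, id_hvs, level_hvs)
predicted = classify([1, 1, 1, 0], protos, id_hvs, level_hvs)
print(predicted)
```

Every loop here runs independently over the `D` positions of a hypervector, which is the property the abstract points to when it says the computation can be parallelized and pipelined in hardware.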
Citations: 0