
Latest publications — 2023 24th International Symposium on Quality Electronic Design (ISQED)

RECO-LFSR: Reconfigurable Low-power Cryptographic processor based on LFSR for Trusted IoT platforms
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129323
Mohamed El-Hadedy, Russell Hua, Kazutomo Yoshii, Wen-mei W. Hwu, M. Margala
Today we see lightweight computer hardware utilized in large volumes, especially with the growing use of IoT devices in homes. However, such devices often ignore security until it is too late and sensitive data breaches have occurred. The importance of finding lightweight cryptographic primitives to secure IoT devices is therefore growing rapidly, without impacting their limited resources and battery lifetime. In the search for a lightweight cryptographic standard, one must consider how to implement such algorithms optimally. For example, certain parts of an algorithm might be faster in hardware than in software and vice versa. This paper presents a hardware extension supporting the MicroBlaze softcore processor to efficiently implement one of the Lightweight Cryptography (LWC) finalists, TinyJAMBU, on the Digilent Nexys A7-100T. The proposed hardware extension consists of a reconfigurable Non-Linear Feedback Shift Register (NLFSR), the central computing element of the TinyJAMBU authenticated encryption with associated data (AEAD) scheme. The proposed NLFSR can run different variants of TinyJAMBU while consuming only 186 mWh in ten minutes at 100 MHz. The total resources needed to host the proposed NLFSR on the FPGA are 610 LUTs and 505 flip-flops, while the executable binary is 352 bytes smaller. The proposed hardware-extension-based solution is thus 2.17× faster than a pure software implementation of the whole TinyJAMBU on MicroBlaze, while consuming six mWh more. To our knowledge, this is the first implementation of TinyJAMBU using software/hardware partitioning on an FPGA with the MicroBlaze softcore processor.
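TinyJAMBU's core is a keyed 128-bit NLFSR, which is why a single reconfigurable shift register can serve as the central compute element. A rough bit-level software model is sketched below; the tap positions (s0, s47, the NAND of s70 and s85, s91) follow the published TinyJAMBU specification, but this is an illustrative sketch, not the paper's hardware design:

```python
def tinyjambu_nlfsr(state, key_bits, rounds):
    """Software model of the TinyJAMBU keyed NLFSR update.

    `state` is a list of 128 bits (LSB first), `key_bits` a list of
    key bits used cyclically. Each round computes one feedback bit
    and shifts it into the register.
    """
    for i in range(rounds):
        feedback = (state[0]
                    ^ state[47]
                    ^ (1 ^ (state[70] & state[85]))  # NAND term
                    ^ state[91]
                    ^ key_bits[i % len(key_bits)])
        state = state[1:] + [feedback]  # shift register by one bit
    return state
```

With an all-zero state and key, the NAND term forces the first feedback bit to 1, illustrating the non-linearity that distinguishes an NLFSR from a plain LFSR.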
Citations: 0
Analysis of Pattern-dependent Rapid Thermal Annealing Effects on SRAM Design
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129399
Vidya A. Chhabria, S. Sapatnekar
Rapid thermal annealing (RTA) is an important step in semiconductor manufacturing. RTA-induced variability due to differences in die layout patterns can significantly contribute to transistor parameter variations, resulting in degraded chip performance and yield. The die layout patterns that drive these variations are related to the distribution of transistor density (silicon) and shallow trench isolation (silicon dioxide) across the die, which results in emissivity variations that change the die surface temperature during annealing. While prior art has developed pattern-dependent simulators and provided mitigation techniques for digital design, it has failed to consider the impact of the temperature-dependent thermal conductivity of silicon on RTA effects and has not analyzed the effects on memory. This work develops a novel 3D transient pattern-dependent RTA simulation methodology that accounts for the dependence of the thermal conductivity of silicon on temperature. The simulator is used both to analyze the effects of RTA on memory performance and to propose mitigation strategies for a 7nm FinFET SRAM design. It is shown that RTA effects degrade read and write delays by 16% and 20%, respectively, and read static noise margin (SNM) by 15%, and that the applied mitigation strategies can compensate for these degradations at the cost of a 16% increase in area for a 7.5% tolerance in SNM.
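The key modeling point above — that silicon's thermal conductivity falls as temperature rises — can be sketched with a simple 1-D explicit transient step. The power-law fit and material constants below are illustrative assumptions for demonstration, not the paper's calibrated values:

```python
def k_silicon(temp_k):
    # Common power-law fit for bulk silicon thermal conductivity:
    # ~148 W/(m*K) at 300 K, falling roughly as T^-1.3.
    # Illustrative model only, not the paper's calibrated one.
    return 148.0 * (300.0 / temp_k) ** 1.3

def heat_step_1d(temps, dx, dt, rho_c):
    """One explicit finite-difference step of the 1-D heat equation
    with temperature-dependent conductivity and insulated ends --
    the ingredient a transient RTA simulator must add once k(T) is
    no longer treated as constant."""
    new = list(temps)
    for i in range(len(temps) - 1):
        # Conductivity evaluated at the cell face between i and i+1.
        k_face = 0.5 * (k_silicon(temps[i]) + k_silicon(temps[i + 1]))
        flux = k_face * (temps[i + 1] - temps[i]) / dx  # Fourier's law
        new[i] += flux * dt / (rho_c * dx)
        new[i + 1] -= flux * dt / (rho_c * dx)
    return new
```

Because hot regions conduct heat more poorly under this model, pattern-induced hot spots persist longer than a constant-k simulation would predict.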
Citations: 0
SQRTLIB: Library of Hardware Square Root Designs
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129377
C. PrashanthH., S. SrinikethS., Shrikrishna Hebbar, R. Chinmaye, M. Rao
Square-root is an elementary arithmetic function used not only in image and signal processing applications but also to extract vector functionalities. The square-root module demands high energy and hardware resources, apart from being a complex design to implement. In the past, many techniques have been reported to realize the square-root function, including Iterative, New Non-Restoring (New-NR), CORDIC, Piece-wise-linear (PWL) approximation, Look-Up-Tables (LUTs), and digit-by-digit implementations in integer (Digit-Int) and fixed-point (Digit-FP) formats. Cartesian genetic programming (CGP) is an evolutionary algorithm suited to evolving circuits by exploring a large solution space. This paper develops a library of square-root circuits ranging from 2 bits to 8 bits and benchmarks the proposed CGP-evolved square-root circuits against the other hardware implementations. All designs were analyzed using both FPGA and ASIC (130 nm Skywater node) flows to characterize hardware parameters, and evaluated using various error metrics. Among all the implementations, CGP-derived square-root designs in fixed-point format offered the best trade-off between hardware and error characteristics. All novel designs of this work are made freely available in [1] for further research and development usage.
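For reference, the digit-by-digit binary scheme named above produces one result bit per iteration via a trial subtraction; a minimal integer sketch (restoring form, illustrative — not any specific design from the library):

```python
def isqrt_bitwise(n, bits=8):
    """Digit-by-digit (binary) integer square root for an unsigned
    `bits`-wide input; returns floor(sqrt(n)).

    Each iteration brings down two input bits and tries to subtract
    the trial divisor (root << 2) | 1 -- the same per-digit loop a
    hardware Digit-Int implementation unrolls or pipelines."""
    root, rem = 0, 0
    for i in range(bits // 2 - 1, -1, -1):
        rem = (rem << 2) | ((n >> (2 * i)) & 0b11)  # bring down 2 bits
        trial = (root << 2) | 1
        if trial <= rem:
            rem -= trial
            root = (root << 1) | 1   # accepted: emit result bit 1
        else:
            root <<= 1               # rejected: emit result bit 0
    return root
```

An 8-bit input thus needs only four iterations, which is why digit-by-digit designs trade latency for very small area.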
Citations: 0
Heterogeneous Multi-Functional Look-Up-Table-based Processing-in-Memory Architecture for Deep Learning Acceleration
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129338
Sathwika Bavikadi, Purab Ranjan Sutradhar, A. Ganguly, Sai Manoj Pudukotai Dinakarrao
Emerging applications including deep neural networks (DNNs) and convolutional neural networks (CNNs) employ massive amounts of data to perform computations and data analysis. Such applications often lead to resource constraints and impose large overheads in data movement between memory and compute units. Several architectures such as Processing-in-Memory (PIM) have been introduced to alleviate the bandwidth bottlenecks and inefficiency of traditional computing architectures. However, the existing PIM architectures represent a trade-off between power, performance, area, energy efficiency, and programmability. To better achieve the energy-efficiency and flexibility criteria simultaneously in hardware accelerators, we introduce a multi-functional look-up-table (LUT)-based reconfigurable PIM architecture in this work. The proposed architecture is a many-core architecture; each core comprises processing elements (PEs), stand-alone processors with programmable functional units built from high-speed reconfigurable LUTs. The proposed LUTs can perform the various operations required for CNN acceleration, including convolution, pooling, and activation. Additionally, the proposed LUTs are capable of providing multiple outputs for different functionalities simultaneously, without the need to design a separate LUT for each functionality. This leads to optimized area and power overheads. Furthermore, we also design special-function LUTs, which can provide simultaneous outputs for multiplication and accumulation as well as special activation functions such as hyperbolics and sigmoids. We have evaluated various CNNs such as LeNet, AlexNet, and ResNet-18/34/50. Our experimental results demonstrate that AlexNet implemented on the proposed architecture achieves up to 200× higher energy efficiency and 1.5× higher throughput than a DRAM-based LUT PIM architecture.
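The underlying LUT trick is that any function of small operands can be precomputed once and answered with a single memory read. A toy sketch of a 4-bit × 4-bit multiplier as a 256-entry table (illustrative only — the paper's LUTs are reconfigurable and multi-output):

```python
# Precompute every 4-bit x 4-bit product; the 8-bit concatenation of
# the operands becomes the table address, so a "multiply" is a lookup.
MUL4_LUT = [a * b for a in range(16) for b in range(16)]

def lut_mul4(a, b):
    """Multiply two 4-bit operands by table lookup instead of logic."""
    return MUL4_LUT[(a << 4) | b]
```

Reconfiguring the architecture for a different operation (e.g. max-pooling or an activation) amounts to rewriting the table contents, not redesigning the datapath.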
Citations: 0
Online Training from Streaming Data with Concept Drift on FPGAs
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129312
Esther Roorda, S. Wilton
In dynamic environments, the inputs to machine learning models may exhibit statistical changes over time, through what is called concept drift. Incremental training can allow machine learning models to adapt to changing conditions and maintain high accuracy by continuously updating network parameters. In the context of FPGA-based accelerators, however, online incremental learning is challenging due to resource and communication constraints, as well as the absence of labelled training data. These challenges have not been fully evaluated or addressed in existing research. In this paper, we present and evaluate strategies for performing incremental training on streaming data with concept drift on FPGA-based platforms. We first present FPGA-based implementations of existing training algorithms to demonstrate the viability of online training under concept drift and to evaluate design tradeoffs. We then propose a technique for online training without labelled data and demonstrate its potential in the context of FPGA-based hardware acceleration.
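In its simplest form, incremental training is one gradient step per streamed sample, which lets the model track a drifting target. A minimal scalar sketch of this idea (illustrative only — not the paper's FPGA design or training algorithm):

```python
def online_sgd(stream, lr=0.05):
    """Fit y ~ w*x one sample at a time with plain SGD.

    Because each streamed sample contributes one small update, the
    weight keeps tracking the target even if the underlying
    relationship (the "concept") changes mid-stream."""
    w = 0.0
    for x, y in stream:
        pred = w * x
        w -= lr * (pred - y) * x  # gradient of squared error wrt w
    return w
```

Feeding a stream whose target slope flips partway through shows the learner forgetting the old concept and converging on the new one.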
Citations: 0
Enlarging Reliable Pairs via Inter-Distance Offset for a PUF Entropy-Boosting Algorithm
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129308
Md. Omar Faruque, Wenjie Che
Physically Unclonable Functions (PUFs) are emerging hardware security primitives that leverage random variations during chip fabrication to generate unique secrets. The amount of random secrets that can be extracted from a limited number of physical PUF components can be measured in entropy bits. Existing strategies of pairing or grouping N RO-PUF elements have an entropy upper bound of log2(N!), or O(N·log2(N)). A recently proposed entropy-boosting technique [9] improves the entropy bits to be quadratically large at N(N-1)/2, or O(N^2), significantly improving the RO-PUF hardware utilization efficiency in generating secrets. However, the improved amount of random secrets comes at the cost of discarding a large portion of unreliable bits. In this paper, we propose an "Inter-Distance Offset (IDO)" technique that converts those unreliable pairs into reliable ones by adjusting the pair inter-distance to an appropriate range. A theoretical analysis of the ratio of converted unreliable bits is provided along with experimental validation. Experimental evaluations of reliability and of entropy-reliability tradeoffs are given using the real RO-PUF datasets in [10]. Information leakage is analyzed and evaluated using PUF datasets to identify the offset ranges that leak no information. The proposed technique improves the portion of reliable (quadratically large) entropy bits by 20% and 100%, respectively, for different offset ranges. Hardware implementation on FPGAs demonstrates that the proposed technique is lightweight in implementation and runtime.
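The two entropy bounds quoted above are easy to compare directly: ordering N elements yields at most log2(N!) bits, while comparing every unordered pair yields N(N-1)/2 bits, which grows much faster:

```python
import math

def ordering_entropy_bits(n):
    # Upper bound for pairing/grouping strategies: log2(n!) bits,
    # i.e. O(n * log2(n)).
    return math.log2(math.factorial(n))

def pairwise_entropy_bits(n):
    # Quadratic bound from comparing every pair of elements:
    # n*(n-1)/2 bits, i.e. O(n^2).
    return n * (n - 1) // 2
```

For N = 64 ring oscillators, the ordering bound is roughly 296 bits while the pairwise bound is 2016 bits — the gap the entropy-boosting technique exploits, at the cost of the unreliable pairs the IDO technique then recovers.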
Citations: 0
ISQED 2023 Organizing Committee
Pub Date : 2023-04-05 DOI: 10.1109/isqed57927.2023.10129289
Citations: 0
Lightweight Instruction Set for Flexible Dilated Convolutions and Mixed-Precision Operands
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129341
Simon Friedrich, Shambhavi Balamuthu Sampath, R. Wittig, M. Vemparala, Nael Fasfous, E. Matús, W. Stechele, G. Fettweis
Modern deep neural networks specialized for object detection and semantic segmentation require specific operations to increase or preserve the resolution of their feature maps. Hence, more generic convolution layers called transposed and dilated convolutions are employed, which add a large number of zeros between the elements of the input features or weights. Usually, standard neural network hardware accelerators process these convolutions in a straightforward manner, without paying attention to the added zeros, resulting in increased computation time. To cope with this problem, recent works propose to skip the redundant elements with additional hardware, or solve the problem efficiently only for a limited range of dilation rates. We present a general approach for accelerating transposed and dilated convolutions that does not introduce any hardware overhead while supporting all dilation rates. To achieve this, we introduce a novel precision-scalable lightweight instruction set and memory scheme that can be applied to the different convolution variants. This results in a 5× speed-up on DeepLabV3+, outperforming recently proposed design methods. Support for precision-scalable execution of all workloads further increases the computation-time speedup shown for the PointPillars, DeepLabV3+, and ENet networks. Compared to the state-of-the-art commercial EdgeTPU, the instruction footprint of ResNet-50 on our accelerator is reduced by 60 percent.
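A dilated convolution need not materialize the inserted zeros at all: indexing the input with a stride equal to the dilation rate computes the same result, which is exactly the redundancy the accelerators above try to skip. A 1-D sketch of this equivalence (illustrative, not the paper's instruction set):

```python
def dilated_conv1d(x, w, dilation):
    """1-D dilated convolution computed by striding over the input
    rather than padding the kernel with zeros.

    The effective kernel span is (len(w)-1)*dilation + 1, but only
    len(w) multiplications are performed per output -- no work is
    spent on the zeros a naive implementation would insert."""
    span = (len(w) - 1) * dilation + 1
    return [
        sum(w[k] * x[i + k * dilation] for k in range(len(w)))
        for i in range(len(x) - span + 1)
    ]
```

With dilation = 1 this reduces to an ordinary valid convolution, so one datapath can serve all dilation rates — the flexibility the instruction set above targets.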
Citations: 2
DSEAdd: FPGA based Design Space Exploration for Approximate Adders with Variable Bit-precision
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129364
Archie Mishra, N. Rao
Functional approximation methods have been used to exploit the inherent error tolerance of several applications. Approximate computing reduces the resources utilized at the cost of acceptable accuracy loss. Designers need to follow a systematic approach to arrive at an optimized design configuration based on certain constraints. In this work, we present DSEAdd: an FPGA-based automated design space exploration (DSE) framework targeting variable bit-width approximate adders. Given a certain area, timing or accuracy (ATA) constraint, the approach helps to identify the best adder configuration. We introduce a metric known as Figure of Merit (FOM) to quantify the area, performance and accuracy of the design. We test the DSE framework by running a set of 74 design configurations. We demonstrate the use of FOM as a metric to choose the best adder configuration. We observe that we can obtain an area-optimized design with a 9.7% reduction in resource usage at the cost of only 0.3% accuracy, but with a lower bit precision (8-bit instead of 32-bits). Further, at low bit precisions, a slight compromise in the area (0.35%) can help improve the accuracy dramatically (17.7%). To achieve the best trade-off between accuracy and resources, we propose a configuration with 2 or 3 sub-adders. Lastly, we note that a performance-optimized design is difficult to achieve at higher bit-precision.
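The select-by-FOM flow the abstract describes can be sketched as a small exploration loop. This is a hedged illustration only: the abstract does not give the paper's FOM formula or any cost model, so the formula, the analytical area/delay/accuracy model, and all numbers below are invented placeholders; only the enumerate-evaluate-rank structure matches the text.

```python
from itertools import product

def figure_of_merit(area_luts, delay_ns, accuracy):
    # One plausible FOM: reward accuracy, penalize the area-delay product.
    # The paper's actual FOM definition is not given in the abstract.
    return accuracy / (area_luts * delay_ns)

def evaluate(cfg):
    # Toy analytical cost model (placeholder numbers): wider adders cost
    # area and delay; splitting into sub-adders shortens the carry chain
    # but loses accuracy at the approximate carry boundaries.
    area = cfg["bits"] * 4 + cfg["sub_adders"] * 6            # LUTs
    delay = cfg["bits"] / cfg["sub_adders"] * 0.1             # ns
    accuracy = 100.0 - 5.0 * (cfg["sub_adders"] - 1) ** 2     # percent
    return area, delay, accuracy

# Enumerate the design space and pick the configuration with the best FOM.
configs = [{"bits": b, "sub_adders": s} for b, s in product([8, 16, 32], [1, 2, 3, 4])]
best = max(configs, key=lambda c: figure_of_merit(*evaluate(c)))
print(best)  # with this toy model, a small adder split into a few sub-adders wins
```

A real flow would replace `evaluate` with post-synthesis FPGA reports and error simulation per configuration, but the ranking step is the same.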
{"title":"DSEAdd: FPGA based Design Space Exploration for Approximate Adders with Variable Bit-precision","authors":"Archie Mishra, N. Rao","doi":"10.1109/ISQED57927.2023.10129364","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129364","url":null,"abstract":"Functional approximation methods have been used to exploit the inherent error tolerance of several applications. Approximate computing reduces the resources utilized at the cost of acceptable accuracy loss. Designers need to follow a systematic approach to arrive at an optimized design configuration based on certain constraints. In this work, we present DSEAdd: an FPGA-based automated design space exploration (DSE) framework targeting variable bit-width approximate adders. Given a certain area, timing or accuracy (ATA) constraint, the approach helps to identify the best adder configuration. We introduce a metric known as Figure of Merit (FOM) to quantify the area, performance and accuracy of the design. We test the DSE framework by running a set of 74 design configurations. We demonstrate the use of FOM as a metric to choose the best adder configuration. We observe that we can obtain an area-optimized design with a 9.7% reduction in resource usage at the cost of only 0.3% accuracy, but with a lower bit precision (8-bit instead of 32-bits). Further, at low bit precisions, a slight compromise in the area (0.35%) can help improve the accuracy dramatically (17.7%). To achieve the best trade-off between accuracy and resources, we propose a configuration with 2 or 3 sub-adders. Lastly, we note that a performance-optimized design is difficult to achieve at higher bit-precision.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128778345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ISQED 2023 Best Papers
Pub Date : 2023-04-05 DOI: 10.1109/isqed57927.2023.10129392
{"title":"ISQED 2023 Best Papers","authors":"","doi":"10.1109/isqed57927.2023.10129392","DOIUrl":"https://doi.org/10.1109/isqed57927.2023.10129392","url":null,"abstract":"","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128173383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
2023 24th International Symposium on Quality Electronic Design (ISQED)