2023 24th International Symposium on Quality Electronic Design (ISQED): Latest Papers

Binary Synaptic Array for Inference and Training with Built-in RRAM Electroforming Circuit

2023 24th International Symposium on Quality Electronic Design (ISQED)

Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129360

Ashvinikumar Dongre, G. Trivedi

Resistive Random Access Memory (RRAM) is extensively used to implement synapses. Even though a fresh metal-oxide RRAM in the pristine state cannot exhibit resistive switching before electroforming, the integration of the electroforming circuit into RRAM-based applications has not been discussed thoroughly. A major challenge in integrating forming circuits is the high voltage required for the forming process. The 4T-1R structure used for the implementation extends the applicability of the array to both inference and training. The ADCs used to convert the RRAM current to a digital output consume considerable area and power. They also suffer from nonlinearity that needs special attention, increasing the design complexity. In this work, we present an RRAM array with a circuit designed to isolate the peripheral circuitry during forming to prevent malfunction. We also propose an RRAM current sensor circuit that converts the RRAM current into output pulses, which are then converted to a digital output. Since there is a large gap between the two resistive states, the synapse tolerates 25% cycle-to-cycle and device-to-device variation. We test the functionality of the array in the presence of Random Telegraph Noise (RTN), which is inherent to RRAM. The compliance current for the proposed design is 100 µA. The proposed RRAM array is 2.7× more energy efficient than recent state-of-the-art designs. The area of the RRAM current sensor circuit is 18.1 µm × 27.3 µm.
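
The abstract does not describe the sensor's internal design. As a rough illustration of why a large gap between the two resistive states tolerates ±25% variation, the Python sketch below models a generic integrate-and-fire current-to-pulse converter (a capacitor charged by the cell current that emits one pulse each time it reaches a threshold) and classifies the cell by its pulse count. All component values (read voltage, LRS/HRS resistances, capacitance, threshold, counting window, decision count) are hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical parameters, chosen for illustration only (not from the paper).
V_READ = 0.2         # read voltage across the cell, V
R_LRS = 10e3         # nominal low-resistance state, ohms
R_HRS = 1e6          # nominal high-resistance state, ohms
C_INT = 100e-15      # integration capacitor, F
V_TH = 0.5           # firing threshold of the integrator, V
T_WINDOW = 100e-9    # counting window, s
COUNT_THRESHOLD = 4  # pulse count separating the two states


def pulse_count(current):
    """Pulses emitted by an ideal integrate-and-fire converter in T_WINDOW.

    Each pulse corresponds to a charge of C_INT * V_TH, so the count is
    proportional to the cell current.
    """
    return math.floor(current * T_WINDOW / (C_INT * V_TH))


def read_bit(resistance):
    """Return 1 for a low-resistance (high-current) cell, else 0."""
    return 1 if pulse_count(V_READ / resistance) >= COUNT_THRESHOLD else 0


def with_variation(nominal, spread=0.25, rng=random):
    """Apply a uniform +/- spread relative variation to a nominal resistance."""
    return nominal * rng.uniform(1 - spread, 1 + spread)


if __name__ == "__main__":
    rng = random.Random(0)
    lrs_reads = [read_bit(with_variation(R_LRS, rng=rng)) for _ in range(1000)]
    hrs_reads = [read_bit(with_variation(R_HRS, rng=rng)) for _ in range(1000)]
    print("LRS read as 1:", sum(lrs_reads), "/ 1000")
    print("HRS read as 1:", sum(hrs_reads), "/ 1000")
```

With these made-up values the nominal currents differ by two orders of magnitude, so even with ±25% spread the counts stay far apart (roughly 32 to 53 pulses for LRS versus 0 for HRS), and the script reports 1000 / 1000 and 0 / 1000. A smaller resistance ratio would shrink that margin.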

Citations: 0

Novel Implementation of High-Performance Polynomial Multiplication for Unified KEM Saber based on TMVP Design Strategy

2023 24th International Symposium on Quality Electronic Design (ISQED)

Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129320

Pengzhou He, Jiafeng Xie

The rapid advancement of quantum technology has initiated a new round of exploration into the efficient implementation of post-quantum cryptography (PQC) on hardware platforms. The key encapsulation mechanism (KEM) Saber, a module-lattice-based PQC scheme, is one of the four encryption-scheme finalists in the third round of the National Institute of Standards and Technology (NIST) standardization process. In this paper, we propose a novel Toeplitz Matrix-Vector Product (TMVP)-based design strategy to efficiently implement polynomial multiplication (the essential arithmetic operation) for KEM Saber. The proposed work consists of three layers of interdependent efforts: (i) first, we formulate the polynomial multiplication of KEM Saber into a mathematical form suitable for developing the proposed TMVP-based algorithm for high-performance operation; (ii) then, following the proposed TMVP-based algorithm, we map the derived algorithm into a unified polynomial multiplication structure (supporting all security levels) with the help of a series of algorithm-to-architecture co-implementation/mapping techniques; (iii) finally, detailed implementation results and complexity analysis confirm the efficiency of the proposed TMVP design strategy. Specifically, field-programmable gate array (FPGA) implementation results show that the proposed design has at least 30.92% lower area-delay product (ADP) than competing designs.
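
For readers unfamiliar with TMVP, the Python sketch below shows the general idea rather than the authors' architecture. Multiplication in Saber's ring Z_q[x]/(x^n + 1), with n = 256 and q = 2^13, can be written as a Toeplitz matrix times a coefficient vector, and a Toeplitz product of size 2k can be computed from three half-size Toeplitz products instead of four. The function names, the even-n requirement, and the single-level 2-way split are illustrative assumptions; the paper's actual decomposition and hardware mapping are not described in the abstract.

```python
import random

Q = 2 ** 13  # Saber modulus q
N = 256      # Saber polynomial degree n


def schoolbook_negacyclic(a, b, q=Q):
    """Reference product a*b in Z_q[x]/(x^n + 1), using x^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]
    return [x % q for x in c]


def negacyclic_toeplitz(a):
    """T[i][j] = a[i-j] if i >= j else -a[n+i-j]; T depends only on i - j."""
    n = len(a)
    return [[a[i - j] if i >= j else -a[n + i - j] for j in range(n)]
            for i in range(n)]


def matvec(t, v):
    return [sum(x * y for x, y in zip(row, v)) for row in t]


def matsub(s, t):
    return [[x - y for x, y in zip(rs, rt)] for rs, rt in zip(s, t)]


def tmvp_two_way(t, v):
    """One level of the 2-way TMVP split (assumes even length).

    With T = [[T1, T0], [T2, T1]] and v = [v0; v1]:
      P0 = T1 (v0 + v1), P1 = (T0 - T1) v1, P2 = (T2 - T1) v0
      T v = [P0 + P1; P0 + P2]
    """
    k = len(v) // 2
    t1 = [row[:k] for row in t[:k]]  # diagonal blocks (equal for Toeplitz T)
    t0 = [row[k:] for row in t[:k]]  # top-right block
    t2 = [row[:k] for row in t[k:]]  # bottom-left block
    v0, v1 = v[:k], v[k:]
    p0 = matvec(t1, [x + y for x, y in zip(v0, v1)])
    p1 = matvec(matsub(t0, t1), v1)
    p2 = matvec(matsub(t2, t1), v0)
    return [x + y for x, y in zip(p0, p1)] + [x + y for x, y in zip(p0, p2)]


def saber_mul_tmvp(a, b, q=Q):
    return [x % q for x in tmvp_two_way(negacyclic_toeplitz(a), b)]


if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.randrange(Q) for _ in range(N)]
    b = [rng.randrange(-4, 5) for _ in range(N)]  # small secret-like coefficients
    assert saber_mul_tmvp(a, b) == schoolbook_negacyclic(a, b)
    print("TMVP and schoolbook products match")
```

Because T0 - T1 and T2 - T1 are themselves Toeplitz, an efficient implementation would form them from their 2k - 1 defining coefficients and recurse on the three sub-products, which is where the complexity saving comes from; the sketch subtracts full matrices only for readability.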

Citations: 0

A Flexible Cluster Tool Simulation Framework with Wafer Batch Dispatching Time Recommendation

2023 24th International Symposium on Quality Electronic Design (ISQED)

Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129375

Hsin-Ping Yen, Shiuan-Hau Huang, Yan-Hsiu Liu, Kuang-Hsien Tseng, J. Kung, Yi-Ting Li, Yung-Chih Chen, Chun-Yao Wang

The semiconductor manufacturing process consists of multiple steps and is usually time-consuming. Information such as the turnaround time of a given batch of wafers can be very useful to manufacturing engineers. A simulation model of the manufacturing process can efficiently predict its performance, which is highly beneficial to manufacturing engineers. The simulation results can also inform system engineers about adjustments that achieve better throughput. In this work, we propose a flexible simulation framework for a cluster tool. We implemented the simulator in C++ using SystemC. The batch information used to design the simulator was gathered from industrial data. The experimental results show that the simulation differs from the manufacturing data by less than 2% in entire processing time, which indicates the high accuracy of the simulator. With the proposed dispatching method, the experiments achieve higher throughput than the manufacturing data, so the resulting dispatching time points can be recommended to system engineers.

Citations: 0

ISQED 2023 Cover Page

2023 24th International Symposium on Quality Electronic Design (ISQED)

Pub Date : 2023-04-05 DOI: 10.1109/isqed57927.2023.10129321

Citations: 0

Automatic Subnetwork Search Through Dynamic Differentiable Neuron Pruning

2023 24th International Symposium on Quality Electronic Design (ISQED)

Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129379

Zigeng Wang, Bingbing Li, Xia Xiao, Tianyun Zhang, Mikhail A. Bragin, Bing Yan, Caiwen Ding, S. Rajasekaran

Locating and pruning redundant neurons in deep neural networks (DNNs) is the focal point of DNN subnetwork search. Recent advances mainly prune neurons through heuristic "hard" constraints or by penalizing neurons. However, both methods rely heavily on expert knowledge to design model- and task-specific constraints and penalties, which makes it difficult to apply pruning to general models. In this paper, we propose an automatic, non-expert-friendly differentiable subnetwork search algorithm that dynamically adjusts the layer-wise neuron-pruning penalty based on the sensitivity of Lagrangian multipliers. The idea is to introduce "soft" layer-wise neuron-cardinality constraints and then relax them through Lagrangian multipliers. The sensitivity of the multipliers is then exploited to iteratively determine appropriate pruning-penalty hyperparameters during the differentiable neuron pruning procedure. In this way, the model weights, the model subnetwork, and the layer-wise penalty hyperparameters are learned simultaneously, relieving the prior-knowledge requirement and reducing trial-and-error time. Results show that our method can select state-of-the-art slim subnetwork architectures. For a VGG-like network on CIFAR10, a neuron compression rate of more than 6× is achieved without accuracy drop and without retraining. Accuracy rates of 66.3% and 57.8% are achieved at 150M and 50M FLOPs for MobileNetV1, and accuracy rates of 73.46% and 66.94% are achieved at 200M and 100M FLOPs for MobileNetV2, respectively.
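
The abstract describes relaxing layer-wise neuron-count constraints with Lagrangian multipliers and adjusting each layer's penalty according to its multiplier. The Python sketch below is a toy illustration of that idea, not the authors' algorithm: each neuron has a scalar gate, the task loss is replaced by a hypothetical quadratic 0.5·(g - g0)^2 so that the L1-penalized minimizer is a closed-form soft threshold, and each layer's multiplier is raised in proportion to how far its active-neuron count exceeds its budget. The gate values, budgets, step size, and function names are made up.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrinks x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def prune_layers(init_gates, budgets, step=0.05, max_iters=1000):
    """Toy layer-wise penalty search via multiplier (dual) updates.

    init_gates: {layer: [gate values]} standing in for trained neuron gates.
    budgets:    {layer: max number of active neurons} (the "soft" constraint).
    Returns (pruned gates, final per-layer multipliers).
    """
    lams = {name: 0.0 for name in init_gates}
    gates = {name: list(g0) for name, g0 in init_gates.items()}
    for _ in range(max_iters):
        # Primal step: minimize 0.5*(g - g0)^2 + lam*|g| per gate (closed form).
        gates = {name: [soft_threshold(g, lams[name]) for g in g0]
                 for name, g0 in init_gates.items()}
        violations = {name: sum(1 for g in gs if g != 0.0) - budgets[name]
                      for name, gs in gates.items()}
        if all(v <= 0 for v in violations.values()):
            break
        # Dual step: raise the penalty only for layers that are over budget.
        for name, v in violations.items():
            if v > 0:
                lams[name] += step * v
    return gates, lams


if __name__ == "__main__":
    gates0 = {"conv1": [0.9, 0.1, 0.5, 0.05, 0.7, 0.3],
              "conv2": [0.4, 0.45, 0.2, 0.8]}
    gates, lams = prune_layers(gates0, {"conv1": 3, "conv2": 2})
    for name, gs in gates.items():
        active = sum(1 for g in gs if g != 0.0)
        print(name, "active:", active, "lambda:", round(lams[name], 3))
```

Layers whose budgets are tight relative to their gate values end up with larger multipliers, loosely analogous to the per-layer sensitivity signal described above; in the actual method the gates and weights are trained jointly by gradient descent rather than set in closed form.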

Citations: 0

Scalable Low-Cost Sorting Network with Weighted Bit-Streams

2023 24th International Symposium on Quality Electronic Design (ISQED)

Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129357

Brady Prince, M. Najafi, Bingzhe Li

Sorting is a fundamental function in many applications, from data processing to database systems. For high performance, hardware sorting designs are implemented with conventional binary or emerging stochastic computing (SC) approaches. Binary designs are fast and energy-efficient but costly to implement. SC-based designs, on the other hand, are area- and power-efficient but slow and energy-hungry. Consequently, previous hardware-based sorting designs have faced scalability issues. In this work, we propose a novel scalable, low-cost design for implementing sorting networks. We borrow the concept of SC for its area and power efficiency but use weighted stochastic bit-streams to address the high latency and energy consumption of SC designs. A new lock and swap (LAS) unit is proposed to sort weighted bit-streams. The LAS-based sorting network can determine the result of comparing different input values early and then map the inputs to the corresponding outputs based on shorter weighted bit-streams. Experimental results show that the proposed design approach achieves much better hardware scalability than prior work. In particular, as the number of inputs increases, the proposed scheme can reduce energy consumption by about 3.8% to 93% compared to prior binary and SC-based designs.
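
The internals of the LAS unit are not given in the abstract. The Python sketch below illustrates one plausible reading of the "decide early" idea under a stated assumption: each input arrives as a bit-stream whose bits carry binary weights, most significant first, so a two-input unit can lock its decision at the first cycle where the streams differ and afterwards only route (swap or pass) the bits. The units are composed into an odd-even transposition sorting network; the function names and the network choice are illustrative, not the paper's design.

```python
def to_stream(value, width):
    """MSB-first weighted bit-stream: bit k carries weight 2**(width-1-k)."""
    return [(value >> (width - 1 - k)) & 1 for k in range(width)]


def from_stream(bits):
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def lock_and_swap(a, b):
    """Two-input compare-and-swap on bit-streams.

    Returns (smaller, larger, lock_cycle). The decision is locked at the
    first cycle where the streams differ; after that bits are only routed.
    """
    for cycle, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return (b, a, cycle) if x > y else (a, b, cycle)
    return a, b, len(a)  # equal streams: no swap needed


def sort_streams(streams):
    """Odd-even transposition network built from lock_and_swap units."""
    s = list(streams)
    n = len(s)
    for rnd in range(n):  # n rounds suffice to sort n inputs
        for i in range(rnd % 2, n - 1, 2):
            s[i], s[i + 1], _ = lock_and_swap(s[i], s[i + 1])
    return s


if __name__ == "__main__":
    values = [13, 2, 9, 9, 0, 15, 6]
    out = sort_streams(to_stream(v, 4) for v in values)
    print([from_stream(bits) for bits in out])  # → [0, 2, 6, 9, 9, 13, 15]
```

The lock cycle shows how many bits had to be inspected before the order was known; for inputs that differ in their high-order bits it is small, which is the intuition behind resolving comparisons early.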

Citations: 1

Hardware Performance Counter Enhanced Watchdog for Embedded Software Security

2023 24th International Symposium on Quality Electronic Design (ISQED)

Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129291

Karl Ott, R. Mahapatra

This paper proposes a novel use of long short-term memory (LSTM) autoencoders coupled with a hardware watchdog timer to enhance the robustness and security of embedded software. With more and more embedded systems being rapidly deployed due to the Internet of Things boom, security for embedded systems is becoming a crucial factor. The technique proposed in this paper aims to create a mechanism that can be trained in an unsupervised fashion and can detect anomalous execution of embedded software. This is done through the use of LSTM autoencoders and a hardware watchdog timer. The proposed technique is evaluated in two scenarios. The first is detecting generic arbitrary code execution, which it accomplishes with an average accuracy of 91%. The second is detecting a malfunction in which the program starts executing instructions randomly, which it detects with an average accuracy of 88%.
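
The abstract pairs an LSTM autoencoder with a hardware watchdog timer but does not detail the interface between them. The Python sketch below shows a common pattern under stated assumptions: each window of hardware performance counter samples is scored by its reconstruction error, the detection threshold is the mean plus k standard deviations of the errors on normal training windows, and the watchdog is "kicked" only for normal windows, so a run of anomalous windows lets it expire. The `reconstruct` argument stands in for a trained autoencoder; the example uses a trivial mean-profile reconstructor purely as a placeholder, not an LSTM. All names, the threshold rule, and the data are hypothetical.

```python
import statistics


def mse(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_threshold(reconstruct, normal_windows, k=3.0):
    """Threshold = mean + k * stdev of reconstruction errors on normal data."""
    errors = [mse(w, reconstruct(w)) for w in normal_windows]
    return statistics.mean(errors) + k * statistics.pstdev(errors)


class CounterWatchdog:
    """Watchdog that is kicked only when a counter window looks normal."""

    def __init__(self, reconstruct, threshold, timeout_windows):
        self.reconstruct = reconstruct
        self.threshold = threshold
        self.timeout = timeout_windows
        self.remaining = timeout_windows

    def step(self, window):
        """Process one window; return True when the watchdog fires."""
        if mse(window, self.reconstruct(window)) <= self.threshold:
            self.remaining = self.timeout  # normal behavior: kick the watchdog
        else:
            self.remaining -= 1            # anomalous: let the timer run down
        return self.remaining <= 0


def mean_profile_model(training_windows):
    """Placeholder 'autoencoder': always reconstructs the average window."""
    profile = [statistics.mean(col) for col in zip(*training_windows)]
    return lambda window: profile


if __name__ == "__main__":
    normal = [[100 + i % 3, 40 + i % 2, 7] for i in range(20)]  # fake counter windows
    model = mean_profile_model(normal)
    dog = CounterWatchdog(model, fit_threshold(model, normal), timeout_windows=2)
    print([dog.step(w) for w in normal[:3]])           # → [False, False, False]
    print([dog.step([5, 300, 90]) for _ in range(2)])  # → [False, True]
```

Requiring several consecutive anomalous windows before the timer expires trades detection latency for fewer false resets; the timeout length is a design knob, not a value from the paper.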

Citations: 0

Cryogenic In-memory Binary Multiplier Using Quantum Anomalous Hall Effect Memories

2023 24th International Symposium on Quality Electronic Design (ISQED)

Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129345

Arun Govindankutty, Shamiul Alam, Sanjay Das, A. Aziz, Sumitha George

Cryogenic memory technologies are garnering attention due to their natural synergy with quantum-computing systems, space applications, and ultra-fast superconducting processors. A recently proposed device based on twisted bilayer graphene (tBLG) on hexagonal boron nitride (hBN) shows immense promise as a scalable cryogenic memory. This device exhibits two topologically protected, variation-tolerant, non-volatile resistive states governed by the quantum anomalous Hall effect (QAHE). The memory states are read from the direction of the Hall voltage appearing across the two terminals of the device. The four-terminal structure of the device and the Hall voltage property can be utilized to design a compact memory array suitable for in-memory computing. In this work, by utilizing the series addition of Hall voltages in the memory array, we design a simple in-memory binary multiplier, which would otherwise be a complex circuit in traditional technologies. In addition, unlike DRAM in-memory multipliers, our novel in-memory binary multiplier does not explicitly change the memory array architecture. We also demonstrate bit-wise AND operations and partial product summation using our proposed design. Compared to a cutting-edge in-memory DRAM implementation, our design is highly compact and significantly reduces processing complexity. Our simulations show an ultra-low power budget of 52 nW/bit multiplication. Our designs demonstrate that QAHE devices are powerful candidates for future cryogenic in-memory computing.
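
The abstract states that the multiplier relies on the series addition of Hall voltages, with bit-wise AND producing the partial products. The Python sketch below is a purely behavioral, hypothetical model of that arithmetic, not a device or circuit model: each cell contributes +V_H or -V_H depending on its stored bit, the voltages of the cells in one partial-product column add in series, the number of 1s is recovered from the summed voltage, and the column counts are combined with binary weights. The normalized V_H, the column-wise organization, and performing the AND in software are assumptions for illustration.

```python
V_H = 1.0  # hypothetical normalized Hall voltage magnitude per cell


def hall_voltage(bit):
    """Sign of the Hall voltage encodes the stored bit (+V_H for 1, -V_H for 0)."""
    return V_H if bit else -V_H


def series_count_ones(bits):
    """Count 1s from the series sum of Hall voltages of m cells.

    V_total = V_H * (ones - zeros) = V_H * (2 * ones - m),
    so ones = (V_total / V_H + m) / 2.
    """
    v_total = sum(hall_voltage(b) for b in bits)
    return round((v_total / V_H + len(bits)) / 2)


def in_memory_multiply(a, b, width):
    """Unsigned width-bit multiply from AND partial products and column sums."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    product = 0
    for col in range(2 * width - 1):
        # Partial products a_i AND b_j with i + j == col share the weight 2**col.
        column = [a_bits[i] & b_bits[col - i]
                  for i in range(width) if 0 <= col - i < width]
        product += series_count_ones(column) << col
    return product


if __name__ == "__main__":
    print(in_memory_multiply(13, 11, 4))  # → 143
```

Summing a whole column in one series read replaces a tree of adders for that column; the carries between columns are resolved here by ordinary integer addition of the shifted counts.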

Citations: 2

Design and Evaluation of multipliers for hardware accelerated on-chip EdDSA

2023 24th International Symposium on Quality Electronic Design (ISQED)

Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129381

Harshita Gupta, Mayank Kabra, Nitin D. Patwari, C. PrashanthH., M. Rao

The paper presents optimized implementations of the Edwards-curve digital signature algorithm (EdDSA), based on the popular Ed25519 instance. Compared to current digital signature methods, this algorithm considerably reduces execution time without compromising security. Despite its use in several popular applications, its hardware implementation and characteristics have not been reported. The proposed work aims to characterize on-chip EdDSA using four different state-of-the-art (SOTA) multipliers. The multiplier is a critical design component of the EdDSA implementation; hence, different SOTA multipliers are characterized for hardware metrics, and their impact on the overall EdDSA module is investigated. Four different multipliers, namely conventional polynomial (CA), Karatsuba (KA), overlap-free Karatsuba (OKA), and the overlap-free based multiplier strategy (OBS), along with the default array multiplier traditionally employed in hardware designs, were investigated individually for 32-bit and 64-bit data formats. These multipliers were further employed to design on-chip EdDSA, and its characteristics are presented. The CA-based on-chip EdDSA was characterized to work reliably at a maximum operating frequency of 120 MHz, whereas the OBS- and OKA-derived on-chip EdDSA designs were the most compact. The on-chip EdDSA work is a step towards attaining reliable on-chip cryptosystems in the future.
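
Among the multipliers listed, Karatsuba is the easiest to illustrate in software. The Python sketch below shows textbook integer Karatsuba multiplication with a shift-and-add base case on fixed-width operands, as a reference for the three-sub-product decomposition a hardware KA multiplier builds on. It does not model the paper's CA, OKA, or OBS variants, their exact arithmetic setting, or any timing, and the 8-bit base-case width is an arbitrary choice.

```python
def schoolbook_mul(x, y, bits):
    """Shift-and-add multiply, as a conventional array multiplier would."""
    result = 0
    for i in range(bits):
        if (y >> i) & 1:
            result += x << i
    return result


def karatsuba_mul(x, y, bits, base_bits=8):
    """Karatsuba multiply of two unsigned `bits`-wide integers.

    Splits x = x1*2^h + x0 and y = y1*2^h + y0 and uses three sub-products:
      z0 = x0*y0, z2 = x1*y1, z1 = (x0 + x1)(y0 + y1) - z0 - z2
    """
    if base_bits < 3:
        raise ValueError("base_bits must be at least 3 for the recursion to end")
    if bits <= base_bits:
        return schoolbook_mul(x, y, bits)
    h = bits // 2
    mask = (1 << h) - 1
    x0, x1 = x & mask, x >> h
    y0, y1 = y & mask, y >> h
    z0 = karatsuba_mul(x0, y0, h, base_bits)
    z2 = karatsuba_mul(x1, y1, bits - h, base_bits)
    # The half-sums can be one bit wider than the upper half.
    z1 = karatsuba_mul(x0 + x1, y0 + y1, bits - h + 1, base_bits) - z0 - z2
    return (z2 << (2 * h)) + (z1 << h) + z0


if __name__ == "__main__":
    a, b = 0xDEADBEEF, 0x12345678
    print(karatsuba_mul(a, b, 32) == a * b)  # → True
    a64, b64 = 2 ** 64 - 1, 2 ** 63 + 12345
    print(karatsuba_mul(a64, b64, 64) == a64 * b64)  # → True
```

Each level replaces four half-size multiplications with three plus a few additions, which is why Karatsuba-style multipliers tend to trade extra adder logic for fewer multiplier cells.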

Citations: 1
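The EdDSA abstract contrasts conventional (schoolbook) multiplication with Karatsuba-style multipliers. As a rough software illustration only, the sketch below shows the Karatsuba identity on non-negative integers of the size used for Ed25519 field elements (arithmetic modulo 2^255 − 19): each recursion level replaces the four half-size products of a schoolbook split with three. It is not the authors' hardware design and does not model the overlap-free (OKA) or OBS variants; the function name `karatsuba` and the 32-bit base-case threshold (chosen to echo the 32-bit data format mentioned in the abstract) are assumptions.

```python
def karatsuba(x: int, y: int, threshold_bits: int = 32) -> int:
    """Multiply two non-negative integers with Karatsuba's method.

    Operands no wider than threshold_bits are multiplied directly, standing in
    for a small base multiplier; wider operands are split into high and low halves.
    """
    if x < 0 or y < 0:
        raise ValueError("operands must be non-negative")
    if threshold_bits < 4:
        raise ValueError("threshold_bits must be at least 4 so the recursion shrinks")
    n = max(x.bit_length(), y.bit_length())
    if n <= threshold_bits:
        return x * y  # base case: direct multiplication of small operands
    half = n // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    z2 = karatsuba(x_hi, y_hi, threshold_bits)  # high half times high half
    z0 = karatsuba(x_lo, y_lo, threshold_bits)  # low half times low half
    # One extra product yields both cross terms: (x_hi + x_lo)(y_hi + y_lo) - z2 - z0
    # equals x_hi*y_lo + x_lo*y_hi, which a schoolbook split computes with two products.
    z1 = karatsuba(x_hi + x_lo, y_hi + y_lo, threshold_bits) - z2 - z0
    # Recombine: x*y = z2 * 2^(2*half) + z1 * 2^half + z0
    return (z2 << (2 * half)) + (z1 << half) + z0
```

A field multiplication in this setting would then reduce the product, e.g. `karatsuba(a, b) % (2**255 - 19)`; practical Ed25519 implementations use faster reductions, and this sketch makes no claim about how the paper handles reduction.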

Fast Electromigration Simulation for Chip Power Grids

2023 24th International Symposium on Quality Electronic Design (ISQED)

Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129371

B. Shahriari, F. Najm

Electromigration (EM) continues to be a serious concern in large chip design. We focus on EM in the on-chip power grid, because grid lines carry mostly unidirectional currents and because modern grids are very large. In the last few years, it has become possible to simulate EM by simulating the stress in metal lines, which is the main cause of EM-induced failures. In this work, we improve on the state of the art with a new EM simulator that is faster and offers better features than previous work. The work builds on recent results on the equivalence between stress and voltage, and introduces a model reduction technique that provides up to 4.2× speedup, as well as a very efficient method for updating the grid currents during the void growth phase.


Citations: 0
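The electromigration abstract rests on simulating stress in metal lines and on an equivalence between stress and voltage. A widely used physics-based model for that stress is Korhonen's equation; the sketch below assumes a normalized one-dimensional form of it for a single line with blocked ends, ignores void nucleation and growth, and is not the authors' simulator or model reduction. Multiplying the update by dx gives dx·dσ_i/dt as a sum of branch terms with "conductance" κ/dx and a constant source κ·g in each branch. This has the same form as the nodal equations of an RC ladder, with stress in the role of node voltage. It is offered as one way to read the stress-voltage equivalence, not as the cited derivation. The function name and all parameters are hypothetical.

```python
def simulate_line_stress(n_cells: int = 20, length: float = 1.0, kappa: float = 1.0,
                         g: float = 1.0, t_end: float = 2.0) -> list[float]:
    """Explicit finite-volume solution of a normalized 1-D Korhonen stress equation.

    Solves d(sigma)/dt = d/dx [kappa * (d(sigma)/dx + g)] on one metal line with
    zero initial stress and blocking ends (no atomic flux through x = 0 or x = length).
    Units are normalized; g stands in for the current-driven (electron wind) term.
    """
    dx = length / n_cells
    dt = 0.4 * dx * dx / kappa  # below the explicit stability limit dx**2 / (2 * kappa)
    steps = int(round(t_end / dt))
    sigma = [0.0] * n_cells  # stress at cell centres
    for _ in range(steps):
        # Flux through the interior face between cell i and cell i + 1.
        face_flux = [kappa * ((sigma[i + 1] - sigma[i]) / dx + g) for i in range(n_cells - 1)]
        updated = sigma[:]
        for i in range(n_cells):
            right = face_flux[i] if i < n_cells - 1 else 0.0  # blocked at x = length
            left = face_flux[i - 1] if i > 0 else 0.0  # blocked at x = 0
            # Each face flux is added to one cell and subtracted from its neighbour,
            # so the total stress (a proxy for conserved atoms) stays constant.
            updated[i] += dt / dx * (right - left)
        sigma = updated
    return sigma
```

With these defaults the stress settles to the linear profile σ_i = g(L/2 − x_i) at the cell centres x_i, about ±0.475 in the two end cells. The explicit time step is kept below dx²/(2κ) for stability, which is the main cost of this simple scheme.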
