Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129360
Binary Synaptic Array for Inference and Training with Built-in RRAM Electroforming Circuit
Ashvinikumar Dongre, G. Trivedi
Resistive Random Access Memory (RRAM) is extensively used to implement synapses. Although a metal-oxide RRAM device in its pristine state cannot exhibit resistive switching until it has been electroformed, the integration of the electroforming circuit into RRAM-based applications has not been discussed thoroughly. A major challenge in integrating forming circuits is the high voltage required for the forming process. The 4T-1R structure used for the implementation extends the applicability of the array to both inference and training. The ADCs conventionally used to convert the RRAM current to a digital output consume considerable area and power; they also suffer from nonlinearity that requires special attention, increasing design complexity. In this work, we present an RRAM array with a circuit that isolates the peripheral circuitry during forming to avoid malfunction. We also propose an RRAM current sensor circuit that converts the RRAM current into output pulses, which are then converted to a digital output. Since there is a large gap between the two resistive states, the synapse tolerates 25% cycle-to-cycle and device-to-device variation. We test the functionality of the array in the presence of Random Telegraph Noise (RTN), which is inherent to RRAM. The compliance current for the proposed design is 100 µA. The proposed RRAM array is 2.7× more energy efficient than recent state-of-the-art designs. The area of the RRAM current sensor circuit is 18.1 µm × 27.3 µm.
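As a rough illustration of why a wide gap between the two resistive states buys tolerance to 25% variation, the sketch below checks whether worst-case shifted low- and high-resistance states still produce clearly separable read currents. The resistance values, read voltage, and separation criterion are hypothetical placeholders, not figures from the paper.

```python
# Toy margin check: do the two resistive states stay separable under +/-25% variation?
# All device values below are illustrative assumptions, not data from the paper.
R_LRS = 10e3      # assumed low-resistance state (ohms)
R_HRS = 1e6       # assumed high-resistance state (ohms)
V_READ = 0.2      # assumed read voltage (volts)
VARIATION = 0.25  # cycle-to-cycle / device-to-device spread

# Worst case: LRS drifts high (less current), HRS drifts low (more current).
i_lrs_min = V_READ / (R_LRS * (1 + VARIATION))
i_hrs_max = V_READ / (R_HRS * (1 - VARIATION))

print(f"worst-case LRS read current: {i_lrs_min * 1e6:.2f} uA")
print(f"worst-case HRS read current: {i_hrs_max * 1e6:.2f} uA")
print("states still separable:", i_lrs_min > 10 * i_hrs_max)
```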
{"title":"Binary Synaptic Array for Inference and Training with Built-in RRAM Electroforming Circuit","authors":"Ashvinikumar Dongre, G. Trivedi","doi":"10.1109/ISQED57927.2023.10129360","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129360","url":null,"abstract":"Resistive Random Access Memory (RRAM) is extensively used for the implementation of synapses. Even though a fresh metal oxide RRAM sampled in the pristine state cannot exhibit resistive switching before electroforming, the integration of the electroforming circuit in RRAM based applications has not been discussed thoroughly. A major challenge in integrating forming circuits is the high voltage required for the forming process. The 4T-1R structure used for the implementation extends the applicability of the array to inference as well as training. The ADCs used to convert the RRAM current to digital output consume lots of area and power. They also suffer from nonlinearity that needs special attention, increasing the design complexity. In this work, we present an RRAM array with a circuit designed to isolate the peripheral circuitry during forming to avoid malfunctioning. We also propose an RRAM current sensor circuit that converts the RRAM current to output pulses that are converted to digital output. Since there is a large gap between the two resistive states, the synapse is tolerant to 25% cycle-to-cycle and device-to-device variation. We test the functionality of the array in the presence of Random Telegraph Noise (RTN) that is inherent to RRAM. The compliance current for the proposed design is 100 µA. The proposed RRAM array is 2.7× more energy efficient than the recent state-of-the-art designs. The area of the RRAM current sensor circuit is 18.1µm × 27.3µm.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132258725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129320
Novel Implementation of High-Performance Polynomial Multiplication for Unified KEM Saber based on TMVP Design Strategy
Pengzhou He, Jiafeng Xie
The rapid advancement of quantum technology has initiated a new round of exploration into efficient hardware implementations of post-quantum cryptography (PQC). Key encapsulation mechanism (KEM) Saber, a module-lattice-based PQC scheme, is one of the four encryption-scheme finalists in the third round of the National Institute of Standards and Technology (NIST) standardization process. In this paper, we propose a novel Toeplitz Matrix-Vector Product (TMVP)-based design strategy to efficiently implement polynomial multiplication, the essential arithmetic operation in KEM Saber. The proposed work consists of three layers of interdependent effort: (i) we first formulate the polynomial multiplication of KEM Saber into a mathematical form suitable for developing the proposed TMVP-based algorithm for high-performance operation; (ii) we then follow the proposed algorithm to map it onto a unified polynomial multiplication structure (fitting all security levels) through a series of algorithm-to-architecture co-implementation/mapping techniques; (iii) finally, detailed implementation results and complexity analysis confirm the efficiency of the proposed TMVP design strategy. Specifically, field-programmable gate array (FPGA) implementation results show that the proposed design has at least 30.92% lower area-delay product (ADP) than competing designs.
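To see why Saber's polynomial multiplication maps onto a Toeplitz matrix-vector product, note that multiplication in Z_q[x]/(x^n + 1) is a negacyclic convolution, and the matrix acting on the second operand's coefficient vector has constant diagonals. The sketch below verifies this with toy parameters (Saber itself uses n = 256 and q = 2^13); it is a numerical illustration of the mathematical form, not the paper's hardware algorithm.

```python
import numpy as np

def negacyclic_toeplitz(a, q):
    """Toeplitz matrix T such that (T @ b) % q == a(x)*b(x) mod (x^n + 1, q)."""
    n = len(a)
    T = np.zeros((n, n), dtype=np.int64)
    for i in range(n):
        for j in range(n):
            # entry depends only on i - j => constant along diagonals (Toeplitz)
            T[i, j] = a[i - j] if j <= i else -a[n + i - j]
    return T % q

def schoolbook_negacyclic(a, b, q):
    n = len(a)
    c = np.zeros(n, dtype=np.int64)
    for i in range(n):
        for j in range(n):
            k = (i + j) % n
            sign = 1 if i + j < n else -1   # wrap-around picks up a minus sign: x^n = -1
            c[k] += sign * a[i] * b[j]
    return c % q

n, q = 8, 2**13                             # toy size; Saber uses n = 256, q = 2**13
rng = np.random.default_rng(0)
a, b = rng.integers(0, q, n), rng.integers(0, q, n)
assert np.array_equal(negacyclic_toeplitz(a, q) @ b % q, schoolbook_negacyclic(a, b, q))
print("TMVP form matches schoolbook negacyclic multiplication")
```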
{"title":"Novel Implementation of High-Performance Polynomial Multiplication for Unified KEM Saber based on TMVP Design Strategy","authors":"Pengzhou He, Jiafeng Xie","doi":"10.1109/ISQED57927.2023.10129320","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129320","url":null,"abstract":"The rapid advancement in quantum technology has initiated a new round of exploration of efficient implementation of post-quantum cryptography (PQC) on hardware platforms. Key encapsulation mechanism (KEM) Saber, a module lattice-based PQC, is one of the four encryption scheme finalists in the third-round National Institute of Standards and Technology (NIST) standardization process. In this paper, we propose a novel Toeplitz Matrix-Vector Product (TMVP)-based design strategy to efficiently implement polynomial multiplication (essential arithmetic operation) for KEM Saber. The proposed work consists of three layers of interdependent efforts: (i) first of all, we have formulated the polynomial multiplication of KEM Saber into a desired mathematical form for further developing into the proposed TMVP-based algorithm for high-performance operation; (ii) then, we have followed the proposed TMVP-based algorithm to innovatively transfer the derived algorithm into a unified polynomial multiplication structure (fits all security ranks) with the help of a series of algorithm-to-architecture co-implementation/mapping techniques; (iii) finally, detailed implementation results and complexity analysis have confirmed the efficiency of the proposed TMVP design strategy. Specifically, the field-programmable gate array (FPGA) implementation results show that the proposed design has at least less 30.92% area-delay product (ADP) than the competing ones.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132433384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129375
A Flexible Cluster Tool Simulation Framework with Wafer Batch Dispatching Time Recommendation
Hsin-Ping Yen, Shiuan-Hau Huang, Yan-Hsiu Liu, Kuang-Hsien Tseng, J. Kung, Yi-Ting Li, Yung-Chih Chen, Chun-Yao Wang
The semiconductor manufacturing process consists of multiple steps and is usually time-consuming. Information such as the turnaround time of a given batch of wafers can be very useful to manufacturing engineers. A simulation model of the manufacturing process can predict its performance efficiently, which benefits manufacturing engineers, and the simulation results can also guide system engineers toward better throughput after adjustment. In this work, we propose a flexible simulation framework for a cluster tool. We implemented the simulator in C++ with SystemC, using batch information gathered from industrial data. The experimental results show that the difference between the simulation and the manufacturing data in terms of total processing time is less than 2%, indicating the high accuracy of the simulator. With the proposed dispatching method, the simulation achieves a higher throughput than the manufacturing data, so the corresponding dispatching time points can be recommended to the system engineers.
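A minimal sketch of the kind of model such a framework builds on, written here in Python rather than the authors' C++/SystemC: wafers from a batch flow through a fixed recipe of stages, each wafer waits for the earliest-free chamber, and the simulator reports the batch turnaround time. The stage names, chamber counts, process times, and dispatch interval are made-up placeholders, and robot handling and chamber blocking are deliberately omitted.

```python
# Toy behavioral model of a cluster tool: each wafer visits the stages of an
# assumed recipe in order, waiting for the earliest-free chamber at each stage.
RECIPE = [("load", 1, 10), ("etch", 2, 60), ("clean", 1, 30)]  # (stage, chambers, seconds)

def simulate_batch(n_wafers, dispatch_interval=15):
    """Return the completion time (s) of the last wafer in the batch."""
    free_at = [[0.0] * chambers for _, chambers, _ in RECIPE]
    batch_finish = 0.0
    for w in range(n_wafers):
        t = w * dispatch_interval                      # wafer dispatch (release) time
        for s, (_, chambers, proc_time) in enumerate(RECIPE):
            c = min(range(chambers), key=lambda i: free_at[s][i])
            start = max(t, free_at[s][c])              # wait until a chamber frees up
            t = start + proc_time
            free_at[s][c] = t
        batch_finish = max(batch_finish, t)
    return batch_finish

print("batch turnaround (s):", simulate_batch(n_wafers=25))
```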
{"title":"A Flexible Cluster Tool Simulation Framework with Wafer Batch Dispatching Time Recommendation","authors":"Hsin-Ping Yen, Shiuan-Hau Huang, Yan-Hsiu Liu, Kuang-Hsien Tseng, J. Kung, Yi-Ting Li, Yung-Chih Chen, Chun-Yao Wang","doi":"10.1109/ISQED57927.2023.10129375","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129375","url":null,"abstract":"The semiconductor manufacturing process consists of multiple steps and is usually time-consuming. Information like the turnaround time of a certain batch of wafers can be very useful for manufacturing engineers. A simulation model of manufacturing process can help predict the performance of manufacturing process efficiently, which is very beneficial to the manufacturing engineers. The simulation result can also deliver messages to system engineers for achieving better throughput after adjustment. In this work, we propose a flexible simulation framework for a cluster tool. We implemented the simulator in C++ language with SystemC. The batch information used for the design of simulator was gathered from industrial data. The experimental results show that there is only less than 2% difference between the simulation and the manufacturing data in terms of entire processing time, which indicates the high accuracy of the simulator. The experimental results with the proposed dispatching method achieve a higher throughput compared to the manufacturing data such that the dispatching time points can be recommended to the system engineers.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124175275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129379
Automatic Subnetwork Search Through Dynamic Differentiable Neuron Pruning
Zigeng Wang, Bingbing Li, Xia Xiao, Tianyun Zhang, Mikhail A. Bragin, Bing Yan, Caiwen Ding, S. Rajasekaran
Locating and pruning redundant neurons in deep neural networks (DNNs) is the focal point of DNN subnetwork search. Recent advances mainly target pruning neurons through heuristic "hard" constraints or through penalizing neurons. However, both methods rely heavily on expert knowledge to design model- and task-specific constraints and penalties, which prevents pruning from being applied easily to general models. In this paper, we propose an automatic differentiable subnetwork search algorithm, friendly to non-experts, which dynamically adjusts the layer-wise neuron-pruning penalty based on the sensitivity of Lagrangian multipliers. The idea is to introduce "soft" layer-wise neuron-cardinality constraints and then relax them through Lagrangian multipliers. The sensitivity of the multipliers is then exploited to iteratively determine appropriate pruning-penalty hyper-parameters during the differentiable neuron-pruning procedure. In this way, the model weights, the model subnetwork, and the layer-wise penalty hyper-parameters are learned simultaneously, relieving the prior-knowledge requirements and reducing the time spent on trial and error. Results show that our method can select state-of-the-art slim subnetwork architectures. For a VGG-like network on CIFAR10, more than 6× neuron compression is achieved with no accuracy drop and no retraining. Accuracy rates of 66.3% and 57.8% are achieved at 150M and 50M FLOPs for MobileNetV1, and accuracy rates of 73.46% and 66.94% are achieved at 200M and 100M FLOPs for MobileNetV2, respectively.
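The core mechanism, relaxing per-layer neuron-budget constraints with Lagrangian multipliers whose violation drives the pruning penalty, can be sketched as follows. The sigmoid-gate parameterization, learning rates, and layer budgets are illustrative assumptions, and a real run would also include the task-loss gradient; the paper's sensitivity-based update may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(1)
layer_sizes = [64, 128, 256]                              # neurons per layer (toy)
budgets = [16, 32, 64]                                    # "soft" neuron-cardinality targets
gates = [rng.normal(0.0, 1.0, n) for n in layer_sizes]    # differentiable gate logits
lambdas = np.zeros(len(layer_sizes))                      # one Lagrangian multiplier per layer
eta_gate, eta_lambda = 0.1, 0.01

def soft_count(logits):
    """Differentiable surrogate for the number of active neurons in a layer."""
    return 1.0 / (1.0 + np.exp(-logits))                  # sigmoid gates in (0, 1)

for step in range(500):
    for l, logits in enumerate(gates):
        p = soft_count(logits)
        # Gradient of the relaxed penalty lambda_l * (sum(p) - budget_l) w.r.t. the logits.
        # (A real run would add the task-loss gradient to this term.)
        gates[l] = logits - eta_gate * lambdas[l] * p * (1.0 - p)
        # Multiplier ascent: the penalty grows only while the layer budget is violated.
        violation = p.sum() - budgets[l]
        lambdas[l] = max(0.0, lambdas[l] + eta_lambda * violation)

for l, logits in enumerate(gates):
    print(f"layer {l}: ~{soft_count(logits).sum():.1f} active neurons "
          f"(budget {budgets[l]}), lambda = {lambdas[l]:.2f}")
```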
{"title":"Automatic Subnetwork Search Through Dynamic Differentiable Neuron Pruning","authors":"Zigeng Wang, Bingbing Li, Xia Xiao, Tianyun Zhang, Mikhail A. Bragin, Bing Yan, Caiwen Ding, S. Rajasekaran","doi":"10.1109/ISQED57927.2023.10129379","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129379","url":null,"abstract":"Locating and pruning redundant neurons from deep neural networks (DNNs) is the focal point of DNN subnetwork search. Recent advance mainly targets at pruning neuron through heuristic \"hard\" constraints or through penalizing neurons. However, these two methods heavily rely on expert knowledge in designing model-and-task-specific constraints and penalization, which prohibits easily applying pruning to general models. In this paper, we propose an automatic non-expert-friendly differentiable subnetwork search algorithm which dynamically adjusts the layer-wise neuron-pruning penalty based on sensitivity of Lagrangian multipliers. The idea is to introduce \"soft\" neuron-cardinality layer-wise constraints and then relax them through Lagrangian multipliers. The sensitivity nature of the multipliers is then exploited to iteratively determine the appropriate pruning penalization hyper-parameters during the differentiable neuron pruning procedure. In this way, the model weight, model subnetwork and layer-wise penalty hyper-parameters are simultaneously learned, relieving the prior knowledge requirements and reducing the time for trail-and-error. Results show that our method can select the state-of-the-art slim subnetwork architecture. For VGG-like on CIFAR10, more than 6× neuron compression rate is achieved without accuracy drop and without retraining. Accuracy rates of 66.3% and 57.8% are achieved for 150M and 50M FLOPs for MobileNetV1, and accuracy rates of 73.46% and 66.94% are achieved for 200M and 100M FLOPs for MobileNetV2, respectively.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"27 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120841629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129357
Scalable Low-Cost Sorting Network with Weighted Bit-Streams
Brady Prince, M. Najafi, Bingzhe Li
Sorting is a fundamental function in many applications, from data processing to database systems. For high performance, hardware sorting designs are implemented with either conventional binary or emerging stochastic computing (SC) approaches. Binary designs are fast and energy-efficient but costly to implement. SC-based designs, on the other hand, are area- and power-efficient but slow and energy-hungry. As a result, previous studies of hardware-based sorting have faced scalability issues. In this work, we propose a novel, scalable, low-cost design for implementing sorting networks. We borrow the concept of SC for its area and power efficiency but use weighted stochastic bit-streams to address the high latency and energy consumption of SC designs. A new lock-and-swap (LAS) unit is proposed to sort weighted bit-streams. The LAS-based sorting network can determine the result of comparing different input values early and then map the inputs to the corresponding outputs using shorter weighted bit-streams. Experimental results show that the proposed design approach achieves much better hardware scalability than prior work. In particular, as the number of inputs increases, the proposed scheme reduces energy consumption by about 3.8%-93% compared to prior binary and SC-based designs.
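A behavioral sketch of the lock-and-swap idea on MSB-first weighted bit-streams: the unit passes bits through until the first position where the two streams differ, locks the comparison result at that point, and routes the remaining bits so one output carries the minimum and the other the maximum. This is an illustrative Python model of the concept, not the paper's circuit implementation.

```python
def lock_and_swap(bits_a, bits_b):
    """Bit-serial compare-and-swap of two MSB-first weighted bit-streams.

    Returns (min_stream, max_stream). The decision 'locks' at the first
    differing bit; before that the streams are identical, so pass-through
    and swap are indistinguishable.
    """
    out_min, out_max = [], []
    a_is_smaller = None                      # None = comparison not locked yet
    for a, b in zip(bits_a, bits_b):
        if a_is_smaller is None and a != b:
            a_is_smaller = (a < b)           # lock on the first MSB difference
        if a_is_smaller is None or a_is_smaller:
            out_min.append(a); out_max.append(b)
        else:
            out_min.append(b); out_max.append(a)
    return out_min, out_max

def to_bits(x, width=8):                     # MSB-first weighted bit-stream
    return [(x >> (width - 1 - i)) & 1 for i in range(width)]

def to_int(bits):
    return int("".join(map(str, bits)), 2)

lo, hi = lock_and_swap(to_bits(0x5A), to_bits(0x3C))
assert (to_int(lo), to_int(hi)) == (0x3C, 0x5A)
print("sorted pair:", hex(to_int(lo)), hex(to_int(hi)))
```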
{"title":"Scalable Low-Cost Sorting Network with Weighted Bit-Streams","authors":"Brady Prince, M. Najafi, Bingzhe Li","doi":"10.1109/ISQED57927.2023.10129357","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129357","url":null,"abstract":"Sorting is a fundamental function in many applications from data processing to database systems. For high performance, sorting-hardware based sorting designs are implemented by conventional binary or emerging stochastic computing (SC) approaches. Binary designs are fast and energy-efficient but costly to implement. SC-based designs, on the other hand, are area and power-efficient but slow and energy-hungry. So, the previous studies of the hardware-based sorting further faced scalability issues. In this work, we propose a novel scalable low-cost design for implementing sorting networks. We borrow the concept of SC for the area- and power efficiency but use weighted stochastic bit-streams to address the high latency and energy consumption issue of SC designs. A new lock and swap (LAS) unit is proposed to sort weighted bit-streams. The LAS-based sorting network can determine the result of comparing different input values early and then map the inputs to the corresponding outputs based on shorter weighted bit-streams. Experimental results show that the proposed design approach achieves much better hardware scalability than prior work. Especially, as increasing the number of inputs, the proposed scheme can reduce the energy consumption by about 3.8% - 93% compared to prior binary and SC-based designs.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121886731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129291
Hardware Performance Counter Enhanced Watchdog for Embedded Software Security
Karl Ott, R. Mahapatra
This paper proposes a novel use of long short-term memory (LSTM) autoencoders coupled with a hardware watchdog timer to enhance the robustness and security of embedded software. With more and more embedded systems being rapidly deployed during the Internet of Things boom, security for embedded systems is becoming a crucial factor. The technique proposed in this paper creates a mechanism that can be trained in an unsupervised fashion and detects anomalous execution of embedded software, using LSTM autoencoders together with a hardware watchdog timer. The technique is evaluated in two scenarios. The first is detecting generic arbitrary code execution, which it accomplishes with an average accuracy of 91%. The second is detecting when a malfunction causes the program to start executing instructions randomly, which it detects with an average accuracy of 88%.
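A compact sketch of the detection mechanism described here: an LSTM autoencoder trained unsupervised on windows of hardware-performance-counter samples, with an anomaly flagged when the reconstruction error exceeds a threshold. The synthetic counter data, window size, network sizes, and thresholding rule are illustrative assumptions, and the watchdog-timer integration is omitted.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_counters, hidden=32):
        super().__init__()
        self.encoder = nn.LSTM(n_counters, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_counters)

    def forward(self, x):                                 # x: (batch, window, n_counters)
        _, (h, _) = self.encoder(x)
        z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)    # repeat latent per time step
        y, _ = self.decoder(z)
        return self.head(y)

torch.manual_seed(0)
n_counters, window = 4, 16
normal = torch.rand(256, window, n_counters)              # stand-in for benign HPC traces
model, loss_fn = LSTMAutoencoder(n_counters), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(100):                                      # unsupervised training on normal runs
    opt.zero_grad()
    loss = loss_fn(model(normal), normal)
    loss.backward()
    opt.step()

with torch.no_grad():
    err = ((model(normal) - normal) ** 2).mean(dim=(1, 2))
    threshold = err.mean() + 3 * err.std()                # assumed thresholding rule
    suspicious = torch.rand(1, window, n_counters) * 5    # out-of-distribution trace
    score = ((model(suspicious) - suspicious) ** 2).mean()
    print("anomaly" if score > threshold else "normal", float(score), float(threshold))
```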
{"title":"Hardware Performance Counter Enhanced Watchdog for Embedded Software Security","authors":"Karl Ott, R. Mahapatra","doi":"10.1109/ISQED57927.2023.10129291","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129291","url":null,"abstract":"This paper proposes a novel use of long-short term memory autoencoders coupled with a hardware watchdog timer to the enhance robustness and security of embedded software. With more and more embedded systems being rapidly deployed due to the Internet of Things boom security for embedded systems is becoming a crucial factor. The proposed technique in this paper aims to create a mechanism that can be trained in an unsupervised fashion and detect anomalous execution of embedded software. This is done through the use of long-short term memory autoencoders and a hardware watchdog timer. The proposed technique is evaluated in two scenarios: the first is for detecting generic arbitrary code execution. It can accomplish this with an average accuracy of 91%. The second scenario detecting when there is a malfunction and the program starts executing instructions randomly. It can detect this with an average of accuracy of 88%.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129885913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129345
Cryogenic In-memory Binary Multiplier Using Quantum Anomalous Hall Effect Memories
Arun Govindankutty, Shamiul Alam, Sanjay Das, A. Aziz, Sumitha George
Cryogenic memory technologies are garnering attention due to their natural synergy with quantum-computing systems, space applications, and ultra-fast superconducting processors. A recently proposed device, based on twisted bilayer graphene (tBLG) on hexagonal boron nitride (hBN), shows immense promise as a scalable cryogenic memory. The device exhibits two topologically protected, variation-tolerant, non-volatile resistive states governed by the quantum anomalous Hall effect (QAHE). The memory states are read from the direction of the Hall voltage appearing across two terminals of the device. The four-terminal structure of the device and the Hall-voltage property can be utilized to design a compact memory array suitable for in-memory computing. In this work, we design a simple in-memory binary multiplier, otherwise a complex circuit in traditional technologies, by utilizing the series addition of Hall voltages in the memory array. In addition, our in-memory binary multiplier does not require changes to the memory array architecture, unlike DRAM in-memory multipliers. We also demonstrate bit-wise AND operation and partial-product summation using the proposed design. Compared to a cutting-edge in-memory DRAM implementation, our design is highly compact and significantly reduces processing complexity. Our simulations show an ultra-low power budget of 52 nW/bit for multiplication. Our designs demonstrate that QAHE devices are powerful candidates for future cryogenic in-memory computing.
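A behavioral sketch of the arithmetic this scheme performs: partial products come from bit-wise ANDs of the multiplicand with each multiplier bit, and the product is obtained by summing the shifted partial-product rows, the step the array carries out by adding Hall voltages in series. The Python below models only the arithmetic, not the device physics, array biasing, or readout.

```python
def in_memory_style_multiply(a, b, width=4):
    """Multiply via bit-wise AND partial products plus shifted summation."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    total = 0
    for j, bj in enumerate(b_bits):
        # Partial product row: AND of every multiplicand bit with multiplier bit j.
        row = [ai & bj for ai in a_bits]
        row_value = sum(bit << i for i, bit in enumerate(row))
        total += row_value << j            # series summation of the weighted rows
    return total

for a in range(16):
    for b in range(16):
        assert in_memory_style_multiply(a, b) == a * b
print("4-bit AND/partial-product multiplier matches a*b for all inputs")
```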
{"title":"Cryogenic In-memory Binary Multiplier Using Quantum Anomalous Hall Effect Memories","authors":"Arun Govindankutty, Shamiul Alam, Sanjay Das, A. Aziz, Sumitha George","doi":"10.1109/ISQED57927.2023.10129345","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129345","url":null,"abstract":"Cryogenic memory technologies are garnering attention due to their natural synergy with quantum-computing systems, space applications, and ultra-fast superconducting processors. A recently proposed device, based on a twisted bilayer graphene (tBLG) on hexagonal boron nitride(hBN) shows immense promise as a scalable cryogenic memory. This device exhibits two topologically-protected variation tolerant non-volatile resistive states governed by the quantum anomalous Hall effect (QAHE). The implied memory states are read by the direction of the Hall voltage appearing across the two terminals of the device. The four terminal structure of the device and the Hall voltage property can be utilized to design a compact memory array suitable for in-memory computing. In this work, we design a simple in-memory binary multiplier, otherwise a complex circuit with traditional technologies, by utilizing the series addition of Hall voltages in the memory array. In addition, our novel in-memory binary-multiplier does not explicitly change the memory array architecture unlike DRAM in-memory multipliers. We also demonstrate bit-wise AND operation and partial product summation using our proposed design. Compared to a cutting-edge in-memory DRAM implementation our design is highly compact and significantly reduces processing complexity. Our simulations show an ultra-low power budget of 52nW /bit multiplication. Our designs demonstrate that QAHE devices are powerful candidates for future cryogenic in-memory computing.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126471367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129381
Design and Evaluation of multipliers for hardware accelerated on-chip EdDSA
Harshita Gupta, Mayank Kabra, Nitin D. Patwari, C. PrashanthH., M. Rao
This paper presents optimized implementations of the Edwards-curve Digital Signature Algorithm (EdDSA) based on the popular Ed25519 instance. Compared to current digital signature methods, this algorithm considerably reduces execution time without compromising security. Despite its use in several popular applications, its hardware implementation and characteristics have not been reported. The proposed work aims to characterize an on-chip EdDSA using four different state-of-the-art (SOTA) multipliers. The multiplier is a critical design component in the EdDSA implementation, so the different SOTA multipliers are characterized for hardware metrics and their impact on the overall EdDSA module is investigated. Four multipliers, conventional polynomial (CA), Karatsuba (KA), overlap-free Karatsuba (OKA), and an overlap-free based multiplier strategy (OBS), along with the default array multiplier traditionally employed in hardware designs, were investigated for 32-bit and 64-bit data formats individually. These multipliers were then employed to design the on-chip EdDSA, and its characteristics are presented. The CA-based on-chip EdDSA was characterized to work reliably at a maximum operating frequency of 120 MHz, whereas the OBS- and OKA-derived on-chip EdDSA designs were the most compact. This on-chip EdDSA work is a step toward reliable on-chip cryptosystems in the future.
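As background for the Karatsuba-family multipliers compared here, the following is a minimal recursive Karatsuba sketch over integers, reducing one n-bit multiplication to three multiplications of roughly n/2 bits. The hardware variants in the paper (overlap-free Karatsuba, OBS) restructure this recursion for circuit-level efficiency, which this software sketch does not attempt to capture.

```python
def karatsuba(x, y, width=64):
    """Textbook Karatsuba: three half-width multiplies instead of four."""
    if width <= 8:                       # small base case handled directly
        return x * y
    half = width // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    hi = karatsuba(x_hi, y_hi, half)                              # x_hi * y_hi
    lo = karatsuba(x_lo, y_lo, half)                              # x_lo * y_lo
    mid = karatsuba(x_hi + x_lo, y_hi + y_lo, half + 1) - hi - lo  # cross terms
    return (hi << (2 * half)) + (mid << half) + lo

import random
random.seed(0)
for _ in range(1000):
    a, b = random.getrandbits(64), random.getrandbits(64)
    assert karatsuba(a, b) == a * b
print("64-bit Karatsuba matches native multiplication")
```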
{"title":"Design and Evaluation of multipliers for hardware accelerated on-chip EdDSA","authors":"Harshita Gupta, Mayank Kabra, Nitin D. Patwari, C. PrashanthH., M. Rao","doi":"10.1109/ISQED57927.2023.10129381","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129381","url":null,"abstract":"The paper presents optimized implementations of Edwards curve digital signature algorithm (EdDSA) which is based on a popular Ed25519 instance. When compared to current digital signature methods, this algorithm considerably reduces the execution time without compromising security. Despite being used in several popular applications, hardware implementation and characteristics is not reported. The proposed work aims to characterize on-chip EdDSA using four different state-of-the-art (SOTA) multipliers. Multiplier forms critical design component in the EdDSA implementation, hence different SOTA multipliers are characterized for hardware metrics and its impact on the overall EdDSA module is investigated. Four different multipliers in the form of Conventional polynomial (CA), Karat-suba (KA), overlap-free-Karatsuba (OKA), overlap-free based multilpier strategy (OBS), along with the default array multiplier which are traditionally employed in hardware designs were investigated for 32-bit and 64-bit data format individually. These multipliers were further employed for designing on-chip EdDSA and its characteristics are presented. CA based on-chip EdDSA was characterized to work reliably at a maximum operating frequency of 120 MHz, whereas OBS and OKA derived on-chip EdDSA presented the most compact on-chip designs. The on-chip EdDSA work is a step towards attaining reliable on-chip cryptosystems in the future.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125731846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-05 | DOI: 10.1109/ISQED57927.2023.10129371
Fast Electromigration Simulation for Chip Power Grids
B. Shahriari, F. Najm
Electromigration (EM) continues to be a serious concern for large chip designs. We focus on EM in the on-chip power grid, because grid lines carry mostly unidirectional currents and because modern grids are very large. In the last few years, it has become possible to simulate EM by simulating the stress in metal lines, which is the main cause of EM-induced failures. In this work, we improve on the state of the art by developing a new EM simulator that is both faster and has better features than previous work. The work builds on recent results on the equivalence between stress and voltage, and introduces both a model-reduction technique that provides up to 4.2× speedup and a very efficient method for updating the grid currents during the void-growth phase.
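For context, the sketch below integrates a normalized, single-line, Korhonen-style stress-evolution model with blocked ends, the class of PDE that stress-based EM simulators solve. It is a toy explicit finite-difference illustration under assumed normalized parameters, not the paper's power-grid formulation, stress-voltage mapping, or reduced-order model.

```python
import numpy as np

# Normalized Korhonen-style stress evolution in a single metal line with blocked ends:
#   d(sigma)/dt = d/dx( d(sigma)/dx + g ),   zero atomic flux at x = 0 and x = L.
n, L, g = 101, 1.0, 1.0              # grid points, line length, normalized EM driving force
dx = L / (n - 1)
dt = 0.4 * dx * dx                   # explicit-Euler stability: dt < dx^2 / 2
sigma = np.zeros(n)                  # initially stress-free line

for _ in range(50000):
    flux = np.zeros(n + 1)                           # atomic flux at cell boundaries
    flux[1:-1] = (sigma[1:] - sigma[:-1]) / dx + g   # interior flux
    sigma += dt * (flux[1:] - flux[:-1]) / dx        # flux[0] = flux[-1] = 0 (blocked ends)

# Steady state: linear stress with slope -g; peak tensile stress ~ g*L/2 at one end,
# which is what determines whether and when a void nucleates.
print(f"peak stress: {sigma.max():.3f}  (expected about {g * L / 2:.3f})")
```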
{"title":"Fast Electromigration Simulation for Chip Power Grids","authors":"B. Shahriari, F. Najm","doi":"10.1109/ISQED57927.2023.10129371","DOIUrl":"https://doi.org/10.1109/ISQED57927.2023.10129371","url":null,"abstract":"Electromigration (EM) continues to be a serious concern for large chip design. We are focused on EM in the on-chip power grid, because grid lines carry mostly unidirectional currents and because of the very large sizes of modern grids. In the last few years, the capability to simulate EM has become available by simulating the stress in metal lines, which is the main cause of EM-induced failures. In this work, we have improved on the state of the art by developing a new EM simulator that is both faster and has better features than previous work. The work builds on recent results on the equivalence between stress and voltage, and introduces both a model reduction technique that provides up to 4.2X speedup, and a very efficient method for updating the grid currents during the void growth phase.","PeriodicalId":315053,"journal":{"name":"2023 24th International Symposium on Quality Electronic Design (ISQED)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116638410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}