Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129307
Mayank Kabra, C. PrashanthH., Kedar Deshpande, M. Rao
In-memory-computing (IMC) architectures place arithmetic and logic functions around the memory arrays to use memory bandwidth effectively and avoid frequent data movement to the processor. As expected, IMC architectures deliver high throughput and significant energy savings, primarily because less work is spent moving data from memory to the computing core. Embedded DRAM (eDRAM), composed of 1-transistor, 1-capacitor (1T1C) bit cells with a logic block, enables computing with power savings and high performance, which is favorable for embedded computing engines. This work proposes a novel in-eDRAM-compute design that employs a 1T1C eDRAM cell with bit-serial computation and targets 3x throughput efficiency by arranging the operand bits in an interleaved manner. The interleaved eDRAM architecture allows the corresponding bits of multiple operands to be read from the memory cells at the same time and the result to be written back after computing within the same activate window, thereby saving multiple precharge and activate cycles. Additionally, the interleaved architecture allows a continuously arriving digitized signal to be pipelined and processed. The computing block, a 1-bit adder with a multiplexer unit, is optimized for different hardware metrics such as delay, power, and power-delay product (PDP) so that the design can be adopted per the specifications. The eDRAM-based computing design is evaluated for a 1-bit adder and further characterized for 8-bit and 16-bit adders, multipliers, and 1-D convolution with varying filter sizes. The proposed design improved computing time by 31% for 16-bit addition and 30.6% for 8-bit addition over the existing state-of-the-art work. The bit-serial in-eDRAM-compute design achieved its best performance of 2.5 ms of computing time and 120 nJ of energy for a 1-D convolution operation. The in-eDRAM-compute design is a step towards designing embedded memory with convolutional neural network (CNN) compute capability for customized real-time edge inferencing applications.
Title : eDRAM-OESP: A novel performance efficient in-embedded-DRAM-compute design for on-edge signal processing application
Venue : 2023 24th International Symposium on Quality Electronic Design (ISQED)
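The bit-serial, interleaved idea in the eDRAM-OESP abstract can be illustrated in software. The following is a minimal Python sketch, not the authors' circuit: the row layout `[a0, b0, a1, b1, ...]` and the function names `interleave` and `bit_serial_add` are assumptions made for illustration. It shows how storing corresponding operand bits next to each other lets a 1-bit full adder consume both bits of a position in one step.

```python
def interleave(a, b, width):
    """Lay out two operands in one row as [a0, b0, a1, b1, ...], LSB first.

    This layout is a hypothetical stand-in for the interleaved operand
    arrangement described in the abstract.
    """
    row = []
    for i in range(width):
        row.append((a >> i) & 1)
        row.append((b >> i) & 1)
    return row


def bit_serial_add(row, width):
    """Add the two interleaved operands, one bit position per step."""
    carry = 0
    result = 0
    for i in range(width):
        # Both bits of position i sit side by side, so they are read together.
        a_bit, b_bit = row[2 * i], row[2 * i + 1]
        sum_bit = a_bit ^ b_bit ^ carry                      # full-adder sum
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))  # full-adder carry
        result |= sum_bit << i
    # Keep the final carry-out as bit number `width` of the result.
    return result | (carry << width)


row = interleave(200, 100, 8)
print(bit_serial_add(row, 8))
```

The loop runs once per bit position, so an 8-bit addition takes eight adder steps; the hardware benefit claimed in the abstract comes from doing the read and write-back for each step inside one activate window, which this software model does not capture.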
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129363
Jaspinder Kaur, Shirshendu Das
The Last Level Cache (LLC) of modern multicore processors is normally shared between different cores and applications. Dynamic cache partitioning is applied to the LLC to distribute the LLC space fairly among the applications. Recently, Covert Channel Attacks (CCA) have become a major security issue for modern multicore systems. In a CCA, two malicious applications, a spy and a Trojan, run on two different cores. The Trojan normally runs on a secure core and knows some secret information. Through the CCA, the Trojan communicates this information to the spy. A well-known technique for performing such an attack is Prime+Probe (P+P), which exploits the shared behavior of the LLC space. Cache partitioning is considered a defense against such CCAs: partitioning isolates the applications in the LLC so that they cannot evict each other's blocks from the LLC. Hence, the existing P+P-based attacks are not possible while dynamic partitioning is applied to the LLC. In this work, however, we propose a modified CCA (based on P+P) that can establish a covert channel on top of the dynamic cache partitioning technique applied to the LLC. Such attacks must be handled carefully in modern processors. A possible defense mechanism for the new attack is also discussed in this paper.
Title : ACPC: Covert Channel Attack on Last Level Cache using Dynamic Cache Partitioning
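The classic Prime+Probe mechanism that the ACPC abstract builds on, and the reason partitioning is seen as a defense, can be shown with a toy model. The Python sketch below simulates a single LRU cache set; it is not the paper's modified attack, and the class `CacheSet`, the function `transmit_bit`, and the tag names are hypothetical. The spy fills the set (prime), the Trojan touches the set only to send a 1, and the spy then counts misses (probe).

```python
from collections import OrderedDict


class CacheSet:
    """One cache set with LRU replacement and a fixed number of ways."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Return True on a hit; on a miss, insert the tag and evict the LRU line."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)
        self.lines[tag] = None
        return False


def transmit_bit(bit, ways=4, partitioned=False):
    """Return the bit the spy decodes after one prime/probe round."""
    spy_set = CacheSet(ways)
    # With partitioning, the Trojan gets its own ways and cannot evict the spy.
    trojan_set = CacheSet(ways) if partitioned else spy_set
    spy_tags = ["spy%d" % i for i in range(ways)]

    for tag in spy_tags:             # prime: the spy fills the set
        spy_set.access(tag)
    if bit:                          # the Trojan accesses the set only for a 1
        trojan_set.access("trojan0")
    misses = sum(not spy_set.access(tag) for tag in spy_tags)  # probe
    return 1 if misses > 0 else 0


print(transmit_bit(1), transmit_bit(0), transmit_bit(1, partitioned=True))
```

In the shared case the spy recovers the bit from its probe misses; in the partitioned case the probe always hits and the channel carries nothing, which is the isolation property the abstract says the new attack works around.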
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129359
Zhiyao Xie, Tao Zhang, Yifeng Peng
Growing IC complexity has led to a compelling need to improve design efficiency through new electronic design automation (EDA) methodologies. In recent years, many innovative machine learning (ML)-based solutions have been proposed for EDA applications. While these ML solutions demonstrate great potential in the circuit design flow, the hidden security and model reliability problems were rarely discussed until recently. In this paper, we present some of the latest research advances on the security and reliability challenges in ML for EDA.
Title : Security and Reliability Challenges in Machine Learning for EDA: Latest Advances
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129330
Man Shi, Steven Colleman, Charlotte VanDeMieroop, Antony Joseph, M. Meijer, W. Dehaene, M. Verhelst
Deep neural networks (DNNs) use a wide range of network topologies to achieve high accuracy across diverse applications. This model diversity makes it impossible to identify a single "dataflow" (execution schedule) that performs optimally across all possible layers and network topologies. Several frameworks support the exploration of the best dataflow for a given DNN layer and hardware. However, switching the dataflow from one layer to the next within one DNN model can result in hardware inefficiencies stemming from memory data layout mismatch among the layers. Unfortunately, all existing frameworks treat each layer independently and typically model memories as black boxes (one large monolithic wide memory), which ignores the data layout and cannot deal with the data layout dependencies of sequential layers. These frameworks are not capable of cross-layer dataflow optimization. This work therefore aims at cross-layer dataflow optimization, taking the data dependency and data layout reshuffling overheads among layers into account. Additionally, we propose to exploit the multi-bank memories typically present in modern DNN accelerators to reshuffle data efficiently and support more dataflows at low overhead. These innovations are supported through the Cross-layer Memory-aware Dataflow Scheduler (CMDS). CMDS can model DNN execution energy/latency while considering the different data layout requirements due to the varied optimal dataflows of layers. Compared with the state-of-the-art (SOTA), which performs layer-optimized memory-unaware scheduling, CMDS achieves up to 5.5× energy reduction and 1.35× latency reduction with negligible hardware cost.
Title : CMDS: Cross-layer Dataflow Optimization for DNN Accelerators Exploiting Multi-bank Memories
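Why layer-by-layer dataflow selection can lose to a cross-layer choice is easy to show with a small dynamic program. The Python sketch below is an illustration only and is not the CMDS scheduler: the function `best_schedule`, the dataflow names "A" and "B", and all cost numbers are made up. Each layer has a cost per dataflow, and switching dataflow between consecutive layers adds a hypothetical reshuffling cost.

```python
def best_schedule(layer_costs, reshuffle_cost):
    """Pick one dataflow per layer, minimizing layer cost plus reshuffle cost.

    layer_costs: list of dicts mapping dataflow name -> cost for that layer.
    reshuffle_cost: function (previous_dataflow, next_dataflow) -> cost.
    Returns (total_cost, list_of_chosen_dataflows).
    """
    # best[d] = (cost, path) of the cheapest schedule so far that ends in d.
    best = {d: (c, [d]) for d, c in layer_costs[0].items()}
    for costs in layer_costs[1:]:
        nxt = {}
        for d, c in costs.items():
            total, path = min(
                ((pc + reshuffle_cost(p, d) + c, pp) for p, (pc, pp) in best.items()),
                key=lambda t: t[0],
            )
            nxt[d] = (total, path + [d])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Made-up per-layer costs for two hypothetical dataflows.
layers = [{"A": 10, "B": 12}, {"A": 20, "B": 9}, {"A": 8, "B": 15}]


def switch_penalty(prev, cur):
    """Hypothetical data layout reshuffle cost when the dataflow changes."""
    return 0 if prev == cur else 5


print(best_schedule(layers, switch_penalty))  # (34, ['B', 'B', 'A'])
```

Choosing the cheapest dataflow for each layer in isolation gives A, B, A with a total of 37 once the two switches are paid for, while the cross-layer choice B, B, A costs 34. This is the kind of gap that motivates accounting for reshuffling overhead across layers.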
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129285
M. Vemuri, Umamaheswara Rao Tida
Metal inter-layer vias (MIVs) in monolithic three-dimensional integrated circuits (M3D-ICs) connect inter-layer devices and provide power and clock signals across multiple layers. The size of an MIV is comparable to that of logic gates because of the significant reduction in substrate layers due to sequential integration. Despite the MIV's small size, its impact on the performance of adjacent devices should be considered when implementing IC designs in M3D-IC technology. In this work, we systematically study how transistor performance changes when transistors are placed near an MIV, to understand the effect of an MIV on adjacent devices when it passes through the substrate. Simulation results suggest that a keep-out-zone (KOZ) for MIVs should be considered to ensure the reliability of M3D-IC technology, and that this KOZ is highly dependent on the M3D-IC process. We show that a transistor placed near an MIV, with the M1 metal pitch as the separation, has up to a 68,668× increase in leakage current when the channel doping is 10¹⁵ cm⁻³, the source/drain doping is 10¹⁸ cm⁻³, and the substrate layer height is 100 nm. We also show that this increase in leakage current can be reduced significantly by having a process-dependent KOZ around the MIV.
Title : Metal Inter-layer Via Keep-out-zone in M3D IC: A Critical Process-aware Design Consideration
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129284
Juneet Kumar Meka, R. Vemuri
EDA methods in general and hardware security applications in particular require careful benchmarking covering nominal and corner cases of various design parameters. Attributed graph grammars have been used for generating interesting and constraint-satisfying structures in various domains of design. This paper shows how attributed graph transformation systems can be effectively adapted to automatically generate synthetic circuit structures that meet arbitrary constraints on various design parameters and how the method is flexible and scalable. We discuss the method in detail and demonstrate its utility for an example hardware security application, the sequential satisfiability attack.
Title : Attributed Graph Transformation for Generating Synthetic Benchmarks for Hardware Security
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129298
Luke R. Upton, Guénolé Lallement, M. Scott, Joyce E. S. Taylor, R. Radway, D. Rich, M. Nelson, S. Mitra, B. Murmann
Many emerging resistive memory characterization efforts are constrained to small-batch, device-level studies due to a lack of test structure read/write bandwidth. To address this issue, we present a Yield Test Vehicle (YTV) for characterizing resistive RAM (RRAM) at the array level in SkyWater's 130 nm technology. The YTV provides 16-bit word read/write access with 7 bits (3.3 µS to 425 µS) of linear reference conductance range, and an onboard controller prevents excessive cell writes responsible for yield deterioration. The 100 mm² YTV die has an aggregate 8.8 Mb capacity and operates at a clock frequency of up to 50 MHz. The readout's wide input conductance dynamic range and modular peripheral circuit design allow rapid adaptation for characterizing other resistive memory technologies.
Title : Testbench on a Chip: A Yield Test Vehicle for Resistive Memory Devices
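The 7-bit linear reference conductance range quoted in the YTV abstract can be turned into a per-code step size with simple arithmetic. The Python sketch below rests on assumptions the abstract does not state: that the 128 codes are evenly spaced and that both endpoints, 3.3 µS and 425 µS, are included. The function name `reference_conductance_us` is hypothetical.

```python
G_MIN_US = 3.3    # lowest reference conductance from the abstract, in µS
G_MAX_US = 425.0  # highest reference conductance from the abstract, in µS
BITS = 7          # reference resolution from the abstract


def reference_conductance_us(code):
    """Map a 7-bit code to a conductance in µS, assuming even spacing."""
    if not 0 <= code < 2 ** BITS:
        raise ValueError("code out of range")
    # Assumed: 2**BITS levels span the range with both endpoints included.
    step = (G_MAX_US - G_MIN_US) / (2 ** BITS - 1)
    return G_MIN_US + code * step


print(reference_conductance_us(1) - reference_conductance_us(0))
```

Under these assumptions the step between adjacent codes is about 3.32 µS, close to the lowest reference value itself.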
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129339
Brendan Reidy, D. Duggan, Bernard Glasauer, Peng Su, Ramtin Zand
The rapid identification of contributing factors to failures in a high-volume electronic product manufacturing environment is critical to reduce disruption to production and mitigate potential quality and reliability risks. As system complexity and component usage continue to increase, it is becoming more and more challenging to manually process the large volume of data continuously generated by production processes. In this paper, we utilize various machine learning (ML) techniques to classify components on printed circuit board assemblies (PCBAs) as defective or non-defective based on an input feature map that includes features such as the date the component was manufactured, the side of the board on which the component is placed, and the location of the component on the board. We then implement a feature importance algorithm to detect the underlying cause of the component failure. Three ML models, namely support vector machine, random forest, and neural network, are trained and implemented for feature importance analysis using a dataset obtained from over 10 million components on various PCBA boards. Due to the intrinsic characteristics of the dataset, such as a significant imbalance between defective and non-defective cases, pre-processing techniques such as upsampling and downsampling are necessary to increase the performance of the models. The results show that all the developed ML models can achieve more than 99% accuracy. Finally, we show that our proposed feature importance approach is capable of correctly identifying the main cause of defects for given components.
Title : Application of Machine Learning for Quality Risk Factor Analysis of Electronic Assemblies
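The upsampling step mentioned in the PCBA abstract can be sketched with the standard library alone. The Python function below shows plain random oversampling, which may differ from the pre-processing the authors actually used; the name `upsample_minority` and the toy data are hypothetical. It duplicates randomly chosen rows of the smaller classes until every class matches the largest one.

```python
import random


def upsample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw with replacement to fill the gap up to the largest class size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: one defective component (label 1) among nine good ones (label 0).
rows = list(range(10))
labels = [1] + [0] * 9
balanced_rows, balanced_labels = upsample_minority(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))  # 9 9
```

Resampling of this kind should be applied to the training split only; duplicating rows before splitting would leak copies of the same component into the test set and inflate the measured accuracy.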
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129292
Deepraj Soni, M. Nabeel, Homer Gamil, O. Mazonka, Brandon Reagen, R. Karri, M. Maniatakos
Fully homomorphic encryption (FHE) promises data protection by computing on encrypted data, but demands resource-intensive computation. The most fundamental resource of FHE is the modular multiplier, which needs to be evaluated for efficient implementation. In this work, we develop and evaluate ASIC implementations of the modular multiplier at the block level and at the system level. We study the efficiency of the multipliers in terms of performance-for-area and performance-for-power. Since these ASICs are used in FHE, we explore these multipliers within this system-level context with on-chip memory and interconnect limits. We explore ASIC implementations of modular multiplication using a state-of-the-art 22 nm technology node with constant operand throughput to ensure a fair comparison. The study yields key insights about the performance-for-area efficiency and power efficiency of bit-serial and bit-parallel designs: bit-parallel designs are more efficient than their bit-serial counterparts. Montgomery multipliers with constrained modulus are the most power-efficient and area-efficient design. Iterative Montgomery multipliers incur minimum peak power for a polynomial multiplication, making them suitable for low-power voltage sources.
Title : Design Space Exploration of Modular Multipliers for ASIC FHE accelerators
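Montgomery multiplication, the multiplier family highlighted in the FHE abstract, replaces division by the modulus with shifts and masks. The Python sketch below shows the textbook algorithm for an odd modulus n < R = 2^k; it is a software illustration, not any of the ASIC designs evaluated in the paper, and it does not model the constrained-modulus variant. The function names are hypothetical, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n < R, where R = 2**k."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # satisfies n * n_prime = -1 (mod R)
    r2 = (r * r) % n                # used to convert values into Montgomery form
    return r, n_prime, r2


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> k               # exact division by R is just a shift
    return u - n if u >= n else u


def mont_mul(a, b, n, k):
    """Compute a * b mod n (with 0 <= a, b < n) via Montgomery form."""
    _, n_prime, r2 = montgomery_setup(n, k)
    a_m = redc(a * r2, n, k, n_prime)     # a * R mod n
    b_m = redc(b * r2, n, k, n_prime)     # b * R mod n
    p_m = redc(a_m * b_m, n, k, n_prime)  # a * b * R mod n
    return redc(p_m, n, k, n_prime)       # convert back to ordinary form


print(mont_mul(1234, 5678, 65521, 16) == (1234 * 5678) % 65521)  # True
```

Converting into and out of Montgomery form costs extra reductions, so the method pays off when many multiplications share one modulus, as in the polynomial multiplications the abstract mentions.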
Pub Date : 2023-04-05 DOI: 10.1109/ISQED57927.2023.10129385
Cheng-Yen Lee, S. Khatri, S. Vrudhula
In this paper, we present a pseudo-flash based digital low dropout (Digital LDO) voltage regulator. The novelty of our pseudo-flash based Digital LDO (PFD-LDO) voltage regulator lies in the use of pseudo-flash (or, alternatively, flash) transistor subarrays for voltage regulation. By changing the threshold voltage (and thereby the ON resistance) of these transistors, we can use the same design to meet different regulator specifications. The threshold voltage can be programmed either at the factory by the manufacturer or in the field by the user. This gives the manufacturer the ability to offer a family of LDO regulators with a single design, a significant economic advantage. In addition, aging effects and temperature variations can be effectively compensated, since the threshold voltage of the pseudo-flash (or flash) transistors can be tuned to a fine degree in the field. Similarly, process variations can be cancelled in the factory after manufacturing. These advantages are absent in traditional LDO regulators. Our design uses two subarrays. A coarse subarray reduces the recovery time and the output voltage overshoot/undershoot, while a fine subarray regulates the output voltage, minimizing the output voltage ripple. Unlike state-of-the-art LDO regulators, our design can realize multiple specifications with the same circuit. For example, we demonstrate that the output voltage Vout of the proposed PFD-LDO regulator can range from 0.7 V to 1.7 V when the supply voltage VIN ranges from 0.8 V to 1.8 V, using the same circuit design. Over this voltage range, the proposed PFD-LDO regulator achieves Vshoot < 144 mV, trec < 0.41 µs and Vripple < 7.3 mV as Imax ranges from 15 mA to 250 mA.
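To make the link between ON resistance and regulator specification concrete, the following is a minimal sketch of the arithmetic, not the authors' model. It assumes the whole pass-transistor array can be treated as a single lumped resistor in dropout, and it pairs the endpoints of the quoted VIN, Vout and Imax ranges arbitrarily; the helper name `required_on_resistance` is hypothetical.

```python
def required_on_resistance(v_in: float, v_out: float, i_load: float) -> float:
    """Lumped ON resistance (ohms) that drops v_in - v_out at i_load.

    Hypothetical helper: models the whole pass-transistor array as one
    resistor, an illustrative simplification rather than the paper's model.
    """
    if i_load <= 0:
        raise ValueError("load current must be positive")
    if v_out >= v_in:
        raise ValueError("an LDO output must sit below its input")
    return (v_in - v_out) / i_load


# Operating points built from the ranges quoted in the abstract. Pairing the
# endpoints this way is an assumption made only for the arithmetic.
for v_in, v_out, i_max in [(0.8, 0.7, 0.250), (1.8, 1.7, 0.015)]:
    r_on = required_on_resistance(v_in, v_out, i_max)
    print(f"VIN={v_in} V, Vout={v_out} V, Imax={i_max * 1e3:.0f} mA -> {r_on:.2f} ohm")
```

Under these assumptions, a 100 mV dropout calls for about 0.4 ohm at 250 mA and about 6.7 ohm at 15 mA. That spread of more than an order of magnitude gives a sense of why retuning the transistors' ON resistance could let one circuit cover several specifications.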
A Novel Pseudo-Flash Based Digital Low Dropout (LDO) Voltage Regulator. 2023 24th International Symposium on Quality Electronic Design (ISQED). https://doi.org/10.1109/ISQED57927.2023.10129385