“Surrogate Lagrangian Relaxation: A Path to Retrain-free Deep Neural Network Pruning,” by Shanglin Zhou, Mikhail A. Bragin, Deniz Gurevin, Lynn Pepin, Fei Miao, and Caiwen Ding (doi: 10.1145/3624476).
Network pruning is a widely used technique to reduce the computation cost and model size of deep neural networks. However, the typical three-stage pipeline (training, pruning, and retraining/fine-tuning) significantly increases the overall training time. In this article, we develop a systematic weight-pruning optimization approach based on surrogate Lagrangian relaxation (SLR), which is tailored to overcome the difficulties caused by the discrete nature of the weight-pruning problem. We further prove that our method ensures fast convergence of the model-compression problem, and the convergence of SLR is accelerated by using quadratic penalties. Model parameters obtained by SLR during the training phase are much closer to their optimal values than those obtained by other state-of-the-art methods. We evaluate our method on image-classification tasks using CIFAR-10 and ImageNet with state-of-the-art multi-layer-perceptron-based networks such as MLP-Mixer, attention-based networks such as Swin Transformer, and convolutional-neural-network-based models such as VGG-16, ResNet-18, ResNet-50, ResNet-110, and MobileNetV2. We also evaluate object-detection and segmentation tasks on COCO, the KITTI benchmark, and the TuSimple lane-detection dataset using a variety of models. Experimental results demonstrate that our SLR-based weight-pruning optimization approach achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement and can also achieve higher accuracy under the same compression-rate requirement. On classification tasks, our SLR approach converges to the desired accuracy × faster on both datasets. On object-detection and segmentation tasks, SLR also converges 2× faster to the desired accuracy. Further, our SLR achieves high model accuracy even at the hard-pruning stage without retraining, which reduces the traditional three-stage pruning pipeline to a two-stage process. Given a limited budget of retraining epochs, our approach quickly recovers the model’s accuracy.
{"title":"Surrogate Lagrangian Relaxation: A Path To Retrain-free Deep Neural Network Pruning","authors":"Shanglin Zhou, Mikhail A. Bragin, Deniz Gurevin, Lynn Pepin, Fei Miao, Caiwen Ding","doi":"10.1145/3624476","DOIUrl":"https://doi.org/10.1145/3624476","url":null,"abstract":"Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks. However, the typical three-stage pipeline (i.e., training, pruning, and retraining (fine-tuning)) significantly increases the overall training time. In this article, we develop a systematic weight-pruning optimization approach based on surrogate Lagrangian relaxation (SLR), which is tailored to overcome difficulties caused by the discrete nature of the weight-pruning problem. We further prove that our method ensures fast convergence of the model compression problem, and the convergence of the SLR is accelerated by using quadratic penalties. Model parameters obtained by SLR during the training phase are much closer to their optimal values as compared to those obtained by other state-of-the-art methods. We evaluate our method on image classification tasks using CIFAR-10 and ImageNet with state-of-the-art multi-layer perceptron based networks such as MLP-Mixer; attention-based networks such as Swin Transformer; and convolutional neural network based models such as VGG-16, ResNet-18, ResNet-50, ResNet-110, and MobileNetV2. We also evaluate object detection and segmentation tasks on COCO, the KITTI benchmark, and the TuSimple lane detection dataset using a variety of models. Experimental results demonstrate that our SLR-based weight-pruning optimization approach achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement and also can achieve higher accuracy under the same compression rate requirement. Under classification tasks, our SLR approach converges to the desired accuracy × faster on both of the datasets. Under object detection and segmentation tasks, SLR also converges 2× faster to the desired accuracy. Further, our SLR achieves high model accuracy even at the hardpruning stage without retraining, which reduces the traditional three-stage pruning into a two-stage process. Given a limited budget of retraining epochs, our approach quickly recovers the model’s accuracy.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135063048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
“Introduction to the Special Section on Advances in Physical Design Automation,” by Iris Hru Jiang, David Chinnery, Gracieli Posser, and Jens Lienig (doi: 10.1145/3604593).
… equation and the proximal group alternating direction method of multipliers (ADMM). A fast computation of the 3D Poisson’s equation and a parameter-updating scheme are presented to accelerate the convergence of the optimization problem. “A Fast Optimal Double-row Legalization Algorithm,” by Hougardy et al., improves the legalization step in standard-cell placement by minimizing cell displacement for both single-row- and double-row-height cells, assuming a fixed left-to-right ordering within each row. In doing so, the authors do not artificially bound the maximum cell movement and can guarantee an optimal solution with minimum cell displacement.
{"title":"Introduction to the Special Section on Advances in Physical Design Automation","authors":"Iris Hru Jiang, David Chinnery, Gracieli Posser, Jens Lienig","doi":"10.1145/3604593","DOIUrl":"https://doi.org/10.1145/3604593","url":null,"abstract":"equation and proximal group alternating direction method of multipliers ( ADMM ). A fast computation of 3D Poisson’s equation and a parameter updating scheme are presented to accelerate the convergence of the optimization problem. “A Fast Optimal Double Row Legalization Algorithm,” by Hougardy et al., improves the legalization step in standard-cell placement by minimizing cell displacement for both single-row and double-row height cells, assuming a fixed left-to-right ordering within each row. In doing so, the authors do not artificially bound the maximum cell movement and can guarantee to find an optimal solution with minimum cell displacement","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136192256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
“A Fast Optimal Double-row Legalization Algorithm,” by Stefan Hougardy, Meike Neuwohner, and Ulrike Schorr (doi: 10.1145/3579844).
In placement legalization, it is often assumed that (almost) all standard cells have the same height and can therefore be aligned in cell rows, which can then be treated independently. However, this is no longer true for recent technologies, where a substantial number of cells of double- or even arbitrary multiple-row height is to be expected. Due to the interdependencies between cell placements across several rows, the legalization task becomes considerably harder. In this article, we show how to optimize squared cell movement for pairs of adjacent rows comprising cells of single- as well as double-row height with a fixed left-to-right ordering in O(n log n) time, where n denotes the number of cells involved. In contrast to prior works, we do not artificially bound the maximum cell movement and can guarantee an optimal solution. Our approach also allows us to include gridding and move-bound constraints for the cells. Experimental results show an average decrease of over 26% in total squared movement compared to a legalization approach that fixes cells of more than single-row height after global placement.
{"title":"A Fast Optimal Double-row Legalization Algorithm","authors":"Stefan Hougardy, Meike Neuwohner, Ulrike Schorr","doi":"10.1145/3579844","DOIUrl":"https://doi.org/10.1145/3579844","url":null,"abstract":"In Placement Legalization, it is often assumed that (almost) all standard cells possess the same height and can therefore be aligned in cell rows , which can then be treated independently. However, this is no longer true for recent technologies, where a substantial number of cells of double- or even arbitrary multiple-row height is to be expected. Due to interdependencies between the cell placements within several rows, the legalization task becomes considerably harder. In this article, we show how to optimize squared cell movement for pairs of adjacent rows comprising cells of single- as well as double-row height with a fixed left-to-right ordering in time 𝒪( n · log ( n )), where n denotes the number of cells involved. Opposed to prior works, we do not artificially bound the maximum cell movement and can guarantee to find an optimum solution. Our approach also allows us to include gridding and movebound constraints for the cells. Experimental results show an average percental decrease of over 26% in the total squared movement when compared to a legalization approach that fixes cells of more than single-row height after Global Placement.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136298928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
“Multi-Target Fluid Mixing in MEDA Biochips: Theory and an Attempt towards Waste Minimization,” by Debraj Kundu and Sudip Roy (doi: 10.1145/3622785).
Sample preparation is an inherent procedure in many biochemical applications, and digital microfluidic biochips (DMBs) have proved very effective at performing it. In a single mixing step, conventional DMBs can mix two droplets only in a 1:1 ratio. Due to this limitation, DMBs suffer from heavy fluid wastage and often require many mixing steps. Next-generation DMBs, i.e., micro-electrode-dot-array (MEDA) biochips, can realize multiple mixing ratios, which in general helps to reduce the number of mixing operations. In this paper, we present a heuristic-based sample-preparation algorithm, specifically a mixing algorithm for MEDA called the Division by Factor Method, which exploits the mixing models of MEDA biochips. We propose another mixing algorithm for MEDA biochips called Single Target Waste Minimization (STWM), which minimizes fluid wastage and determines an efficient mixing graph. We also propose an advanced methodology for the multiple-target reagent-mixing problem, called Multi-Target Waste Minimization (MTWM), which determines efficient mixing graphs for different target ratios by maximizing the sharing of fluids and minimizing fluid wastage. Simulation results suggest that the proposed STWM and MTWM methods outperform the state-of-the-art methods in terms of minimizing fluid wastage, reducing the total usage of reagent fluids, and minimizing the number of mixing operations.
{"title":"Multi-Target Fluid Mixing in MEDA Biochips: Theory and an Attempt towards Waste Minimization","authors":"Debraj Kundu, Sudip Roy","doi":"10.1145/3622785","DOIUrl":"https://doi.org/10.1145/3622785","url":null,"abstract":"Sample preparation is an inherent procedure of many biochemical applications, and digital microfluidic biochips (DMBs) proved to be very effective in performing such a procedure. In a single mixing step, conventional DMBs can mix two droplets in 1:1 ratio only. Due to this limitation, DMBs suffer from heavy fluid wastage and often require a lot of mixing steps. However, the next-generation DMBs, i.e., micro-electrode-dot-array (MEDA) biochips can realize multiple mixing ratios, which in general helps in minimizing the number of mixing operations. In this paper, we present a heuristic-based sample preparation algorithm, specifically a mixing algorithm called Division by Factor Method for MEDA that exploits the mixing models of MEDA biochips. We propose another mixing algorithm for MEDA biochips called Single Target Waste Minimization (STWM), which minimizes the wastage of fluids and determines an efficient mixing graph. We also propose an advanced methodology for multiple target reagent mixing problems called Multi-Target Waste Minimization (MTWM), which determines efficient mixing graphs for different target ratios by maximizing the sharing of fluids and minimizing the fluid wastage. Simulation results suggest that the proposed STWM and MTWM methods outperform the state-of-the-art methods in terms of minimizing the amount of fluid wastage, reducing the total usage of reagent fluids, and minimizing the number of mixing operations.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48611534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
“Programmable In-memory Computing Circuit of Fast Hartley Transform,” by Q. Hong, Richeng Huang, Pin-an Xiao, Jun Yu Li, Jingru Sun, and Jiliang Zhang (doi: 10.1145/3618112).
The discrete Hartley transform is a core component of digital signal processing because of its fast computation and low power consumption. Traditional FPGA-based implementations suffer from high latency and cannot meet the needs of energy-efficient computing in the Internet of Things era. Therefore, a programmable analog in-memory computing circuit is proposed to accelerate fast Hartley transform (FHT) and inverse FHT (IFHT) calculations as large-scale, one-step matrix computations. By adjusting the memristor weights, FHT calculations of different sizes can be realized. PSPICE simulation results show that the average accuracy of the proposed circuit can reach 99.9%, and the computation time is on the order of 0.1 μs. The robustness analysis shows that the circuit can tolerate a certain degree of programming error and resistance tolerance. The designed analog circuit is applied to image compression, where the compression accuracy reaches 99.9%.
{"title":"Programmable In-memory Computing Circuit of Fast Hartley Transform","authors":"Q. Hong, Richeng Huang, Pin-an Xiao, Jun Yu Li, Jingru Sun, Jiliang Zhang","doi":"10.1145/3618112","DOIUrl":"https://doi.org/10.1145/3618112","url":null,"abstract":"Discrete Hartley transform is a core component of digital signal processing because of its advantages of fast computing speed and less power consumption. Traditional FPGA-based implementation methods have the disadvantage of high latency, which cannot meet the needs of energy-efficient computing in the Internet of Things era. Therefore, A programmable analog memory computing circuit is proposed to accelerate FHT and IFHT calculations for large-scale one-step matrix computation. By adjusting the weight of memristor, different scales of FHT calculation can be achieved. PSPICE simulation results show that the average accuracy of the proposed circuit can reach 99.9%, and the speed can also reach the level of 0.1μs. The robustness analysis shows that the circuit can tolerate a certain degree of programming error and resistance tolerance. The designed analog circuit is applied to image compression processing, and the image compression accuracy can reach 99.9%.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44030534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
“Automatic Synthesis of FSMs for Enforcing Non-functional Requirements on MPSoCs using Multi-Objective Evolutionary Algorithms,” by Khalil Esper, S. Wildermann, and J. Teich (doi: 10.1145/3617832).
Embedded system applications often require guarantees regarding non-functional properties when executed on a given MPSoC platform. Examples of such requirements include real-time, energy, or safety properties of the corresponding programs. One option for enforcing such requirements is a reactive control loop, in which an enforcer decides, based on a system response (feedback), how to control the system, e.g., by adapting the number of cores allocated to a program or by scaling the voltage/frequency mode of the involved processors. Typically, a violation of a requirement must either never happen (strict enforcement) or only happen temporarily (so-called loose enforcement). However, it is challenging to design enforcers for which formal guarantees with respect to the requirements can be given, especially in the presence of environmental input (workload) that typically varies greatly between executions. Technically, an enforcement strategy can be formally modeled by a finite state machine (FSM), and the uncertain environment determining the workload by a discrete-time Markov chain. Previous work has shown that this formalization allows the formal verification of temporal properties (verification goals) regarding the fulfillment of requirements for a given enforcement strategy. In this paper, we consider the so-far unsolved problem of design-space exploration and automatic synthesis of enforcement automata that maximize a number of deterministic and probabilistic verification goals formulated over a given set of non-functional requirements. For the design-space exploration (DSE), an approach based on multi-objective evolutionary algorithms is proposed in which enforcement automata are encoded as genes of states and state-transition conditions. For each individual, the verification goals are evaluated using probabilistic model checking. The DSE finally returns a set of FSMs that are efficient in terms of the probabilities of meeting the given requirements. As an experimental evaluation, we present three use cases with requirements on latency and energy consumption.
{"title":"Automatic Synthesis of FSMs for Enforcing Non-functional Requirements on MPSoCs using Multi-Objective Evolutionary Algorithms","authors":"Khalil Esper, S. Wildermann, J. Teich","doi":"10.1145/3617832","DOIUrl":"https://doi.org/10.1145/3617832","url":null,"abstract":"Embedded system applications often require guarantees regarding non-functional properties when executed on a given MPSoC platform. Examples of such requirements include real-time, energy or safety properties on corresponding programs. One option to implement the enforcement of such requirements is by a reactive control loop, where an enforcer decides based on a system response (feedback) how to control the system, e.g., by adapting the number of cores allocated to a program or by scaling the voltage/frequency mode of involved processors. Typically, a violation of a requirement must either never happen in case of strict enforcement, or only happen temporally (in case of so-called loose enforcement). However, it is a challenge to design enforcers for which it is possible to give formal guarantees with respect to requirements, especially in the presence of typically largely varying environmental input (workload) per execution. Technically, an enforcement strategy can be formally modeled by a finite state machine (FSM) and the uncertain environment determining the workload by a discrete-time Markov chain. It has been shown in previous work that this formalization allows the formal verification of temporal properties (verification goals) regarding the fulfillment of requirements for a given enforcement strategy. In this paper, we consider the so far unsolved problem of design space exploration and automatic synthesis of enforcement automata that maximize a number of deterministic and probabilistic verification goals formulated on a given set of non-functional requirements. For the design space exploration (DSE), an approach based on multi-objective evolutionary algorithms is proposed in which enforcement automata are encoded as genes of states and state transition conditions. For each individual, the verification goals are evaluated using probabilistic model checking. At the end, the DSE returns a set of efficient FSMs in terms of probabilities of meeting given requirements. As experimental results, we present three use cases while considering requirements on latency and energy consumption.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47572110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
“The Resistance Analysis Attack and Security Enhancement of the IMC LUT based on the Complementary Resistive Switch Cells,” by Xiaole Cui, Mingqi Yin, Han-Yang Liu, and Xiaohui Cui (doi: 10.1145/3616870).
Resistive random-access memory (RRAM)-based in-memory computing (IMC) is an emerging architecture that addresses the “memory wall” problem. The complementary resistive switch (CRS) cell connects two bipolar RRAM elements anti-serially to reduce the sneak current in the crossbar array. The CRS array is a generic computing platform, since arbitrary logic functions can be implemented in it, and the IMC CRS LUT consumes fewer CRS cells than the static CRS LUT. The CRS array has built-in polymorphic characteristics because the correct logic function cannot be determined from the circuit layout. However, the logic state of every CRS cell can be read out after each operation, which helps an attacker recover the correct function of the IMC CRS LUT. This work discusses the resistance-analysis attack on the CRS-array-based IMC LUT. The proposed attack applies to different CRS-array computation styles, such as CRS IMPLY and CRS NOR-OR/NAND-AND: the attacker recovers the logic function of the LUT by tracing the states of the CRS cells. Furthermore, an improved IMC CRS LUT method is proposed and discussed to enhance security. Simulation and analysis results show that the improved IMC CRS LUT resists various attacks while maintaining the polymorphic characteristics of the IMC CRS LUT, and an N-bit full adder built from the improved IMC CRS NOR-OR LUTs achieves the best performance among the compared designs.
{"title":"The Resistance Analysis Attack and Security Enhancement of the IMC LUT based on the Complementary Resistive Switch Cells","authors":"Xiaole Cui, Mingqi Yin, Han-Yang Liu, Xiaohui Cui","doi":"10.1145/3616870","DOIUrl":"https://doi.org/10.1145/3616870","url":null,"abstract":"The resistive random access memory (RRAM) based in-memory computing (IMC) is an emerging architecture to address the challenge of the “memory wall” problem. The complementary resistive switch (CRS) cell connects two bipolar RRAM elements anti-serially to reduce the sneak current in the crossbar array. The CRS array is a generic computing platform, for the arbitrary logic functions can be implemented in it. The IMC CRS LUT consumes fewer CRS cells than the static CRS LUT. The CRS array has built-in polymorphic characteristics because the correct logic function can not be distinguished based on the circuit layout. However, the logic state of every CRS cell can be readout after each operation. It helps the attacker to recover the correct function of the IMC CRS LUT. This work discusses the resistance analysis attack of the IMC LUT based on the CRS array. The proposed resistance analysis attack method is able to be applied to different computation styles based on the CRS array, such as the CRS IMPLY, CRS NOR-OR/NAND-AND, etc. The attacker can recover the logic function of the LUT by tracing the states of CRS cells. Furthermore, an improved IMC CRS LUT method is proposed and discussed to enhance security. The simulation and analysis results show that the improved IMC CRS LUT can resist various attacks, and it maintains the polymorphic characteristics of the IMC CRS LUT. And the N-bit full adder circuit based on the improved IMC CRS NOR-OR LUTs achieves the best performance compared with the previous counterparts.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43964170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
“Self Adaptive Logical Split Cache techniques for delayed aging of NVM LLC,” by S. Sivakumar and John Jose (doi: 10.1145/3616871).
Due to the technological advancements of the last few decades, several applications have emerged that demand more computing power and larger on-chip and off-chip memories. However, the scaling of memory technologies is not on par with the computing throughput of modern multi-core processors. Conventional memory technologies such as SRAM and DRAM cannot meet large on-chip memory requirements owing to their low packing density and high leakage power. To meet the ever-increasing demand for memory, researchers have turned to alternatives such as emerging non-volatile memory (NVM) technologies, including STT-RAM, PCM, and ReRAM. However, these memory technologies have limited write endurance and high write energy. This calls for policies that reduce writes or distribute them uniformly across the memory, thereby enhancing its lifetime by delaying the early wear-out of frequently written cells. We propose two techniques, Enhanced-Virtually Split Cache (E-ViSC) and Protean-Virtually Split Cache (P-ViSC), which dynamically adjust the cache configuration to distribute writes uniformly across the memory and thereby enhance its lifetime. Experimental studies show that E-ViSC and P-ViSC improve the lifetime of NVM L2 caches by up to 2.31× and 1.97×, respectively.
{"title":"Self Adaptive Logical Split Cache techniques for delayed aging of NVM LLC","authors":"S. Sivakumar, John Jose","doi":"10.1145/3616871","DOIUrl":"https://doi.org/10.1145/3616871","url":null,"abstract":"Due to the technological advancements in the last few decades, several applications have emerged that demand more computing power and on-chip and off-chip memories. However, the scaling of memory technologies is not at par with computing throughput of modern day multi-core processors. Conventional memory technologies such as SRAM and DRAM have technological limitations to meet large on-chip memory requirements owing to their low packaging density and high leakage power. In order to meet the ever-increasing demand for memory, researchers came up with alternative solutions, such as emerging non-volatile memory technologies such as STT-RAM, PCM and ReRAM. However, these memory technologies have limited write endurance and high write energy. This emphasizes the need for a policy that will reduce the writes or distribute the writes uniformly across the memory thereby enhancing its lifetime by delaying the early wear out of memory cells due to frequent writes. We propose two techniques, Enhanced-Virtually Split Cache (E-ViSC) and Protean-Virtually Split Cache (P-ViSC), which dynamically adjust the cache configuration to distribute the writes uniformly across the memory to enhance the lifetime. Experimental studies show that E-ViSC and P-ViSC improve lifetime of NVM L2 caches by upto 2.31x and 1.97x respectively.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48492701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
“Design of Enhanced Reversible 9T SRAM Design for the Reduction in Sub-Threshold Leakage Current with 14nm FinFET Technology,” by P. Praveen and D. R. Singh (doi: 10.1145/3616538).
Power dissipation is one of the key issues in low-power very-large-scale integration (VLSI) circuit design and is closely related to the threshold voltage. Generally, reducing the threshold voltage increases the sub-threshold leakage current and the leakage power dissipation. The overall performance of the circuit depends heavily on this leakage power, because it rapidly drains components that must run from a battery for long periods. In this research, a reversible-logic-gate-based 9T static random-access memory (SRAM) is designed in 14nm FinFET technology to reduce leakage power consumption in memory-related applications. The Schmitt-trigger (ST)-based 9T SRAM cell is designed to attain high read/write stability and low power consumption using a single-bit-line structure. The reversible Feynman (FG) and Fredkin (FRG) gates are combined to build the row and column decoders of the SRAM design and thereby reduce leakage power. Moreover, the transistor-stacking effect is applied to the proposed memory design to reduce the leakage power in active mode. The proposed reversible-logic- and transistor-stacking-based SRAM design is implemented in Tanner EDA Tool version 16.0 and supports both read and write operations. Performance measures, including read access time (RAT), write access time (WAT), read/write/static power under varying supply voltage and temperature, delay, and stability (read/write static noise margin), are examined and compared with existing SRAM designs.
{"title":"Design of Enhanced Reversible 9T SRAM Design for the Reduction in Sub-Threshold Leakage Current with14nm FinFET Technology","authors":"P. Praveen, D. R. Singh","doi":"10.1145/3616538","DOIUrl":"https://doi.org/10.1145/3616538","url":null,"abstract":"Power dissipation is considered one of the important issues in low power Very-large-scale integration (VLSI) circuit design, and is related to the threshold voltage. Generally, the sub-threshold leakage current and the leakage power dissipation are increased by reducing the threshold voltage. The overall performance of the circuit completely depends on this leakage power dissipation. Because this leakage and power consumption causes the components that are functioning by the battery for a long period to washed-out rapidly. In this research, the reversible logic gate-based 9T static random access memory (SRAM) is designed in 14nm FinFET technology to reduce leakage power consumption in memory related applications. The Schmitt-trigger (ST)-based 9T SRAM cell is designed to attain high read-write stability and low power consumption using a single bit line structure. The reversible logic gates of Feynman (FG) and Fredkin gate (FRG) are combined to develop a row and column decoder in an SRAM design to diminish the leakage power. Moreover, the transistor stacking effect is applied to the proposed memory design to reduce the leakage power in active mode. The proposed reversible logic and transistor stacking based SRAM design is implemented in Tanner EDA Tool version 16.0. It also performs both read and write operations using the proposed circuit. The performance measures of read access time (RAT), write access time (WAT), read, write, and static power by varying supply voltage and temperature, delay and stability analysis (read/write static noise margin) are examined and compared with existing SRAM designs.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48953571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
“A Single-Port to Dual-Port Reconfigurable 7T SRAM Bit Cell for High Speed, Power Saving and Low Voltage Application,” by B. Rawat and P. Mittal (doi: 10.1145/3616872).
The decreasing operating voltage and scaled technology nodes used for memory design have widened the gap between two crucial SRAM parameters: delay and power. As the demand for the Internet of Things grows, so does the need for round-the-clock connectivity. This mandates a cell that can switch between low-power and high-speed operation. Thus, this paper presents the design of a dual-mode 7T cell that can switch between single-port and dual-port configurations. The proposed reconfigurable, single-ended 7T cell can operate as either a single-port or a dual-port cell, and the reconfigurability is realized using control signals. The noise margin of the bit cell is 333 mV, 333 mV, and 470 mV in read, hold, and write modes, respectively. The robustness of the cell against temperature, process, and voltage variations is also analyzed; the resulting performance variation in each parameter remains within manageable limits. The write time is 0.14 ns, while a successful read operation requires 5 ps. The dual-port configuration of the cell supports pipelining and thus operates faster than its single-port configuration.
{"title":"A single–Port to Dual-Port Reconfigurable 7T SRAM Bit Cell for High Speed, Power Saving and Low Voltage Application","authors":"B. Rawat, P. Mittal","doi":"10.1145/3616872","DOIUrl":"https://doi.org/10.1145/3616872","url":null,"abstract":"The decreasing operational voltage and scaled technology node for memory designing has widened the gap between two crucial parameters for an SRAM – delay and power. As the demand for internet of things is increasing, the need for round the clock connectivity is increasing. This mandates designing a cell with a capability to switch between low power and high speed operation. Thus, this paper presents design of dual mode operational 7T cell that can switch between single port and dual port configuration. The proposed reconfigurable cell can operate as single port or dual port cell single ended 7T cell. The reconfigurability in the cell is realized using control signals. The noise stability of the bit cell is obtained to be 333, 333, and 470 mV for read, hold and write modes respectively. The robustness of the cell against temperature variation, process variation and voltage variation is also analyzed. The performance variation in each parameter will not have a dramatic impact as it is within manageable limit. Its write time is 0.14 ns, while 5 ps are required for a successful read operation. The dual port configuration of the cell supports pipelining and thus operates faster than its single port configuration.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49269863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}