In modern digital signal processing (DSP) and graphics applications, arithmetic sum-of-products, shifters and adders are important modules that contribute significantly to the overall delay of the system. A datapath structure consisting of multiple arithmetic sum-of-product, shifter and adder blocks is often found in the timing-critical path of the chip. In this paper, we propose a new operator-level merging technique to synthesize this type of datapath structure. In our approach, we combine the shifting operation with the partial product reduction stage of the sum-of-product blocks. This enables us to implement the functionality of the original design using only one carry-propagate adder block (instead of two carry-propagate adders). As a result, the timing-critical path of the design is shortened by a significant percentage and the overall performance of the design improves. Our experimental data show that the datapath block generated by our approach is significantly faster (13.28% on average), with a modest area penalty (3.24% on average), than the corresponding block generated by a commercially available best-in-class datapath synthesis tool. These improvements were verified on placed-and-routed designs as well.
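The merging idea in this abstract can be illustrated with a small behavioral sketch. It is not the authors' synthesis algorithm: it assumes unsigned operands and an example datapath of the form ((a*b + c*d) << shift) + e, and all function names are hypothetical. The partial products are generated already shifted, the extra addend joins them as one more row, and carry-save (3:2) compression reduces everything to two rows so that only one carry-propagate addition remains.

```python
def partial_products(a, b, shift=0):
    """Rows of the unsigned product a*b, each pre-shifted left by `shift`.

    Folding the shift into row generation costs only wiring in hardware,
    which is the (assumed) reason a separate shifter stage is not needed.
    """
    rows = []
    i = 0
    while b >> i:
        if (b >> i) & 1:
            rows.append(a << (i + shift))
        i += 1
    return rows


def compress_3_2(x, y, z):
    """Bitwise full adders (carry-save): x + y + z == s + c, no carry ripple."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def reduce_to_two(rows):
    """Apply 3:2 compression until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        s, c = compress_3_2(rows.pop(), rows.pop(), rows.pop())
        rows += [s, c]
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def merged_sop_shift_add(a, b, c, d, shift, e):
    """Compute ((a*b + c*d) << shift) + e with one carry-propagate addition."""
    rows = partial_products(a, b, shift) + partial_products(c, d, shift) + [e]
    x, y = reduce_to_two(rows)
    return x + y  # the only carry-propagate addition in this model


if __name__ == "__main__":
    print(merged_sop_shift_add(13, 11, 7, 5, 3, 9))
```

In an unmerged flow, the sum-of-products block would end in its own carry-propagate adder, followed by the shifter and a second adder; the sketch shows why the same result is reachable with a single final addition.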
{"title":"A Merged Synthesis Technique for Fast Arithmetic Blocks Involving Sum-of-Products and Shifters","authors":"Sabyasachi Das, S. Khatri","doi":"10.1109/VLSI.2008.112","DOIUrl":"https://doi.org/10.1109/VLSI.2008.112","url":null,"abstract":"In modern digital signal processing (DSP) and graphics applications, the arithmetic sum-of-products, shifters and adders are important modules, contributing a significant amount to the overall delay of the system. A datapath structure consisting of multiple arithmetic sum-of-product, shifter and adder blocks is often found in the timing-critical path of the chip. In this paper, we propose a new operator-level merging technique to synthesize this type of datapath structure. In our approach, we combine the shifting operation with the partial product reduction stage of the sum-of-product blocks. This enables us to implement the functionality of the original design by using only one carry- propagate adder block (instead of two carry-propagate adders). As a result, the timing-critical path of the design gets shortened by a significant percentage and the overall performance of the design improves. Our experimental data shows that the datapath block generated by our approach is significantly faster (13.28% on average) with a modest area penalty (3.24% on average) than the corresponding block generated by a commercially available best-in-class datapath synthesis tool. 
These improvements were verified on placed-and-routed designs as well.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129646298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amlan Ghosh, R. Rao, Jae-Joon Kim, C. Chuang, Richard B. Brown
The need for efficient and accurate detection schemes to mitigate the impact of process variations on the parametric yield of integrated circuits has increased in the nanometer design era. In this paper, a new variation detection technique is presented that uses slew as a metric, along with delay, to determine the mismatch between the drive strengths of NMOS and PMOS devices. The importance of considering both of these metrics is illustrated, and a new slew-rate monitoring circuit is presented for measuring the slew of a signal from the critical path of a circuit. Design considerations, simulation results and characteristics of the slew-rate monitor circuitry in a 45 nm SOI technology are presented, and a sensitivity of 1 MHz/ps is achieved. This scheme can detect threshold voltage variation on the order of millivolts, with a sensitivity of 0.95 MHz/mV.
{"title":"On-Chip Process Variation Detection Using Slew-Rate Monitoring Circuit","authors":"Amlan Ghosh, R. Rao, Jae-Joon Kim, C. Chuang, Richard B. Brown","doi":"10.1109/VLSI.2008.67","DOIUrl":"https://doi.org/10.1109/VLSI.2008.67","url":null,"abstract":"The need for efficient and accurate detection schemes to mitigate the impact of process variations on the parametric yield of integrated circuits has increased in the nm design era. In this paper, a new variation detection technique is presented that uses slew as a metric along with delay to determine the mismatch between the drive strengths of NMOS and PMOS devices. The importance of considering both of these metrics is illustrated and a new slew-rate monitoring circuit is presented for measuring slew of a signal from the critical path of a circuit. Design considerations, simulation results and characteristics of the slew-rate monitor circuitry in a 45 nm SOI technology are presented, and a sensitivity of 1 MHz/ps is achieved. This scheme can detect the threshold voltage variation in the order of mV, with a sensitivity of 0.95 MHz/mV.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127669664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ankur Gupta, Rajat Chauhan, V. Menezes, V. Narang, H. M. Roopashree
Voltage scaling is one of the knobs used today to control both static and active power in SoCs. The SoC core supply voltage is scaled adaptively based on performance needs. However, the external electrical chip interface protocol, which may run at a different voltage level, must also be maintained. The chip interfaces need to operate reliably under an adaptively scaling core voltage and a fixed I/O supply voltage. Within the I/O circuits, voltage level shifters are used to communicate between the two voltage domains. This paper examines the performance of a conventional voltage level shifter and describes a novel high-performance level shifter that is more robust under adaptive voltage scaling.
{"title":"A Robust Level-Shifter Design for Adaptive Voltage Scaling","authors":"Ankur Gupta, Rajat Chauhan, V. Menezes, V. Narang, H. M. Roopashree","doi":"10.1109/VLSI.2008.61","DOIUrl":"https://doi.org/10.1109/VLSI.2008.61","url":null,"abstract":"Voltage scaling is one of the knobs that is used today to control both static and the active power for SoCs. The SoC core supply voltage is scaled adaptively based on the performance needs. But it is also required to maintain the external electrical chip interface protocol, which may run at a different voltage level. The chip interfaces need to operate reliably under adaptively scaling core voltage and fixed 10 supply voltage. Within the 10 circuits, voltage level shifters are used to communicate between two voltage domains. This paper examines the performance of a conventional voltage level shifter and describes a novel high performance level shifter that is more robust under adapting voltage scaling.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"509 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131478293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents an efficient architecture to implement low power variable block size motion estimation (VBSME) using full search. Power reduction is achieved by performing the search in two steps: low pixel resolution and full pixel resolution. We analysed the computation and memory units needed to support these two search modes. The proposed architecture reduces the total energy consumption by 50% with 6% additional area compared to the conventional architecture.
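The two-step search described above can be sketched in software as follows. This is only an illustration under stated assumptions, not the proposed hardware architecture: it handles a single block size rather than variable block sizes, it uses the sum of absolute differences (SAD) as the matching cost, and the parameters `drop_bits` (number of truncated least-significant bits) and `keep` (number of candidates refined at full resolution) are hypothetical.

```python
def sad(block, ref, dx, dy, drop_bits=0):
    """Sum of absolute differences between `block` and the reference window
    at offset (dx, dy); `drop_bits` LSBs are truncated from every pixel."""
    total = 0
    for y, row in enumerate(block):
        for x, p in enumerate(row):
            q = ref[y + dy][x + dx]
            total += abs((p >> drop_bits) - (q >> drop_bits))
    return total


def two_step_search(block, ref, drop_bits=4, keep=4):
    """Full search in two steps: truncated pixels first, full pixels second."""
    bh, bw = len(block), len(block[0])
    h, w = len(ref), len(ref[0])
    candidates = [(dx, dy)
                  for dy in range(h - bh + 1)
                  for dx in range(w - bw + 1)]
    # Step 1: low pixel resolution over every candidate position.
    coarse = sorted(candidates,
                    key=lambda mv: sad(block, ref, mv[0], mv[1], drop_bits))
    # Step 2: full pixel resolution, only for the best `keep` candidates.
    return min(coarse[:keep], key=lambda mv: sad(block, ref, mv[0], mv[1]))


if __name__ == "__main__":
    ref = [[(y * 6 + x) * 7 for x in range(6)] for y in range(6)]
    block = [row[3:5] for row in ref[2:4]]  # taken from offset (dx=3, dy=2)
    print(two_step_search(block, ref))
```

The first step touches every candidate but with narrower operands, while the second step uses full-width operands on only a few candidates; narrower arithmetic over most of the search is the intuition behind the energy saving.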
{"title":"Low Power Hardware Architecture for VBSME Using Pixel Truncation","authors":"A. Bahari, T. Arslan, A. Erdogan","doi":"10.1109/VLSI.2008.100","DOIUrl":"https://doi.org/10.1109/VLSI.2008.100","url":null,"abstract":"This paper presents an efficient architecture to implement low power variable block size motion estimation (VBSME) using full search. Power reduction is achieved by performing the search in two steps: low pixel resolution and full pixel resolution. We analysed the computation and memory units needed to support these two search modes. The proposed architecture reduces the total energy consumption by 50% with 6% additional area compared to the conventional architecture.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115644103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The complexity of today's Multi-Processor System-on-Chip (MPSoC) designs requires new design methodologies to solve time-to-market and design cost problems. In SoCs in which several subsystems are connected together, we observe that much of the design time is spent solving the inter-subsystem (global) communication problem. In this paper, we propose a novel communication exploration method based on exploration across multiple abstraction levels. With this work, the inter-subsystem communication structure can be optimized at the beginning of the design process by using simulation models at three different abstraction levels. Simulation at the higher abstraction level allows designers to explore the parameters of the interconnection model at the more detailed abstraction level. Some design loops can be avoided by using this exploration method. With a Motion-JPEG case study, we illustrate the whole communication exploration process step by step. The experimental results show, by comparison with cycle-accurate simulation, that the inter-subsystem communication can be well optimized and evaluated at higher abstraction levels.
{"title":"MPSoC Communication Architecture Exploration Using an Abstraction Refinement Method","authors":"Hao Shen, F. Pétrot","doi":"10.1109/VLSI.2008.64","DOIUrl":"https://doi.org/10.1109/VLSI.2008.64","url":null,"abstract":"The complexity of today's Multi-Processors System-on-Chip (MPSoC) requires new design methodologies to solve time-to-market and design cost problems. In SoC for which several subsystems are connected together, we notice that lots of design time is wasted on solving the inter-subsystem (global) communication problem. In this paper, we propose a novel communication exploration method based on a multi-abstraction levels exploration. With this work, the inter-subsystem communication structure can be optimized at the beginning of the design process by using simulation models at three different abstraction levels. The simulation at the higher abstraction level allows designers to explore parameters of the interconnection model at the more detailed abstraction level. Some design loop cases can be avoided by using this exploration method. With the Motion-JPEG case study, we illustrate the whole communication exploration process step by step. From experimental results, we show that compared with the cycle accurate simulation, the inter-subsystem communication can be well optimized and evaluated at higher abstraction levels.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115735898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, analytical expressions for the optimal Vdd and Vth that minimize energy for a given speed constraint are derived. These expressions are based on the EKV model for transistors and are valid in both the strong inversion and subthreshold regions. The effect of gate leakage on the optimal Vdd and Vth is analyzed. A new gradient-based algorithm for controlling Vdd and Vth based on delay and power monitoring results is proposed. A Vdd-Vth controller is designed that uses the algorithm to dynamically control the supply and threshold voltages of a representative logic block (the sum-of-absolute-differences computation of an MPEG decoder). Simulation results using 65 nm predictive technology models are given.
{"title":"Unified Vdd Vth Optimization Based DVFM Controller for a Logic Block","authors":"S. Kannan, N. S. Sreeram, B. Amrutur","doi":"10.1109/VLSI.2008.69","DOIUrl":"https://doi.org/10.1109/VLSI.2008.69","url":null,"abstract":"In this paper analytical expressions for optimal V<sub>dd</sub> and V<sub>th</sub> to minimize energy for a given speed constraint are derived. These expressions are based on the EKV model for transistors and are valid in both strong inversion and sub threshold regions. The effect of gate leakage on the optimal V<sub>dd</sub> and V<sub>th</sub> is analyzed. A new gradient based algorithm for controlling V<sub>dd</sub> and V<sub>th</sub> based on delay and power monitoring results is proposed. A V<sub>dd</sub>-V<sub>th</sub> controller which uses the algorithm to dynamically control the supply and threshold voltage of a representative logic block (sum of absolute difference computation of an MPEG decoder) is designed. Simulation results using 65 nm predictive technology models are given.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115355520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes the architecture and performance of a 0.35 µm, 1 GHz CMOS timing generator implemented as an array of delay locked loops. This architecture enables a timing generator with sub-gate-delay resolution to be implemented. The proposed delay locked loops use a novel multiplexer-based dual phase and frequency detector along with a charge pump, in which the injected charge approaches zero as the loop approaches lock on the leading edge and the trailing edge of an input clock reference. This greatly reduces the timing jitter. The loop locks to both the leading and trailing clock edges, as the dual phase and frequency detector, together with the charge pump, converts the phase difference into voltages. Test results show a timing jitter of less than 20 ps for the DLL (delay locked loop) circuit. The DLL has a dead zone of less than 0.01 ns in the phase characteristics and has low phase sensitivity errors. Implementing the timing generator as an array of delay locked loops (Kostamovaara, 2000) exponentially reduces the locking time. An experimental prototype was simulated in 0.7 µm and 0.35 µm technologies with supply voltages of 5 V and 3.3 V, respectively.
{"title":"0.35µ, 1 GHz, CMOS Timing Generator Using Array of Digital Delay Lock Loops","authors":"B. Srinivasan, V. Chandratre, Menka Tewani","doi":"10.1109/VLSI.2008.95","DOIUrl":"https://doi.org/10.1109/VLSI.2008.95","url":null,"abstract":"This paper describes the architecture and performance of a 0.35 mu, 1 GHZ, CMOS timing generator using array of delay lock loop. The timing generator is implemented as an array of delay locked loops. This architecture enables a timing generator with sub gate delay resolution to be implemented. The proposed delay lock loops uses novel multiplexer based dual phase and frequency detector along with a charge pump where the injected charge approaches zero as the loop approaches lock on the leading edge and the trailing edge of an input clock reference. This greatly reduces the timing jitter, loop locks to both the leading and trailing clock edges as the dual phase and frequency detector along with charge pump converts the phase difference in to voltages. Test results show a timing jitter of less than 20 pS for the DLL (delay lock loop) circuit .The DLL has a dead zone less than 0.01 nS in the phase characteristics and has low phase sensitivity errors. The timing generator is implemented as an array of delay locked loops (Kostamovaara, 2000) which exponentially reduce the locking time. 
An experimental proto type was simulated at 0.7 mu and 0.35 mu technologies with a supply voltage of 5 V and 3.3 V respectively.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116605238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose a way to improve the yield of memory products by selecting the appropriate test strategy for memory Built-in Self-Test (BIST). We argue that by testing the memory through a sequence of test algorithms which differ in their fault coverage, it is possible to bin the memory into multiple yield bins and increase the yield and product revenue. Further, the test strategy must take into consideration the usage model of the memory. For example, a number of video and audio buffers are used in sequential access mode, but are overtested by conventional memory test algorithms, which model a large number of defects that do not impact the operation of the buffers. We propose a binning strategy where memory test algorithms are applied in different orders of strictness such that each bin has a specific defect/fault grade. Depending on the application, some of these bins need not be discarded but can be sold at a lower price, because the application's use of the memory would never exercise the fault. We introduce the notion of a test map for the on-chip memories in an SoC and provide results of yield simulation on two specific test strategies called "Most Strict First" and "Least Strict First". Our simulations indicate that significant improvements in yield are possible through the adoption of the proposed technique. We show that the BIST controller area and run-time overheads are also reduced when information about the usage model of the memory, such as sequential access, is exploited.
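A toy model may help make the two orderings concrete. The sketch below is an assumption-laden illustration rather than the paper's method: each test is reduced to the set of fault classes it detects, the tests are listed strictest first with nested coverage, a memory is reduced to the set of fault classes it contains, and the test names and fault classes are placeholders. The bin is the index of the strictest test the memory passes.

```python
def most_strict_first(tests, faults):
    """Run tests strictest first; stop at the first test that passes.

    tests:  list of (name, detected_fault_classes), strictest first.
    faults: set of fault classes present in the memory under test.
    Returns (bin_index or None if the part is discarded, tests_run).
    """
    for i, (_name, coverage) in enumerate(tests):
        if not (coverage & faults):
            return i, i + 1
    return None, len(tests)


def least_strict_first(tests, faults):
    """Run tests most lenient first; stop at the first test that fails."""
    grade, run = None, 0
    for i in range(len(tests) - 1, -1, -1):
        run += 1
        if tests[i][1] & faults:
            break
        grade = i
    return grade, run


if __name__ == "__main__":
    # Hypothetical test suite with nested coverage, strictest first.
    tests = [
        ("strict", {"stuck_at", "coupling", "address_decoder"}),
        ("medium", {"stuck_at", "coupling"}),
        ("lenient", {"stuck_at"}),
    ]
    for faults in (set(), {"address_decoder"}, {"stuck_at"}):
        print(sorted(faults),
              most_strict_first(tests, faults),
              least_strict_first(tests, faults))
```

With nested coverage both orderings assign the same bin, but they run different numbers of tests: the strict-first order finishes early on good parts, while the lenient-first order finishes early on badly defective parts. Which is cheaper overall therefore depends on the defect distribution, which this sketch does not model.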
{"title":"Memory Yield Improvement through Multiple Test Sequences and Application-Aware Fault Models","authors":"A. Kokrady, C. Ravikumar, N. Chandrachoodan","doi":"10.1109/VLSI.2008.115","DOIUrl":"https://doi.org/10.1109/VLSI.2008.115","url":null,"abstract":"In this paper, we propose a way to improve the yield of memory products by selecting the appropriate test strategy for memory Built- in Self-Test (BIST). We argue that by testing the memory through a sequence of test algorithms which differ in their fault coverage, it is possible to bin the memory into multiple yield bins and increase the yield and product revenue. Further, the test strategy must take into consideration the usage model of the memory. Thus, a number of video and audio buffers are used in sequential access mode, but are overtested using conventional memory test algorithms which model a large number of defects which do not impact the operation of the buffers. We propose a binning strategy where memory test algorithms are applied in different order of strictness such that bins have a specific defect / fault grade. Depending on the applications some of these bins need not be discarded but sold at a lower price as the functionality would never catch the fault due to its usage of memory. We introduce the notion of a test map for the on-chip memories in a SoC and provide results of yield simulation on two specific test strategies called \"Most Strict First\" and \"Least Strict First\". Our simulations indicate that significant improvements in yield are possible through the adoption of the proposed technique. 
We show that the BIST controller area and run-time overheads also reduce when information about the usage model of the memory, such as sequential access, is exploited.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127187398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design for manufacturing and yield (DFM&Y) is fast becoming an indispensable consideration in today's SoCs. Most current flows only consider manufacturability and yield at the lowest levels: process, layout and circuit. As such, these metrics are treated as an afterthought. With advanced process nodes, it has become increasingly expensive, and will soon be prohibitive, to guarantee bit-level error-free chips. The challenge now is to design reliable systems using chips that may have some faults. This has led to approaches that consider DFM&Y at the system level, where more benefit can be reaped, and that consider the problem across the design layers. This tutorial covers a cross-layer approach to DFM&Y spanning from the application all the way to manufacturing, gives an overview of various techniques being explored today, and demonstrates their effectiveness on key applications including wireless, multimedia and imaging. We believe that this tutorial will benefit a large percentage of the attendees at VLSI Design 2008 and should elicit an excellent response at the conference. The tutorial is intended for application designers, chip architects, managers, CAD tool developers, researchers and students interested in System-on-Chip design, platform-based design methodologies, and trends in design for manufacturing and yield at the system level. Attendees should have basic (undergraduate-level) knowledge of VLSI design and SoC design flows. Familiarity with architectural concepts such as IP-based design, and with applications such as wireless and multimedia, is desirable but not required. No specific knowledge of CAD tools or modeling languages is required for this tutorial.
{"title":"Cross-Layer Approaches to Designing Reliable Systems Using Unreliable Chips","authors":"F. Kurdahi, N. Dutt, A. Eltawil, S. Nassif","doi":"10.1109/VLSI.2008.135","DOIUrl":"https://doi.org/10.1109/VLSI.2008.135","url":null,"abstract":"The design for manufacturing and yield (DFM&Y) is fast becoming an indispensable consideration in today's SoCs. Most current flows only consider manufacturability and yield at the lowest levels: process, layout and circuit. As such, these metrics are treated as an afterthought. With advanced process nodes, it has become increasingly expensive-and soon prohibitive-to guarantee bit level error free chips. The challenge now is to design reliable systems using chips that may have some faults. This has lead to approaches that consider DFM&Y at the system level where more benefit can be reaped, and to consider the problem across the design layers. This tutorial covers cross layer approach to design for DFM&Y spanning from the application all the way to manufacturing, overviews various techniques being explored today, and demonstrates its effectiveness on key applications including wireless, multimedia and imaging. We believe that this tutorial will benefit a large percentage of the attendees at VLSI Design 2008, and should elicit an excellent response at the VLSI Design 2008 conference, he tutorial is intended for application designers, chip architects, managers, CAD tool developers, researchers and students interested in System-on-Chip design, platform-based design methodologies, and trends in design for manufacturing and yield at the system level. Attendees should have basic (undergraduate-level) knowledge of VLSI Design and SoC design flows. Familiarity with architectural concepts such as IP based design, and applications such as wireless and multimedia is desirable, but not required. 
No specific knowledge of CAD tools or modeling languages is required for this tutorial.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127573659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Satish Anand Verkila, Sivakumar Bondada, B. Amrutur
In this paper, we present a dynamic voltage and frequency managed 256 x 64 SRAM block in 65 nm technology, for frequencies ranging from 100 MHz to 1 GHz. The total energy is minimized for any operating frequency in this range, and leakage energy is minimized during standby mode. Since the noise margin of the SRAM cell deteriorates at low voltages, we propose static noise margin improvement circuitry, which symmetrizes the SRAM cell by controlling the body bias of the pull-down NMOS transistor. We use a 9T SRAM cell that isolates the read and hold noise margins and has lower leakage. We have implemented an efficient technique for pushing the address decoder into zigzag super-cut-off in standby mode without affecting its performance in the active mode of operation. The read bit line (RBL) voltage drop is controlled, and the bit lines are pre-charged only when needed, to reduce wasted power.
{"title":"A 100MHz to 1GHz, 0.35V to 1.5V Supply 256 x 64 SRAM Block Using Symmetrized 9T SRAM Cell with Controlled Read","authors":"Satish Anand Verkila, Sivakumar Bondada, B. Amrutur","doi":"10.1109/VLSI.2008.89","DOIUrl":"https://doi.org/10.1109/VLSI.2008.89","url":null,"abstract":"In this paper, we present dynamic voltage and frequency Managed 256 x 64 SRAM block in 65 nm technology, for frequency ranging from 100 MHz to 1 GHz. The total energy is minimized for any operating frequency in the above range and leakage energy is minimized during standby mode. Since noise margin of SRAM cell deteriorates at low voltages, we propose static noise margin improvement circuitry, which symmetrizes the SRAM cell by controlling the body bias of pull down NMOS transistor. We used a 9T SRAM cell that isolates Read and hold noise margin and has less leakage. We have implemented an efficient technique of pushing address decoder into zigzag- super-cut-off in stand-by mode without affecting its performance in active mode of operation. The read bit line (RBL) voltage drop is controlled and pre-charge of bit lines is done only when needed for reducing power wastage.","PeriodicalId":143886,"journal":{"name":"21st International Conference on VLSI Design (VLSID 2008)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116807025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}