Shiv Kumar, V. Chandratre, S. Mohammed, C. K. Pithawa
Non-Manhattan CMOS devices are gaining attention because of their special properties. In this paper, waffle and closed-gate structures are discussed, and issues related to their use in CAD tools are addressed. Waffle devices are used where a large aspect ratio with low parasitic capacitance and low silicon overhead is required. The closed-gate layout is used in rad-hard digital libraries because of its edgeless geometry. These devices are difficult to handle because the Process Design Kit (PDK) is developed for Manhattan geometries. This paper discusses why the models for Manhattan devices cannot be extended to predict accurate I-V characteristics of non-Manhattan devices. An analytical model is developed to map non-Manhattan devices to equivalent Manhattan devices. A test chip in 0.7 µm CMOS technology was developed to validate the concept. The PDK was modified to introduce these structures into the normal analog design flow.
{"title":"Extraction of Aspect Ratio for Non-Manhattan CMOS Devices","authors":"Shiv Kumar, V. Chandratre, S. Mohammed, C. K. Pithawa","doi":"10.1109/VLSID.2011.71","DOIUrl":"https://doi.org/10.1109/VLSID.2011.71","journal":{"name":"2011 24th International Conference on VLSI Design"},"publicationDate":"2011-01-02","platform":"Semanticscholar","paperid":"116725033"}
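The idea of mapping a non-Manhattan device to an equivalent Manhattan device can be pictured with a small calculation. The following is only a sketch, assuming the long-channel square-law model in saturation; it is not the analytical model developed in the paper, and the function name and numeric values are hypothetical.

```python
def equivalent_aspect_ratio(i_dsat, k_prime, v_gs, v_th):
    """Return the W/L of a rectangular MOSFET that would carry i_dsat.

    Assumes the long-channel square-law model in saturation:
        I_D = 0.5 * k' * (W/L) * (V_GS - V_TH)**2
    """
    v_ov = v_gs - v_th
    if v_ov <= 0:
        raise ValueError("device must be biased above threshold")
    return 2.0 * i_dsat / (k_prime * v_ov ** 2)


# Hypothetical numbers, chosen only to exercise the formula.
ratio = equivalent_aspect_ratio(i_dsat=1.0e-3, k_prime=50e-6, v_gs=2.0, v_th=1.0)
print(f"equivalent W/L = {ratio:.1f}")
```

A measured or simulated saturation current of the waffle or closed-gate device would take the place of `i_dsat`; the result is the aspect ratio a standard rectangular model would need in order to reproduce that current.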
R. Rithe, Sharon Chou, Jie Gu, Alice Wang, Satyendra R. Datla, G. Gammie, D. Buss, A. Chandrakasan
When CMOS is operated at a supply voltage of 0.5 V and below, Random Dopant Fluctuations (RDFs) result in a stochastic component of logic delay that can be comparable to the nominal delay. Moreover, the Probability Density Function (PDF) of this stochastic delay can be highly non-Gaussian. The Non-Linear, Operating Point Analysis of Local Variations (NLOPALV) technique has been shown to be accurate and computationally efficient in simulating any point on the delay PDF of a logic Timing Path (TP). This paper applies the NLOPALV approach to characterizing the stochastic delay of logic cells. NLOPALV theory is presented, and NLOPALV is used to characterize a cell library designed in 28 nm CMOS. NLOPALV is accurate to within 5% compared to SPICE-based Monte Carlo analysis.
{"title":"Cell Library Characterization at Low Voltage Using Non-linear Operating Point Analysis of Local Variations","authors":"R. Rithe, Sharon Chou, Jie Gu, Alice Wang, Satyendra R. Datla, G. Gammie, D. Buss, A. Chandrakasan","doi":"10.1109/VLSID.2011.43","DOIUrl":"https://doi.org/10.1109/VLSID.2011.43","journal":{"name":"2011 24th International Conference on VLSI Design"},"publicationDate":"2011-01-02","platform":"Semanticscholar","paperid":"131924997"}
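A short Monte Carlo experiment shows why a Gaussian threshold-voltage shift can produce a non-Gaussian delay distribution at low supply voltage. This is a sketch under an assumed model (delay growing exponentially with the threshold shift, as in subthreshold conduction); it is a plain Monte Carlo reference, not NLOPALV, and all parameter values are hypothetical.

```python
import math
import random
import statistics


def sample_delays(n_samples, sigma_vth=0.03, slope_factor=1.5,
                  v_thermal=0.026, seed=0):
    """Monte Carlo delays (normalised to nominal) for Gaussian Vth shifts.

    Assumed model: drive current falls as exp(-dVth / (n * vT)),
    so delay scales as exp(+dVth / (n * vT)).
    """
    rng = random.Random(seed)
    scale = slope_factor * v_thermal
    return [math.exp(rng.gauss(0.0, sigma_vth) / scale)
            for _ in range(n_samples)]


def skewness(values):
    """Population skewness; zero for a symmetric distribution."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


delays = sample_delays(20000)
print("mean   :", round(statistics.fmean(delays), 3))
print("median :", round(statistics.median(delays), 3))
print("skew   :", round(skewness(delays), 3))
```

Under this model the mean sits above the median and the skewness is clearly positive, which is the long slow-delay tail that a Gaussian approximation would miss.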
Min Zhang, R. Häußler, M. Olbrich, H. Kinzelbach, E. Barke
In statistical analysis, modeling circuit performance for non-linear problems demands large computational effort. In semi-custom design, statistical leakage library characterization is a highly complex yet fundamental task. The log-linear model provides unacceptably poor accuracy in modeling a large number of standard cells. To improve model quality, simply increasing the model order is not practicable because it leads to an exponential increase in run time. Instead of assuming one model type for the entire library beforehand, we developed an approach that generates a model for each cell individually. The key contribution is the use of a cross term matrix and an active sampling scheme, which significantly reduces model size and model generation time. The effectiveness of our approach is clearly shown by experiments on industrial standard cell libraries. As we regard the circuit block as a black box, our approach is suitable for modeling various circuit performances.
{"title":"A Statistical Learning Based Modeling Approach and Its Application in Leakage Library Characterization","authors":"Min Zhang, R. Häußler, M. Olbrich, H. Kinzelbach, E. Barke","doi":"10.1109/VLSID.2011.23","DOIUrl":"https://doi.org/10.1109/VLSID.2011.23","journal":{"name":"2011 24th International Conference on VLSI Design"},"publicationDate":"2011-01-02","platform":"Semanticscholar","paperid":"134084382"}
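The limitation of the log-linear model can be seen on synthetic data. The sketch below fits ln(I) = a + b·x by least squares for a single variation parameter and compares a cell whose log-leakage is exactly linear with one that has a quadratic term. The data, coefficients, and function names are hypothetical, and the paper's cross term matrix and active sampling scheme are not reproduced here.

```python
import math


def fit_log_linear(xs, leakages):
    """Least-squares fit of ln(I) = a + b * x for one variation parameter."""
    ys = [math.log(i) for i in leakages]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def max_relative_error(xs, leakages, a, b):
    """Worst-case relative error of the fitted model over the samples."""
    return max(abs(math.exp(a + b * x) - i) / i
               for x, i in zip(xs, leakages))


xs = [i / 10.0 for i in range(-30, 31)]  # parameter in units of sigma

# Synthetic cell whose log-leakage is exactly linear in x.
linear_cell = [math.exp(-20.0 + 0.8 * x) for x in xs]
# Synthetic cell with an added quadratic term in the exponent.
curved_cell = [math.exp(-20.0 + 0.8 * x + 0.15 * x * x) for x in xs]

for name, data in (("linear", linear_cell), ("curved", curved_cell)):
    a, b = fit_log_linear(xs, data)
    err = max_relative_error(xs, data, a, b)
    print(name, "max relative error:", round(err, 4))
```

The first cell is reproduced almost exactly, while the second shows a large worst-case error, which illustrates why a single model type chosen up front may not suit every cell in a library.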
SoC designers typically use a processor simulator to generate a memory trace and apply the generated trace to a memory simulator in order to collect the performance statistics of a complete system. This is an inaccurate process for most applications, making it difficult to optimize the processor and memory configurations. In this paper, we study the problems encountered in the typical simulation approach and propose a methodology which utilizes an interface layer component to link the processor simulator and memory simulator seamlessly. The interface layer component presented in this paper can be used as the connector between the processor module and memory module in building an execution-driven approach that processes run-time memory requests, in contrast to traditional trace-driven simulation approaches. By applying the proposed interface layer component to link the processor simulator and memory simulator, the estimated performance statistics of the system and the average power consumption of the memory system can be collected with high accuracy. We demonstrate the necessity of our approach by evaluating six benchmarks. Over these benchmarks, there is an 80% variation in the choice of memory latency to achieve the most accurate power consumption and a 16% variation in the choice of memory latency to achieve the most accurate execution time. The increase in accuracy comes at an average increase in simulation time of 13.5%.
{"title":"Realizing Cycle Accurate Processor Memory Simulation via Interface Abstraction","authors":"S. Min, Jorgen Peddersen, S. Parameswaran","doi":"10.1109/VLSID.2011.36","DOIUrl":"https://doi.org/10.1109/VLSID.2011.36","journal":{"name":"2011 24th International Conference on VLSI Design"},"publicationDate":"2011-01-02","platform":"Semanticscholar","paperid":"133894388"}
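The difference between trace-driven and execution-driven simulation can be sketched with a toy model. Everything here is an assumption made for illustration: the memory model (a row buffer that closes after an idle period), the latencies, and the function names are hypothetical and are not the paper's interface layer component.

```python
ROW_SIZE = 1024        # bytes per memory row (hypothetical)
HIT, MISS = 10, 30     # row-buffer hit / miss latency in cycles (hypothetical)
IDLE_CLOSE = 50        # open row closes after this many idle cycles (hypothetical)


class Memory:
    """Toy memory model whose latency depends on when a request arrives."""

    def __init__(self):
        self.open_row = None
        self.last_done = 0

    def access(self, now, address):
        row = address // ROW_SIZE
        if self.open_row is not None and now - self.last_done > IDLE_CLOSE:
            self.open_row = None          # row timed out while idle
        latency = HIT if row == self.open_row else MISS
        self.open_row = row
        self.last_done = now + latency
        return latency


def execution_driven(program):
    """Processor stalls for the latency the memory reports at run time."""
    mem, now = Memory(), 0
    for compute_cycles, address in program:
        now += compute_cycles
        now += mem.access(now, address)
    return now


def trace_driven(program, assumed_latency):
    """Timestamps are fixed up front with one assumed latency, then replayed."""
    now, trace = 0, []
    for compute_cycles, address in program:
        now += compute_cycles
        trace.append((now, address))
        now += assumed_latency
    mem = Memory()
    memory_cycles = sum(mem.access(t, a) for t, a in trace)
    return now, memory_cycles


# (compute cycles before the access, byte address)
program = [(5, 0), (5, 64), (45, 128), (5, 4096), (45, 4160)]
print("execution-driven cycles:", execution_driven(program))
for guess in (10, 30):
    print("trace-driven, assumed latency", guess, "->", trace_driven(program, guess))
```

In the trace-driven variant both the execution-time estimate and the memory activity depend on the latency guessed when the trace was generated, whereas the execution-driven loop feeds the actual latency back into the processor timeline.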
A framework is proposed for efficiently capturing the rich microarchitectural space of a substantial MATLAB-like library of DSP functions for a regular Coarse Grain Reconfigurable Architecture (CGRA) fabric. A subset of C has been proposed to model the DSP functions, and an automatic tool to generate the configware for the CGRA fabric has been developed. A method to estimate the average energy of such functions is reported with an error margin of less than 3%. Such a framework is proposed as the basis for raising the abstraction to automate synthesis of the entire physical layers.
{"title":"A Library Development Framework for a Coarse Grain Reconfigurable Architecture","authors":"Omer Malik, A. Hemani, M. A. Shami","doi":"10.1109/VLSID.2011.54","DOIUrl":"https://doi.org/10.1109/VLSID.2011.54","journal":{"name":"2011 24th International Conference on VLSI Design"},"publicationDate":"2011-01-02","platform":"Semanticscholar","paperid":"125975064"}
Zhaobo Zhang, X. Kavousianos, K. Chakrabarty, Y. Tsiatouhas
Multi-threshold CMOS is a very effective technique for reducing standby leakage power during long periods of inactivity. Recently, a power-gating scheme was presented to support multiple power-off modes and reduce the leakage power during short periods of inactivity. However, this scheme suffers from high sensitivity to process variations, which impedes manufacturability and also limits its applicability to at most two intermediate power-off modes. We propose a new power-gating technique that is tolerant to process variations and scalable to more than two intermediate power-off modes. In addition, the proposed design requires minimum design effort and offers greater power reduction and smaller area cost than the previous method. Analysis and extensive simulation results demonstrate the effectiveness of the proposed design.
{"title":"A Robust and Reconfigurable Multi-mode Power Gating Architecture","authors":"Zhaobo Zhang, X. Kavousianos, K. Chakrabarty, Y. Tsiatouhas","doi":"10.1109/VLSID.2011.29","DOIUrl":"https://doi.org/10.1109/VLSID.2011.29","journal":{"name":"2011 24th International Conference on VLSI Design"},"publicationDate":"2011-01-02","platform":"Semanticscholar","paperid":"120811760"}
A. Banerjee, Vishwanath Natarajan, Shreyas Sen, A. Chatterjee, G. Srinivasan, S. Bhattacharya
Test time and test complexity reduction has become a critical challenge in modern RF testing. Prior “alternative” test methods have achieved fast testing at the cost of using supervised learning algorithms that require “training”. In contrast, behavioral model parameter estimation based test methods require the use of accurate models but no “training” is necessary, reducing test deployment costs. In this work, a new test generation approach is proposed that allows behavioral model parameter estimation to be performed from a single optimized OFDM data frame. A genetic multi-tone test stimulus optimization algorithm is developed to maximize the accuracy with which a nonlinear solver can determine RF transceiver model parameters from raw downconverted test response data. The transceiver model proposed is the most comprehensive to date and includes AM-PM distortion and 5th order nonlinearity effects. Simulation results show that using the optimized multitone test stimulus, all the model parameters can be computed accurately using a single data acquisition (4X-5X faster than prior parameter estimation techniques and comparable to alternative test times). Data from an experiment performed on a hardware prototype validates the proposed concept.
{"title":"Optimized Multitone Test Stimulus Driven Diagnosis of RF Transceivers Using Model Parameter Estimation","authors":"A. Banerjee, Vishwanath Natarajan, Shreyas Sen, A. Chatterjee, G. Srinivasan, S. Bhattacharya","doi":"10.1109/VLSID.2011.65","DOIUrl":"https://doi.org/10.1109/VLSID.2011.65","journal":{"name":"2011 24th International Conference on VLSI Design"},"publicationDate":"2011-01-02","platform":"Semanticscholar","paperid":"132614342"}
Aggressive speed and voltage binning schemes are widely used in the industry to combat process variation. Generating structural tests that are effective for speed and voltage binning is very important to reduce cost and improve quality. We observe that hazards are common along critical paths of many industrial designs and conventional path-delay ATPG is ineffective for paths with static hazards. We propose a directed transition fault ATPG scheme that works with commercial ATPG tools to test the critical paths with hazards. The proposed scheme is implemented on industrial designs and silicon results are presented.
{"title":"Hazard-Aware Directed Transition Fault ATPG for Effective Critical Path Test","authors":"V. Devanathan, Ishaan Santhosh Shah","doi":"10.1109/VLSID.2011.42","DOIUrl":"https://doi.org/10.1109/VLSID.2011.42","journal":{"name":"2011 24th International Conference on VLSI Design"},"publicationDate":"2011-01-02","platform":"Semanticscholar","paperid":"121671815"}
In this paper, we propose a novel floorplanning algorithm for GPUs. Floorplanning is an inherently sequential algorithm, far from the typical programs suitable for Single Instruction Multiple Thread (SIMT) style concurrency in a GPU. We propose a fundamentally different approach to exploring the floorplan solution space, where we evaluate concurrent moves on a given floorplan. We illustrate several performance optimization techniques for this algorithm on GPUs. Compared to the sequential algorithm, our techniques achieve a 4-30X speedup for a range of MCNC benchmarks, while delivering comparable or better solution quality.
{"title":"A GPU Algorithm for IC Floorplanning: Specification, Analysis and Optimization","authors":"Yiding Han, Koushik Chakraborty, Sanghamitra Roy, Vilasita Kuntamukkala","doi":"10.1109/VLSID.2011.19","DOIUrl":"https://doi.org/10.1109/VLSID.2011.19","journal":{"name":"2011 24th International Conference on VLSI Design"},"publicationDate":"2011-01-02","platform":"Semanticscholar","paperid":"114799886"}
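Evaluating several moves against the same floorplan can be sketched on the CPU as follows. This is an illustrative toy, not the paper's algorithm: the shelf-packing representation, the swap move, the greedy acceptance rule, and all names and sizes are assumptions, and the batch is evaluated in a plain loop where a GPU would assign one move per thread.

```python
import random


def packed_height(order, sizes, max_width):
    """Shelf-pack blocks in the given order; return the total height used."""
    x = row_height = total = 0
    for block in order:
        w, h = sizes[block]
        if x + w > max_width:          # start a new shelf
            total += row_height
            x = row_height = 0
        x += w
        row_height = max(row_height, h)
    return total + row_height


def evaluate_move(order, move, sizes, max_width):
    """Cost of the floorplan after one swap; independent of other moves."""
    i, j = move
    candidate = list(order)
    candidate[i], candidate[j] = candidate[j], candidate[i]
    return packed_height(candidate, sizes, max_width), candidate


def optimise(sizes, max_width, batch=16, rounds=200, seed=0):
    rng = random.Random(seed)
    order = list(range(len(sizes)))
    cost = packed_height(order, sizes, max_width)
    for _ in range(rounds):
        moves = [tuple(rng.sample(range(len(order)), 2)) for _ in range(batch)]
        # Every evaluation reads the same current floorplan, so the batch
        # has no internal dependencies and could run concurrently.
        results = [evaluate_move(order, m, sizes, max_width) for m in moves]
        best_cost, best_order = min(results, key=lambda r: r[0])
        if best_cost <= cost:          # greedy acceptance (an assumption)
            cost, order = best_cost, best_order
    return cost, order


# Hypothetical (width, height) blocks; each must fit within max_width.
sizes = [(4, 2), (3, 5), (6, 1), (2, 2), (5, 4), (1, 6), (3, 3), (4, 1)]
initial = packed_height(list(range(len(sizes))), sizes, max_width=10)
final, order = optimise(sizes, max_width=10)
print("initial height:", initial, "final height:", final)
```

Because each candidate is scored against an unchanged copy of the current floorplan, the inner list comprehension is the part that maps naturally onto SIMT threads.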
Michael B. Henry, Robert Lyerly, L. Nazhandali, A. Fruehling, D. Peroulis
For periodic and event-driven applications with long standby times, controlling leakage is essential. This paper investigates using MEMS switches for power gating processors, which eliminates standby leakage power and allows for highly scalable processing. We show that when power gating with a MEMS switch, low technology nodes and low threshold voltages, which offer low switching energy and high speeds, are optimal. We also compare a MEMS-gated processor to two recent low leakage processors and show that it is ideal for applications with 100+ ms standby times. With CMOS compatibility on the horizon, MEMS switches are an attractive option for low-leakage applications.
{"title":"MEMS-Based Power Gating for Highly Scalable Periodic and Event-Driven Processing","authors":"Michael B. Henry, Robert Lyerly, L. Nazhandali, A. Fruehling, D. Peroulis","doi":"10.1109/VLSID.2011.66","DOIUrl":"https://doi.org/10.1109/VLSID.2011.66","journal":{"name":"2011 24th International Conference on VLSI Design"},"publicationDate":"2011-01-02","platform":"Semanticscholar","paperid":"125974598"}
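Whether power gating pays off for a given standby period comes down to a simple energy balance. The sketch below assumes that the gated block leaks nothing while off and that each sleep/wake cycle costs a fixed overhead energy; the function name and the numbers are hypothetical and are not taken from the paper.

```python
def break_even_standby_s(overhead_energy_j, leakage_power_w):
    """Standby time above which power gating saves energy.

    Gating wins when leakage_power * t_standby > overhead_energy,
    assuming zero leakage while the block is gated off.
    """
    if leakage_power_w <= 0:
        raise ValueError("leakage power must be positive")
    return overhead_energy_j / leakage_power_w


# Hypothetical values, chosen only to exercise the formula.
t = break_even_standby_s(overhead_energy_j=2e-6, leakage_power_w=50e-6)
print(f"gating pays off for standby periods longer than {t * 1e3:.0f} ms")
```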