In the nanometer regime, IC designers must balance significant variation effects against tight power constraints. The conventional approach, a timing safety margin, consumes power continuously to guard against low-probability timing variations. Such power inefficiency is largely eliminated by the Razor technology, which detects and corrects variation-induced timing errors at runtime. However, Razor's error correction scheme causes pipeline stalling or flushing and is therefore not preferred in real-time systems or in sequential circuits with feedback loops. We propose an elastic timing scheme that corrects timing errors without stalling or flushing the pipeline. This is achieved by dynamically boosting circuit speed when a timing error occurs. A dynamic clock skew shifting technique is suggested to reduce the boosting cost. An optimization algorithm is also provided to minimize the cost overhead. Compared to the conventional safety-margin-based approach, the elastic timing scheme reduces power dissipation by 20% to 27% on ISCAS89 sequential circuits while retaining similar variation tolerance. After optimization, boosting is needed for only a small portion of the entire circuit. As a result, the area overhead is usually less than 5%.
{"title":"Elastic Timing Scheme for Energy-Efficient and Robust Performance","authors":"Rupak Samanta, G. Venkataraman, N. Shah, Jiang Hu","doi":"10.1109/ISQED.2008.82","DOIUrl":"https://doi.org/10.1109/ISQED.2008.82","url":null,"abstract":"In nanometer regime, IC designers are struggling between significant variation effects and tight power constraints. The conventional approach - using timing safety margin, consumes power continuously to guard against low probability timing variations. Such power inefficiency is largely eliminated in the Razor technology which detects and corrects variation induced timing errors at runtime. However, the error correction scheme of Razor causes pipeline stalling/flushing and therefore is not preferred in real-time systems or sequential circuits with feedback loops. We propose an elastic timing scheme which can correct timing errors without stalling/flushing pipeline. This is achieved by dynamically boosting circuit speed when timing error occurs. A dynamic clock skew shifting technique is suggested to reduce the boosting cost. An optimization algorithm is also provided to minimize the cost overhead. Compared to conventional safety margin based approach, the elastic timing scheme can reduce power dissipation by 20 % - 27 % on ISCAS89 sequential circuits while retaining similar variation tolerance. After optimization, the boosting is needed for only a small portion of entire circuit. As a result, the area overhead is usually less than 5 %.","PeriodicalId":243121,"journal":{"name":"9th International Symposium on Quality Electronic Design (isqed 2008)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126795864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The primary goal when using physical verification tools is to achieve the best performance at the lowest cost, both in resources and time. Physical verification tools rely on multiple enabling technologies to contribute to runtime and turnaround time reduction. Using differing combinations of architecture and scaling, this paper compares and contrasts three physical verification approaches to determine the combination of factors most likely to produce the desired results in a production environment.
{"title":"Architecting for Physical Verification Performance and Scaling","authors":"J. Ferguson, R. Todd","doi":"10.1109/ISQED.2008.109","DOIUrl":"https://doi.org/10.1109/ISQED.2008.109","url":null,"abstract":"The primary goal when using physical verification tools is to achieve the best performance at the lowest cost, both in resources and time. Physical verification tools rely on multiple enabling technologies to contribute to runtime and turnaround time reduction. Using differing combinations of architecture and scaling, this paper compares and contrasts three physical verification approaches to determine the combination of factors most likely to produce the desired results in a production environment.","PeriodicalId":243121,"journal":{"name":"9th International Symposium on Quality Electronic Design (isqed 2008)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126827912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Cardoso, L. Rosa, F. Marques, R. Ribas, A. Reis
This paper presents a method for speeding up ASICs by transistor reordering. The proposed method can be applied to a variety of logic styles and transistor topologies. The rationale for the obtained gains is explained through logical effort concepts. When applied to circuits based on 4-input networks, as found in many structured-ASIC and FPGA technologies, the method yields significant performance gains at a small area expense. This observation suggests that our method is of special interest when migrating FPGAs to ASICs. The logical effort effects on BDD-derived networks illustrated in this paper can be exploited in a much broader range of designs.
{"title":"Speed-Up of ASICs Derived from FPGAs by Transistor Network Synthesis Including Reordering","authors":"T. Cardoso, L. Rosa, F. Marques, R. Ribas, A. Reis","doi":"10.1109/ISQED.2008.168","DOIUrl":"https://doi.org/10.1109/ISQED.2008.168","url":null,"abstract":"This paper presents a method for speeding-up ASICs by transistor reordering. The proposed method can be applied to a variety of logic styles and transistor topologies. The rationale of the obtained gains is explained through logical effort concepts. When applied to circuits based on 4-input networks, which is the case of many structured-ASIC or FPGA technologies, significant performance gains are obtained at a small area expense. This observation points out that our method can be of special interest when migrating FPGAs to ASICs. The logical effort effects on networks derived from BDDs illustrated in this paper can be exploited in a much broader range of designs.","PeriodicalId":243121,"journal":{"name":"9th International Symposium on Quality Electronic Design (isqed 2008)","volume":"208 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127147437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to the positive feedback loop between power grid Joule heating and the linear temperature dependence of resistivity, non-uniform temperature profiles on the power grid of a high-performance IC influence the IR drop in the grid. Without an accurate evaluation of the thermal effect on the IR drop, a design may be over-designed or, worse, the IR drop may be underestimated because of increased local temperature. This paper presents a fast method to compute the temperature-dependent IR drop on the power grid. We propose a novel thermal model and a mathematical formulation to compute the temperature profiles on the power grid efficiently. Compared to the traditional lumped thermal model, which gives a thermal network much larger than the original power grid (20 times more nodes), our model takes advantage of power grid properties and reduces the size of the thermal equivalent network dramatically (to only 13% of the size of the power grid). Iterative methods [16] are used to efficiently update the IR drops based on the new temperature profile. Experimental results show that, without considering the temperature impact, the worst-case IR drop analysis can have an error of up to 10%.
{"title":"Thermal-Aware IR Drop Analysis in Large Power Grid","authors":"Yu Zhong, Martin D. F. Wong","doi":"10.1109/ISQED.2008.57","DOIUrl":"https://doi.org/10.1109/ISQED.2008.57","url":null,"abstract":"Due to the positive feedback loop between power grid Joule heating and the linear temperature dependence of resistivity, non-uniform temperature profiles on the power grid in high-performance IC influence the IR drop in the power grid. Lack of accurate evaluation of thermal effect on the IR drop in the power grid may lead to over-design; or worse, underestimates the IR drop due to increased local temperature. This paper presents a method to compute the temperature-dependent IR drop on the power grid extremely fast. We propose a novel thermal model and a mathematical formulation to compute the temperature profiles on the power grid efficiently. Compared to the traditional thermal lumped model, which gives a much larger thermal network than the original power grid (20 times more nodes), our model takes advantage of power grid properties, and reduces the size of the thermal equivalent network dramatically (only 13% of the size of the power grid). Iterative methods [16] are used to efficiently update the IR drops based on the new temperature profile. Experimental results show that without considering temperature impact, the worst IR drop analysis can have error up to 10%.","PeriodicalId":243121,"journal":{"name":"9th International Symposium on Quality Electronic Design (isqed 2008)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126177514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
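The feedback loop described in the thermal-aware IR drop abstract (Joule heating raises temperature, temperature raises resistance, resistance raises both the IR drop and the heating) can be illustrated with a toy fixed-point iteration. This is only a sketch under strong assumptions and not the paper's thermal model: the grid is a single rail fed from one pad, loads are ideal current sources, and each segment's temperature rise is its own Joule power times a hypothetical thermal resistance `theta`. All names and parameter values are illustrative.

```python
def ir_drop_with_self_heating(load_currents, r0, alpha, theta,
                              t_amb=25.0, t_ref=25.0, tol=1e-9, max_iter=100):
    """Toy electro-thermal iteration on a 1-D power rail fed from one pad.

    load_currents[k] is the current drawn at node k (A); segment k connects
    node k-1 (or the pad) to node k. r0 is the segment resistance at t_ref
    (ohm), alpha the linear temperature coefficient (1/K), and theta a
    hypothetical per-segment thermal resistance (K/W).
    """
    n = len(load_currents)
    # Segment k carries the current of every load at or beyond node k.
    seg_currents = [sum(load_currents[k:]) for k in range(n)]
    temps = [t_amb] * n
    for _ in range(max_iter):
        # Linear temperature dependence of resistance.
        res = [r0 * (1.0 + alpha * (t - t_ref)) for t in temps]
        # Assumed thermal model: rise = theta * Joule power of the segment.
        new_temps = [t_amb + theta * i * i * r
                     for i, r in zip(seg_currents, res)]
        converged = max(abs(a - b) for a, b in zip(new_temps, temps)) < tol
        temps = new_temps
        if converged:
            break
    res = [r0 * (1.0 + alpha * (t - t_ref)) for t in temps]
    # IR drop at node k is the accumulated I*R of all segments up to k.
    drops, total = [], 0.0
    for i, r in zip(seg_currents, res):
        total += i * r
        drops.append(total)
    return drops, temps


if __name__ == "__main__":
    loads = [0.1, 0.1, 0.1]
    cold, _ = ir_drop_with_self_heating(loads, r0=0.5, alpha=0.0, theta=100.0)
    hot, temps = ir_drop_with_self_heating(loads, r0=0.5, alpha=0.004, theta=100.0)
    print("worst drop, temperature ignored:", cold[-1])
    print("worst drop, with self-heating:  ", hot[-1])
```

Setting `alpha=0.0` reproduces a temperature-blind analysis, so comparing the two runs shows how ignoring temperature underestimates the worst-case drop in this toy setting.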
As CMOS technology continues to scale, interconnect power has become a significant part of total chip power. Timing slack can be used to optimize interconnect power efficiently without compromising performance. The optimization of total interconnect power is affected not only by the properties of each interconnect and by the timing constraint, but also by the circuit topology. In this paper, we introduce IPOSA, a novel slack distribution algorithm that optimizes interconnect power efficiently. A piecewise linear model is proposed to quantify the relationship between interconnect power reduction and the amount of timing slack, taking into account interconnect length and switching activity. Monte Carlo analysis shows that the piecewise model has an average error of 1.7%. Based on the piecewise linearity of the model, we propose an iterative slack distribution algorithm that minimizes total interconnect power under a given performance constraint. Experimental results show that our algorithm achieves a 41.7% interconnect power reduction on average.
{"title":"IPOSA: A Novel Slack Distribution Algorithm for Interconnect Power Optimization","authors":"Xiang Qiu, Yuchun Ma, Xiangqing He, Xianlong Hong","doi":"10.1109/ISQED.2008.96","DOIUrl":"https://doi.org/10.1109/ISQED.2008.96","url":null,"abstract":"As CMOS technology scales continually, interconnect power has become a significant part of total chip power. Without compromising performance, timing slacks can be utilized to optimize interconnect power efficiently. The optimization of total interconnect power is affected not only by the properties of each interconnect as well as the timing constraint, but also by the circuit topology. In this paper, we introduce a novel slack distribution algorithm IPOSA to optimize interconnect power efficiently. A piecewise linear model is proposed to quantify the relationship between interconnect power reduction and timing slack amount, considering the interconnect length and the switching activity. Monte Carlo analysis shows our piecewise model is accurate enough that the average error is 1.7%. Based on the piecewise linearity of the model, we propose an iterative slack distribution algorithm which minimizes total interconnect power with given performance constraint. The experimental results show that our algorithm can achieve 41.7% interconnect power reduction on average.","PeriodicalId":243121,"journal":{"name":"9th International Symposium on Quality Electronic Design (isqed 2008)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130047820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
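The idea of distributing slack according to a piecewise linear power-versus-slack model can be sketched as a greedy allocation. This is not the IPOSA algorithm itself: it assumes a single shared slack budget (ignoring the circuit-topology constraints the abstract mentions) and a concave model in which each wire's segments have non-increasing slopes. The wire names, segment widths, and slopes below are hypothetical.

```python
def distribute_slack(wires, budget):
    """Greedy slack allocation under a piecewise linear savings model.

    wires maps a wire name to a list of (width, slope) segments, where
    width is a slack interval (e.g. ps) and slope is the power saved per
    unit of slack within that interval. Slopes are assumed non-increasing
    per wire (diminishing returns), so taking the steepest segments first
    is optimal for one shared budget.
    """
    pieces = [(slope, width, name)
              for name, segments in wires.items()
              for width, slope in segments]
    # Steepest savings first; the stable sort keeps each wire's own order on ties.
    pieces.sort(key=lambda piece: piece[0], reverse=True)

    allocation = {name: 0.0 for name in wires}
    saved = 0.0
    remaining = budget
    for slope, width, name in pieces:
        if remaining <= 0:
            break
        take = min(width, remaining)
        allocation[name] += take
        saved += slope * take
        remaining -= take
    return allocation, saved


if __name__ == "__main__":
    wires = {
        "a": [(10.0, 2.0), (10.0, 0.5)],
        "b": [(5.0, 1.0), (20.0, 0.2)],
    }
    print(distribute_slack(wires, budget=20.0))
```

In a real netlist, slack on one wire is shared with every path through it, so a practical algorithm has to re-evaluate timing after each assignment; the single budget here stands in for that constraint.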
Design complexity keeps increasing with multi-mode, statistical timing analysis, multi-Vt/VDD low-power, and multi-core performance-driven designs. IEEE 1800 System Verilog (Ref 1) offers a natural, smooth transition from Verilog (Refs 2 and 3) for system-level design and verification. Verilog RTL has been widely used for many design tape-outs. Verification tools such as simulators and formal tools already provide extensive support for powerful System Verilog (SV) design constructs. SV is expected to be used for design tape-outs soon, since many design houses have started using SV-specific RTL constructs for system designs that involve high levels of data abstraction, with verification support in mind. This paper analyzes SV design constructs that improve design quality of results (QoR). The constructs discussed are: 1) operator overloading with user-defined types for efficient implementation of datapath operators such as multipliers, adders, and shifters; 2) parameterized module interfaces for datapaths, memories, FIFOs, and register files of different sizes; 3) configurations that bind a particular efficient architecture to a module based on QoR requirements; 4) system-level module interfaces and arbitration using the "interface" construct; 5) multiple clock domain definition and interfacing; 6) the IEEE 1801 UPF low-power design intent flow.
{"title":"System Verilog for Quality of Results (QoR)","authors":"Ravi Surepeddi","doi":"10.1109/ISQED.2008.52","DOIUrl":"https://doi.org/10.1109/ISQED.2008.52","url":null,"abstract":"Design complexity is ever increasing with multi- mode, statistical timing analysis, multi-vt/VDD low power and multi-core performance based type of designs. IEEE 1800 system verilog (Ref 1) is a natural smooth transition language to verilog (Refi and 3) for system level design and verification. Verilog RTL has been popularly used for many design tape outs. System verilog (SV) extensive support exists in verification tools viz. simulators, formal for various powerful SV specific design constructs. It is envisaged that SV will be used for design tape outs soon as many design houses started using SV specific RTL constructs for system designs involving high levels of design data abstractions for various design application keeping in view of verification support. This paper analyzes on various SV design specific constructs for design quality of results (QOR) improvement. The specific constructs discussed for design QOR improvements are 1) operator overloading using user defined types to bring in efficient implementation of data path operators like multiplier, adder, shift,.. 2) Parameterized module interface for different sized datapath, memory, fifo, register files.. 3) Configuration to bind a particular efficient architecture to a module based on QOR requirement 4) System level modules interface and arbitration using \"interface\"construct. 5) Multiple clock domain definition and interface. 6) IEEE1801 UPF low power design intent flow.","PeriodicalId":243121,"journal":{"name":"9th International Symposium on Quality Electronic Design (isqed 2008)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127129369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a new finite-point-based approach for efficient characterization of CMOS gates. The new method identifies several key points on the I-V and Q-V curves to define the behavior of a static CMOS gate. It targets performance metrics such as timing, short-circuit power, and leakage in the presence of process variations. Experimental results validate the accuracy of the new approach and show simulation speeds more than 15X faster than BSIM-based library characterization.
{"title":"Finite-Point Gate Model for Fast Timing and Power Analysis","authors":"Dinesh Ganesan, A. Mitev, Janet Roveda, Yu Cao","doi":"10.1109/ISQED.2008.17","DOIUrl":"https://doi.org/10.1109/ISQED.2008.17","url":null,"abstract":"This paper proposes a new finite-point based approach for efficient characterization of CMOS gate. The new method identifies several key points on the I-V and Q-V curves to define the behavior of the static CMOS gate. It targets performance metrics such as timing, short-circuit power and leakage in the presence of process variations. Experimental results validate the accuracy of the new approach and yields simulation speeds more than 15X faster than BSIM based library characterization.","PeriodicalId":243121,"journal":{"name":"9th International Symposium on Quality Electronic Design (isqed 2008)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131638774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, a new asynchronous circuit design is presented. A special technique that enables fast forwarding is applied to the circuits, and the forward transition improves to less than 2. The handshaking process and cycle time of the asynchronous circuits are analyzed, and their performance and functionality under fabrication and temperature variations are evaluated through Monte Carlo simulations in 65 nm technology. The proposed asynchronous circuits are compared with static and domino logic circuits to assess their delay variations and functional success rates.
{"title":"An Asynchronous Circuit Design with Fast Forwarding Technique at Advanced Technology Node","authors":"Chin-Khai Tang, Chun-Yen Lin, Yi-Chang Lu","doi":"10.1109/ISQED.2008.117","DOIUrl":"https://doi.org/10.1109/ISQED.2008.117","url":null,"abstract":"In this paper, a new asynchronous circuit design is presented. A special technique that enables fast forwarding is applied to the circuits, and the forward transition improves to less than 2. The handshaking process and cycle time of the asynchronous circuits are analyzed, and its performance and functionality under fabrication and temperature variations are evaluated through Monte Carlo simulations in 65 nm technology. The proposed asynchronous circuits are compared to the static and domino logic circuits to assess their delay variations and functional success rates.","PeriodicalId":243121,"journal":{"name":"9th International Symposium on Quality Electronic Design (isqed 2008)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131376485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Qin, Animesh Kumar, K. Ramchandran, J. Rabaey, P. Ishwar
We present an error-tolerant SRAM design optimized for ultra-low standby power. Using SRAM cell optimization techniques, the maximum data retention voltage (DRV) of a 90 nm 26 kb SRAM module is reduced from 550 mV to 220 mV. A novel error-tolerant architecture further reduces the minimum static-error-free VDD to 155 mV. With a 100 mV noise margin, a 255 mV standby VDD effectively reduces the SRAM leakage power by 98% compared to the typical standby at 1 V VDD.
{"title":"Error-Tolerant SRAM Design for Ultra-Low Power Standby Operation","authors":"H. Qin, Animesh Kumar, K. Ramchandran, J. Rabaey, P. Ishwar","doi":"10.1109/ISQED.2008.38","DOIUrl":"https://doi.org/10.1109/ISQED.2008.38","url":null,"abstract":"We present an error-tolerant SRAM design optimized for ultra-low standby power. Using SRAM cell optimization techniques, the maximum data retention voltage (DRV) of a 90 nm 26 kb SRAM module is reduced from 550 mV to 220 mV. A novel error-tolerant architecture further reduces the minimum static-error-free VDD to 155 mV. With a 100 mV noise margin, a 255 mV standby VDD effectively reduces the SRAM leakage power by 98% compared to the typical standby at 1 V VDD.","PeriodicalId":243121,"journal":{"name":"9th International Symposium on Quality Electronic Design (isqed 2008)","volume":"126 33","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114052798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
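The standby voltage quoted in the SRAM abstract follows directly from its other figures. As a small worked check, using only numbers stated in the abstract (the symbol names are chosen here for illustration):

```latex
\begin{align*}
V_{\text{standby}} &= V_{\text{min}} + V_{\text{margin}}
  = 155\,\text{mV} + 100\,\text{mV} = 255\,\text{mV},\\
\frac{P_{\text{leak}}(255\,\text{mV})}{P_{\text{leak}}(1\,\text{V})}
  &\approx 1 - 0.98 = 0.02 .
\end{align*}
```

A 98% reduction therefore corresponds to roughly one fiftieth of the leakage power at the typical 1 V standby supply.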
Floorplanning for an L-shaped layout problem can be formulated as a global optimization problem. In this paper, we explore the feasibility of finding a globally optimal solution to such a problem by using an approximation technique. The problem formulation is first explained through a simple example with two L-shaped cells. It is then shown that the solution obtained by such an approximation can indeed lie in the neighborhood of a globally optimal solution. Numerical examples demonstrate the possibility of using this approach to obtain a globally optimal solution.
{"title":"On the Feasibility of Obtaining a Globally Optimal Floorplanning for an L-shaped Layout Problem","authors":"T. Chang, Manish Kumar, Teng-Sheng Moh, C. Tseng","doi":"10.1109/ISQED.2008.27","DOIUrl":"https://doi.org/10.1109/ISQED.2008.27","url":null,"abstract":"The floorplanning for an L-shaped layout problem can be formulated as a global optimization problem. In this paper, we will explore the feasibility of finding a globally optimal solution for such a problem by using an approximation technique. The problem formulation is first explained through a simple example with two L-shaped cells. Then, it is illustrated that the solution obtained by such an approximation can be indeed in the neighborhood of a global optimal solution. Numerical examples are used to demonstrate the possibility of using such an approach to obtain a global optimal solution.","PeriodicalId":243121,"journal":{"name":"9th International Symposium on Quality Electronic Design (isqed 2008)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122435242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}