The introduction of clock skew at an edge-triggered flip-flop has an effect that is similar to the movement of flip-flop across combinational logic module boundaries, and these are continuous and discrete optimizations with the same effect. While this fact has been recognized before, this work, for the first time, utilizes this information to find a minimum/specified period retiming efficiently. The clock period is guaranteed to be at most one gate delay larger than a tight lower bound on the optimal clock period; this bound is achievable using a combination of intentional skew and retiming. All ISCAS89 circuits can be retimed in a few minutes by this algorithm.
{"title":"A Fresh Look at Retiming via Clock Skew Optimization","authors":"Sachin S. Sapatnekar Rahul B. Deokar","doi":"10.1109/dac.1995.249965","DOIUrl":"https://doi.org/10.1109/dac.1995.249965","url":null,"abstract":"The introduction of clock skew at an edge-triggered flip-flop has an effect that is similar to the movement of flip-flop across combinational logic module boundaries, and these are continuous and discrete optimizations with the same effect. While this fact has been recognized before, this work, for the first time, utilizes this information to find a minimum/specified period retiming efficiently. The clock period is guaranteed to be at most one gate delay larger than a tight lower bound on the optimal clock period; this bound is achievable using a combination of intentional skew and retiming. All ISCAS89 circuits can be retimed in a few minutes by this algorithm.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134335745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, it has been shown that retiming has a very strong impact on the run time of sequential, structural automatic test pattern generators (ATPGs), as well as the levels of fault coverage and fault efficiency attained. In this paper, we show that retiming preserves testability with respect to a single stuck-at fault test set by adding a prefix sequence of a pre-determined number of arbitrary input vectors. Experimental results show that high fault coverages can be achieved on high performance circuits optimized by retiming with a much less CPU time (a reduction of two orders of magnitude in several instances) than if ATPG is attempted directly on those circuits.
{"title":"On Test Set Preservation of Retimed Circuits","authors":"A. El-Maleh, T. E. Marchok, J. Rajski, W. Maly","doi":"10.1145/217474.217526","DOIUrl":"https://doi.org/10.1145/217474.217526","url":null,"abstract":"Recently, it has been shown that retiming has a very strong impact on the run time of sequential, structural automatic test pattern generators (ATPGs), as well as the levels of fault coverage and fault efficiency attained. In this paper, we show that retiming preserves testability with respect to a single stuck-at fault test set by adding a prefix sequence of a pre-determined number of arbitrary input vectors. Experimental results show that high fault coverages can be achieved on high performance circuits optimized by retiming with a much less CPU time (a reduction of two orders of magnitude in several instances) than if ATPG is attempted directly on those circuits.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131328052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Liao, S. Devadas, K. Keutzer, S. Tjiang, Albert R. Wang
We address the problem of code optimization for embedded DSP microprocessors. Such processors (e.g., those in the TMS320 series) have highly irregular datapaths, and conventional code generation methods typically result in inefficient code. In this paper we formulate and solve some optimization problems that arise in code generation for processors with irregular datapaths. In addition to instruction scheduling and register allocation, we also formulate the accumulator spilling and mode selection problems that arise in DSP microprocessors. We present optimal and heuristic algorithms that determine an instruction schedule simultaneously optimizing accumulator spilling and mode selection. Experimental results are presented.
{"title":"Code Optimization Techniques for Embedded DSP Microprocessors","authors":"S. Liao, S. Devadas, K. Keutzer, S. Tjiang, Albert R. Wang","doi":"10.1145/217474.217596","DOIUrl":"https://doi.org/10.1145/217474.217596","url":null,"abstract":"We address the problem of code optimization for embedded DSP microprocessors. Such processors (e.g., those in the TMS320 series) have highly irregular datapaths, and conventional code generation methods typically result in inefficient code. In this paper we formulate and solve some optimization problems that arise in code generation for processors with irregular datapaths. In addition to instruction scheduling and register allocation, we also formulate the accumulator spilling and mode selection problems that arise in DSP microprocessors. We present optimal and heuristic algorithms that determine an instruction schedule simultaneously optimizing accumulator spilling and mode selection. Experimental results are presented.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124801025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we present a set cover based approach (SCP) to multi-way partitioning for minimum delay for Look-Up Table based FPGAs. SCP minimizes the number of chip-crossings on each circuit path with minimum logic duplication costs to achieve implementations with minimum delay and minimum number of chips. The overall complexity of SCP is (V2). Experimental results demonstrate that SCP produces partitions that on the average have 14% fewer chips, 28% fewer pins, and 93% fewer chip-crossings on each circuit path compared to ANN which is a simulated annealing based implementation of classical multi-way partitioning. SCP achieves this performance and compact packing at the cost of duplicating 13% of logic on the average. Additionally, in comparison with a lower bound we observe that SCP has produced near-optimal solutions.
{"title":"Multi-way Partitioning For Minimum Delay For Look-Up Table Based FPGAs","authors":"Prashant S. Sawkar, D. E. Thomas","doi":"10.1145/217474.217530","DOIUrl":"https://doi.org/10.1145/217474.217530","url":null,"abstract":"In this paper we present a set cover based approach (SCP) to multi-way partitioning for minimum delay for Look-Up Table based FPGAs. SCP minimizes the number of chip-crossings on each circuit path with minimum logic duplication costs to achieve implementations with minimum delay and minimum number of chips. The overall complexity of SCP is (V2). Experimental results demonstrate that SCP produces partitions that on the average have 14% fewer chips, 28% fewer pins, and 93% fewer chip-crossings on each circuit path compared to ANN which is a simulated annealing based implementation of classical multi-way partitioning. SCP achieves this performance and compact packing at the cost of duplicating 13% of logic on the average. Additionally, in comparison with a lower bound we observe that SCP has produced near-optimal solutions.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124857873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a novel method for topological delay optimization of combinational circuits. Unlike most previous techniques, optimization is performed after technology mapping. Therefore, exact gate delay information is known during optimization. Our method performs incremental network transformations, specifically substitutions of gate input or output signals by new gates. We present new theory which relates incremental network transformations to combinations of global clauses, and show how to detect such valid clause combinations. Employing techniques which originated in the test area, our method is capable to globally optimize large circuits. Comprehensive experimental results show that our method reduces the delay of large standard cell netlists by 23% on average. In contrast to most other delay optimization techniques, area reductions are achieved concurrently.
{"title":"Logic Clause Analysis for Delay Optimization","authors":"B. Rohfleisch, B. Wurth, K. Antreich","doi":"10.1145/217474.217608","DOIUrl":"https://doi.org/10.1145/217474.217608","url":null,"abstract":"In this paper, we present a novel method for topological delay optimization of combinational circuits. Unlike most previous techniques, optimization is performed after technology mapping. Therefore, exact gate delay information is known during optimization. Our method performs incremental network transformations, specifically substitutions of gate input or output signals by new gates. We present new theory which relates incremental network transformations to combinations of global clauses, and show how to detect such valid clause combinations. Employing techniques which originated in the test area, our method is capable to globally optimize large circuits. Comprehensive experimental results show that our method reduces the delay of large standard cell netlists by 23% on average. In contrast to most other delay optimization techniques, area reductions are achieved concurrently.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127505516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We show how to quantify the suboptimality of heuristic algorithms for NP-hard problems arising in VLSI layout. Our approach is based on the notion of constructing new scaled instances from an initial problem instance. From the given problem instance, we essentially construct doubled, tripled, etc. instances which have optimum solution costs at most twice, three times, etc. that of the original instance. By executing the heuristic on these scaled instances, and then comparing the growth of solution cost with the growth of instance size, we can measure the scaling suboptimality of the heuristic. We give experimentally determined scaling behavior of several placement and partitioning heuristics; these results suggest that siginificant improvement remains possible over current state-of-the-art methods.
{"title":"Quantified Suboptimality of VLSI Layout Heuristics","authors":"L. Hagen, D. J. Huang, A. Kahng","doi":"10.1145/217474.217532","DOIUrl":"https://doi.org/10.1145/217474.217532","url":null,"abstract":"We show how to quantify the suboptimality of heuristic algorithms for NP-hard problems arising in VLSI layout. Our approach is based on the notion of constructing new scaled instances from an initial problem instance. From the given problem instance, we essentially construct doubled, tripled, etc. instances which have optimum solution costs at most twice, three times, etc. that of the original instance. By executing the heuristic on these scaled instances, and then comparing the growth of solution cost with the growth of instance size, we can measure the scaling suboptimality of the heuristic. We give experimentally determined scaling behavior of several placement and partitioning heuristics; these results suggest that siginificant improvement remains possible over current state-of-the-art methods.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126706263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper demonstrates how fault simulation of building blocks found in data-path architectures can be performed extremely efficiently and accurately by taking advantage of their simple functional models and structural regularity. This technique can be used to accelerate the simulation of those blocks in virtually any fault simulation environment, resulting in fault simulation algorithms that can perform fault grading in a very demanding BIST environment.
{"title":"Software Accelerated Functional Fault Simulation for Data-Path Architectures","authors":"M. Kassab, N. Mukherjee, J. Rajski, J. Tyszer","doi":"10.1145/217474.217551","DOIUrl":"https://doi.org/10.1145/217474.217551","url":null,"abstract":"This paper demonstrates how fault simulation of building blocks found in data-path architectures can be performed extremely efficiently and accurately by taking advantage of their simple functional models and structural regularity. This technique can be used to accelerate the simulation of those blocks in virtually any fault simulation environment, resulting in fault simulation algorithms that can perform fault grading in a very demanding BIST environment.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126717057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It has recently been shown that the boundary-element method can be used to perform accurate cross-talk simulations of three-dimensional integrated circuit interconnect. However, the computational complexity grows as N2, where N is the number of surface unknowns. Straightforward application of the fast-multipole algorithm reduces the computational complexity to order N, but produces magnified errors due to the ill-conditioning of the steady-state problem. We present a mixed surface-volume approach and prove that the formulation results in the exact steady-state solution, independent of the multipole approximations. Numerical experiments are presented to demonstrate the accuracy and efficiency of this technique. On a realistic example, the new method runs fifteen times faster than using dense-matrix iterative methods.
{"title":"Transient Simulations of Three-dimensional Integrated Circuit Interconnect Using a Mixed Surface-Volume approach","authors":"Tom Korsmeyer Mike Chou","doi":"10.1109/dac.1995.249996","DOIUrl":"https://doi.org/10.1109/dac.1995.249996","url":null,"abstract":"It has recently been shown that the boundary-element method can be used to perform accurate cross-talk simulations of three-dimensional integrated circuit interconnect. However, the computational complexity grows as N2, where N is the number of surface unknowns. Straightforward application of the fast-multipole algorithm reduces the computational complexity to order N, but produces magnified errors due to the ill-conditioning of the steady-state problem. We present a mixed surface-volume approach and prove that the formulation results in the exact steady-state solution, independent of the multipole approximations. Numerical experiments are presented to demonstrate the accuracy and efficiency of this technique. On a realistic example, the new method runs fifteen times faster than using dense-matrix iterative methods.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115367944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analog IC test occupies a significant fraction of the design cycle. Testing costs are increased by the twin requirements of high precision and accuracy in signal measurement. We discuss a system level ACOB technique for fully differential analog ICs. Our test techniques incorporate analog specific constraints such as device matching, and circuit and switching noise. They have a minimal impact on performance, area and power. The techniques can be used for both discrete and continuous time circuits, over a wide frequency range. The system level DFT scheme is also used to design a self-testing switched capacitor filter. Our checking scheme provides significant fault coverage and is demonstrably superior to other DFT techniques for differential circuits.
{"title":"System-Level Design for Test of Fully Differential Analog Circuits","authors":"Ramesh Harjani Bapiraju Vinnakota","doi":"10.1109/dac.1995.249989","DOIUrl":"https://doi.org/10.1109/dac.1995.249989","url":null,"abstract":"Analog IC test occupies a significant fraction of the design cycle. Testing costs are increased by the twin requirements of high precision and accuracy in signal measurement. We discuss a system level ACOB technique for fully differential analog ICs. Our test techniques incorporate analog specific constraints such as device matching, and circuit and switching noise. They have a minimal impact on performance, area and power. The techniques can be used for both discrete and continuous time circuits, over a wide frequency range. The system level DFT scheme is also used to design a self-testing switched capacitor filter. Our checking scheme provides significant fault coverage and is demonstrably superior to other DFT techniques for differential circuits.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116684714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a transformation, named rephasing, that manipulates the timing parameters in control-dataflow graphs. Traditionally high-level synthesis systems for DSP have either assumed that all the relative times, called phases, when corresponding samples are available at input and delay nodes are zero or have automatically assigned values to as part of the scheduling step when software pipelining is simultaneously applied. Rephasing, however, manipulates the values of these phases as a transformation before the scheduling. The advantage of this approach is that phases can be chosen to optimize the algorithm for metrics such as area and power. Moreover, rephasing can be combined with other transformations. We have developed techniques for using rephasing to optimize several design metrics. The experimental results show significant improvements.
{"title":"Rephasing: A Transformation Technique for the Manipulation of Timing Constraints","authors":"M. Potkonjak, M. Srivastava","doi":"10.1145/217474.217515","DOIUrl":"https://doi.org/10.1145/217474.217515","url":null,"abstract":"We introduce a transformation, named rephasing, that manipulates the timing parameters in control-dataflow graphs. Traditionally high-level synthesis systems for DSP have either assumed that all the relative times, called phases, when corresponding samples are available at input and delay nodes are zero or have automatically assigned values to as part of the scheduling step when software pipelining is simultaneously applied. Rephasing, however, manipulates the values of these phases as a transformation before the scheduling. The advantage of this approach is that phases can be chosen to optimize the algorithm for metrics such as area and power. Moreover, rephasing can be combined with other transformations. We have developed techniques for using rephasing to optimize several design metrics. The experimental results show significant improvements.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129674217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}