Pub Date: 1996-11-10, DOI: 10.1109/ICCAD.1996.569798
Indradeep Ghosh, A. Raghunathan, N. Jha
We present a technique for extracting functional (control/dataflow) information from register transfer level (RTL) controller/data path circuits and illustrate its use in design for hierarchical testability of these circuits. This testing procedure and design for testability (DFT) technique is general enough to handle RTL control flow intensive circuits like protocol handlers as well as data flow intensive circuits like digital filters. It makes the combined controller-data path highly testable and does not require any external behavioral information. This scheme has the advantages of low area/delay/power overheads (average of 3.2%, 0.9% and 4.1%, respectively, for benchmarks), high fault coverage (over 99% for most cases), very low test generation times (because it is independent of bit-width), and the advantage of at-speed testing. Experiments show a 2-to-4 (1-to-3) orders of magnitude test generation time advantage over an efficient gate-level sequential test generator (combinational test generator that assumes full scan).
Title: A design for testability technique for RTL circuits using control/data flow extraction. Proceedings of International Conference on Computer Aided Design. https://doi.org/10.1109/ICCAD.1996.569798
Pub Date: 1996-11-10, DOI: 10.1109/ICCAD.1996.569591
S. Dutt, W. Deng
Move-based iterative improvement partitioning methods such as the Fiduccia-Mattheyses (FM) algorithm and Krishnamurthy's Look-Ahead (LA) algorithm are widely used in VLSI CAD applications, largely due to their time efficiency and ease of implementation. This class of algorithms is of the "local improvement" type. They generate relatively high quality results for small and medium size circuits. However, as VLSI circuits become larger, these algorithms become less effective as direct partitioning tools. We propose new iterative-improvement methods that select cells to move with a view to moving clusters that straddle the two subsets of a partition into one of the subsets. The new algorithms significantly improve partition quality while preserving the advantage of time efficiency. Experimental results on 25 medium to large size ACM/SIGDA benchmark circuits show up to 70% improvement over FM in cutsize, with an average of per-circuit percent improvements of about 25%, and a total cut improvement of about 35%. They also outperform the recent placement-based partitioning tool Paraboli and the spectral partitioner MELO by about 17% and 23%, respectively, with less CPU time. This demonstrates the potential of iterative improvement algorithms in dealing with the increasing complexity of modern VLSI circuitry.
Title: VLSI circuit partitioning by cluster-removal using iterative improvement techniques. Proceedings of International Conference on Computer Aided Design. https://doi.org/10.1109/ICCAD.1996.569591
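As a rough illustration of the move-based setting this abstract describes, the sketch below computes the cutsize of a two-way partition and the classic FM gain of moving a single cell. It does not implement the cluster-removal selection the authors propose; the netlist, the cell names and the function names are hypothetical.

```python
def cutsize(nets, side):
    """Number of nets with cells on both sides of the partition."""
    return sum(1 for net in nets if len({side[c] for c in net}) == 2)


def fm_gain(cell, nets, side):
    """Classic FM gain: reduction in cutsize if `cell` moves to the other side."""
    gain = 0
    for net in nets:
        if cell not in net:
            continue
        same = sum(1 for c in net if side[c] == side[cell])
        other = len(net) - same
        if same == 1 and other > 0:
            gain += 1  # cell is alone on its side: moving it uncuts the net
        elif other == 0 and len(net) > 1:
            gain -= 1  # net is currently uncut: moving the cell cuts it
    return gain


# Hypothetical four-cell netlist; each net is a tuple of cell names.
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
side = {"a": 0, "b": 0, "c": 1, "d": 1}

print(cutsize(nets, side))       # cut before any move
print(fm_gain("c", nets, side))  # gain of moving cell "c" to side 0
```

An FM pass repeatedly moves the highest-gain unlocked cell (subject to a balance constraint) and keeps the best prefix of moves; the abstract's contribution lies in how cells are chosen so that whole clusters end up on one side.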
In this paper, we propose a statistical power evaluation framework at the RT-level. We first discuss the power macro-modeling formulation, and then propose a simple random sampling technique to alleviate the overhead of macro-modeling during RTL simulation. Next, we describe a regression estimator to reduce the error of the macro-modeling approach. Experimental results indicate that the execution time of simple random sampling combined with power macro-modeling is 50X lower than that of conventional macro-modeling, while the percentage error of regression estimation combined with power macro-modeling is 16X lower than that of conventional macro-modeling. Hence, we provide the designer with options to improve either the accuracy or the execution time when using power macro-modeling in the context of RTL simulation.
Pub Date: 1996-11-10, DOI: 10.1109/ICCAD.1996.569914
Cheng-Ta Hsieh, Qing Wu, Chih-Shun Ding, Massoud Pedram
Title: Statistical sampling and regression analysis for RT-Level power evaluation. Proceedings of International Conference on Computer Aided Design. https://doi.org/10.1109/ICCAD.1996.569914
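The two options named in this abstract can be sketched with textbook survey-sampling estimators. The code below is a minimal illustration, assuming that the macro-model gives a cheap per-cycle power value and that an accurate reference value is available only for the sampled cycles; whether this matches the paper's exact formulation is an assumption, and all names and data are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def srs_mean(macro_power, sample_size, rng):
    """Simple random sampling: average the macro-model over a random subset of cycles."""
    idx = rng.sample(range(len(macro_power)), sample_size)
    return mean([macro_power[i] for i in idx]), idx


def regression_estimate(macro_power, idx, reference_power):
    """Textbook regression estimator of mean power.

    macro_power: macro-model value for every cycle (cheap auxiliary variable).
    idx: indices of the sampled cycles.
    reference_power: accurate power for the sampled cycles, in the same order as idx.
    """
    x_s = [macro_power[i] for i in idx]
    y_s = list(reference_power)
    x_bar, y_bar = mean(x_s), mean(y_s)
    var_x = sum((x - x_bar) ** 2 for x in x_s)
    if var_x == 0:
        return y_bar  # no spread in the sample: fall back to the plain sample mean
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_s, y_s))
    slope = cov_xy / var_x
    # Correct the sample mean by how far the sample's macro-model mean
    # is from the macro-model mean over all cycles.
    return y_bar + slope * (mean(macro_power) - x_bar)


# Hypothetical data: the reference power is an exact linear function of the macro-model.
rng = random.Random(0)
macro = [float(i % 7) for i in range(100)]
reference = [2.0 * x + 1.0 for x in macro]

estimate_srs, idx = srs_mean(macro, 20, rng)
estimate_reg = regression_estimate(macro, idx, [reference[i] for i in idx])
print(estimate_srs, estimate_reg, mean(reference))
```

When the reference power is strongly correlated with the macro-model output, the regression estimate removes most of the sampling error, which is the intuition behind trading execution time against accuracy.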
Pub Date: 1996-11-01, DOI: 10.1109/ICCAD.1996.569830
Kyosun Kim, R. Karri, M. Potkonjak
Using the flexibility provided by multiple functionalities we have developed a new approach for permanent fault-tolerance: Heterogeneous Built-In-Resiliency (HBIR). HBIR processor synthesis imposes several unique tasks on the synthesis process: (i) latency determination targeting k-unit fault-tolerance, (ii) application-to-faulty-unit matching and (iii) HBIR scheduling and assignment algorithms. We address each of them and demonstrate the effectiveness of the overall approach, the synthesis algorithms, and software implementations on a number of designs.
Title: Heterogeneous built-in resiliency of application specific programmable processors. Proceedings of International Conference on Computer Aided Design. https://doi.org/10.1109/ICCAD.1996.569830
Pub Date: 1996-11-01, DOI: 10.1109/ICCAD.1996.569831
William J. Schilp, P. Maurer
The Inversion Algorithm is an event driven algorithm whose performance meets or exceeds that of Levelized Compiled Code simulation, even when the activity rate is unrealistically high. Existing implementations of the Inversion Algorithm are based on the Zero Delay model. This paper extends the algorithm to more realistic timing models. The main problems discussed in this paper are avoiding scheduling conflicts, and minimizing the amount of storage space. These problems are made considerably more difficult by the deletion of NOT gates and the collapsing of various connections. These optimizations transform the simulation into a multi-delay simulation under the transport delay model. A complete solution to the scheduling problem is presented under these conditions.
Title: Unit delay simulation with the inversion algorithm. Proceedings of International Conference on Computer Aided Design. https://doi.org/10.1109/ICCAD.1996.569831
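To make the difference between zero-delay and unit-delay models concrete, here is a minimal generic event-driven unit-delay simulator. It is not the Inversion Algorithm and performs none of the NOT-gate deletion or connection collapsing mentioned in the abstract; the circuit, the net names and the function name are hypothetical.

```python
def simulate_unit_delay(gates, initial_values, input_changes, max_time=100):
    """Event-driven simulation where every gate has a delay of one time unit.

    gates: {output_net: (function, [input_nets])}
    initial_values: consistent starting value for every net
    input_changes: {net: new_value} applied at time 0
    Returns a list of (time, net, value) changes in the order they occur.
    """
    fanout = {}
    for out, (_, ins) in gates.items():
        for net in ins:
            fanout.setdefault(net, []).append(out)

    values = dict(initial_values)
    events = dict(input_changes)
    trace = []
    time = 0
    while events and time <= max_time:
        # Apply every event scheduled for the current time step.
        affected = set()
        for net, val in events.items():
            if values[net] != val:
                values[net] = val
                trace.append((time, net, val))
                affected.update(fanout.get(net, []))
        # Re-evaluate affected gates; their outputs change one time unit later.
        events = {}
        for out in sorted(affected):
            func, ins = gates[out]
            new_val = func(*(values[n] for n in ins))
            if new_val != values[out]:
                events[out] = new_val
        time += 1
    return trace


# Hypothetical circuit: n = NOT a, y = a AND n. Statically y is always 0,
# but a rising edge on a produces a one-unit pulse on y under unit delay.
gates = {
    "n": (lambda a: 1 - a, ["a"]),
    "y": (lambda a, n: a & n, ["a", "n"]),
}
trace = simulate_unit_delay(gates, {"a": 0, "n": 1, "y": 0}, {"a": 1})
print(trace)
```

A zero-delay simulator would report no change on y for this input, which is why moving to unit-delay or multi-delay models raises the scheduling and storage questions the abstract discusses.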
Pub Date: 1996-11-01, DOI: 10.1109/ICCAD.1996.569635
S. Yamashita, Hiroshi Sawada, A. Nagoya
This paper presents a new method to express functional permissibilities for look-up table (LUT) based field programmable gate arrays (FPGAs). The method represents functional permissibilities by using sets of pairs of functions, not by incompletely specified functions. It makes good use of the property of LUTs that their internal logic can be freely changed. The permissibilities expressed by the proposed method have the desired property that they can be treated simultaneously at many points of a network. Applications of the proposed method are also presented: a method to optimize networks and a method to remove connections that are obstacles at the routing step. Preliminary experimental results are given to show the effectiveness of our proposed method.
Title: A new method to express functional permissibilities for LUT based FPGAs and its applications. Proceedings of International Conference on Computer Aided Design. https://doi.org/10.1109/ICCAD.1996.569635
Pub Date: 1996-11-01, DOI: 10.1109/ICCAD.1996.569710
L. M. Silveira, M. Kamon, I. Elfadel, Jacob K. White
Since the first papers on asymptotic waveform evaluation (AWE), Padé-based reduced order models have become standard for improving coupled circuit-interconnect simulation efficiency. Such models can be accurately computed using bi-orthogonalization algorithms like Padé via Lanczos (PVL), but the resulting Padé approximants can still be unstable even when generated from stable RLC circuits. For certain classes of RC circuits it has been shown that congruence transforms, like the Arnoldi algorithm, can generate guaranteed stable and passive reduced-order models. In this paper we present a computationally efficient model-order reduction technique, the coordinate-transformed Arnoldi algorithm, and show that this method generates arbitrarily accurate and guaranteed stable reduced-order models for RLC circuits. Examples are presented which demonstrate the enhanced stability and efficiency of the new method.
Title: A coordinate-transformed Arnoldi algorithm for generating guaranteed stable reduced-order models of RLC circuits. Proceedings of International Conference on Computer Aided Design. https://doi.org/10.1109/ICCAD.1996.569710
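For readers unfamiliar with the Arnoldi process that this abstract builds on, the sketch below shows the standard iteration in plain Python: it builds an orthonormal basis V of the Krylov subspace span{b, Ab, ..., A^(q-1) b} and the projected matrix H. The coordinate transformation that the authors add to guarantee stability for RLC circuits is not shown; the matrix, the vector and the function names are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def arnoldi(A, b, q):
    """Standard Arnoldi iteration with modified Gram-Schmidt.

    Returns (V, H): V is a list of orthonormal vectors spanning the Krylov
    subspace, and H is the q x q upper Hessenberg projection of A onto it.
    If the process breaks down early, V has fewer than q vectors and the
    remaining entries of H stay zero.
    """
    norm_b = math.sqrt(dot(b, b))
    V = [[x / norm_b for x in b]]
    H = [[0.0] * q for _ in range(q)]
    for j in range(q):
        w = matvec(A, V[j])
        for i in range(j + 1):
            H[i][j] = dot(V[i], w)
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        norm_w = math.sqrt(dot(w, w))
        if j + 1 == q or norm_w < 1e-12:
            break
        H[j + 1][j] = norm_w
        V.append([x / norm_w for x in w])
    return V, H


# Hypothetical 3x3 system matrix and input vector, reduced to order 2.
A = [[-2.0, 1.0, 0.0],
     [1.0, -2.0, 1.0],
     [0.0, 1.0, -2.0]]
b = [1.0, 0.0, 0.0]
V, H = arnoldi(A, b, 2)
print(V)
print(H)
```

The reduced-order model is obtained by replacing A with the small matrix H and projecting the input and output vectors onto V.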
Pub Date: 1996-04-10, DOI: 10.1109/ICCAD.1996.569538
H. Juan, D. Gajski, Viraphol Chaiyakul
In interactive behavioral synthesis, the designer can control the design process at every stage, including modifying the schedule of the design to improve its performance. In this paper, we present a methodology for performance optimization in interactive behavioral synthesis. Also proposed are several quality metrics and hints that can assist the user in utilizing the proposed methodology. When the user is optimizing the performance of the design, one important decision is the selection of a clock period. To facilitate clock selection by the user, we have developed an algorithm to estimate the effect of different clock periods on the execution time of the design. We have tested our methodology on several benchmarks. The experimental results support the proposed methodology by demonstrating an average improvement of 46.2% in design performance.
Title: Clock-driven performance optimization in interactive behavioral synthesis. Proceedings of International Conference on Computer Aided Design. https://doi.org/10.1109/ICCAD.1996.569538
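The abstract does not give the estimation algorithm, so the following is only a simplified stand-in that shows why the clock period matters: it assumes a purely sequential chain of operations, each of which occupies a whole number of clock cycles. The delay values, the candidate periods and the function name are hypothetical.

```python
def execution_time(op_delays, clock_period):
    """Total time for a sequential chain when each operation needs
    ceil(delay / clock_period) whole cycles (integer time units)."""
    cycles = sum(-(-delay // clock_period) for delay in op_delays)  # ceiling division
    return cycles * clock_period


# Hypothetical operation delays and candidate clock periods, in nanoseconds.
op_delays = [23, 48, 10]
candidates = [10, 25, 50]

for period in candidates:
    print(period, execution_time(op_delays, period))

best_period = min(candidates, key=lambda p: execution_time(op_delays, p))
```

A short period wastes little slack per operation but needs more cycles, while a long period leaves fast operations idle for most of the cycle; an estimator of this kind lets the designer compare candidates before rescheduling.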
Pub Date: 1996-04-01, DOI: 10.1109/ICCAD.1996.571349
Man-Fai Yu, J. Darnauer, W. Dai
Many practical routing problems such as BGA, PGA, pin redistribution and test fixture routing involve routing with interchangeable pins. These routing problems, especially package layout, are becoming more difficult to do manually due to increasing speed and I/O. Currently, no commercial or university router is available for this task. In this paper, we unify these different problems as instances of the interchangeable pin routing (IPR) problem, which is NP-complete. By representing the solution space with flows in a triangulated routing network instead of grids, we developed a min-cost max-flow heuristic considering only the most important cuts in the design. The heuristic handles multiple layers, prerouted nets, and all-angle, octilinear or rectilinear wiring styles. Experiments show that the heuristic is very effective on most practical examples. It has been used to route industry designs with thousands of interchangeable pins.
Title: Interchangeable pin routing with application to package layout. Proceedings of International Conference on Computer Aided Design. https://doi.org/10.1109/ICCAD.1996.571349
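Because the pins are interchangeable, routability can be viewed as a flow question: can every pad push one unit of flow through the capacity-limited gaps to some ball? The sketch below uses a plain max-flow (Edmonds-Karp) on a small hand-made graph. It is not the authors' min-cost max-flow heuristic and does not build a triangulated routing network; the graph, the node names and the function name are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow on a dict-of-dicts capacity graph."""
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical package: three pads, two routing gaps, three balls.
# S and T are a super source and super sink added because any pad may use any ball.
network = {
    "S": {"pad1": 1, "pad2": 1, "pad3": 1},
    "pad1": {"gapA": 1},
    "pad2": {"gapA": 1, "gapB": 1},
    "pad3": {"gapB": 1},
    "gapA": {"ball1": 1, "ball2": 1},
    "gapB": {"ball3": 1},
    "ball1": {"T": 1},
    "ball2": {"T": 1},
    "ball3": {"T": 1},
}
routable_pins = max_flow(network, "S", "T")
print(routable_pins)
```

If the flow value equals the number of pads, every pin can be routed under the modeled capacities; adding edge costs and solving min-cost max-flow would additionally favor short or uncongested routes.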
Pub Date: 1900-01-01, DOI: 10.1109/iccad.1996.569418
D. V. Campenhout, T. Mudge, K. Sakallah
Two methods are presented for static timing verification of sequential circuits implemented as a mix of static and domino logic. Constraints for proper operation of domino gates are derived. An important observation is that input signals to domino gates may start changing near the end of the evaluate phase. The first method models domino gates explicitly, similar to latches. The second method treats domino gates only during pre- and post-processing steps. This method is shown to be more conservative, but easier to compute.
Title: Timing verification of sequential domino circuits. Proceedings of International Conference on Computer Aided Design. https://doi.org/10.1109/iccad.1996.569418