Y. G. DeCastelo-Vide-e-Souza, M. Potkonjak, A. C. Parker
System level design and behavior transformations have been rapidly establishing themselves as design steps with the most inuential impact on final performance metrics, throughput and latency, of a design. In this paper we develop a formal ILP-based approach for throughput and latency optimization when algorithm-architecture matching, retiming, and pipelining are considered simultaneously. The effectiveness of the approach is demonstrated on several real-life examples.
{"title":"Optimal ILP-based Approach for Throughput Optimization Using Simultaneous Algorithm/Architecture Matching and Retiming","authors":"Y. G. DeCastelo-Vide-e-Souza, M. Potkonjak, A. C. Parker","doi":"10.1145/217474.217516","DOIUrl":"https://doi.org/10.1145/217474.217516","url":null,"abstract":"System level design and behavior transformations have been rapidly establishing themselves as design steps with the most inuential impact on final performance metrics, throughput and latency, of a design. In this paper we develop a formal ILP-based approach for throughput and latency optimization when algorithm-architecture matching, retiming, and pipelining are considered simultaneously. The effectiveness of the approach is demonstrated on several real-life examples.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114895419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A partial scan approach proposed recently selects scan signals without considering the availability of the ip-ops (FFs). Such an approach can greatly reduce the number of scan signals since maximum freedom is allowed in scan signal selection. To actually scan the selected signals, we, however, must make them FF-driving signals. In this paper, we study the problem of modifying and retiming a circuit to make a pre-selected set of scan signals FF-driving signals while preserving the set of cycles being broken. We present a new approach for solving this problem. Based on the new approach we design an efficient algorithm. Unlike a previous algorithm which inherently has no control over the area overhead incurred during the modification, our algorithm explicitly minimizes the area overhead. The algorithm has been implemented and encouraging results were obtained.
{"title":"Partial Scan with Pre-selected Scan Signals","authors":"P. Pan, C. Liu","doi":"10.1145/217474.217528","DOIUrl":"https://doi.org/10.1145/217474.217528","url":null,"abstract":"A partial scan approach proposed recently selects scan signals without considering the availability of the ip-ops (FFs). Such an approach can greatly reduce the number of scan signals since maximum freedom is allowed in scan signal selection. To actually scan the selected signals, we, however, must make them FF-driving signals. In this paper, we study the problem of modifying and retiming a circuit to make a pre-selected set of scan signals FF-driving signals while preserving the set of cycles being broken. We present a new approach for solving this problem. Based on the new approach we design an efficient algorithm. Unlike a previous algorithm which inherently has no control over the area overhead incurred during the modification, our algorithm explicitly minimizes the area overhead. The algorithm has been implemented and encouraging results were obtained.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128705690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We observe that the switching activity at a circuit node, also called the transition density, can be extremely sensitive to the circuit internal delays. As a result, slight delay variations can lead to several orders of magnitude changes in the node activity. This has important implications for CAD in that, if the transition density is estimated by simulation, then minor inaccuracies in the timing models can lead to very large errors in the estimated activity. As a solution, we propose an efficient technique for estimating an upper bound on the transition density at every node. While it is not always very tight, the upper bound is robust, in the sense that it is valid irrespective of delay variations and modeling errors. We will describe the technique and present experimental results based on a prototype implementation.
{"title":"Extreme Delay Sensitivity and the Worst-Case Switching Activity in VLSI Circuitsy","authors":"F. Najm, Michael Y. Zhang","doi":"10.1145/217474.217600","DOIUrl":"https://doi.org/10.1145/217474.217600","url":null,"abstract":"We observe that the switching activity at a circuit node, also called the transition density, can be extremely sensitive to the circuit internal delays. As a result, slight delay variations can lead to several orders of magnitude changes in the node activity. This has important implications for CAD in that, if the transition density is estimated by simulation, then minor inaccuracies in the timing models can lead to very large errors in the estimated activity. As a solution, we propose an efficient technique for estimating an upper bound on the transition density at every node. While it is not always very tight, the upper bound is robust, in the sense that it is valid irrespective of delay variations and modeling errors. We will describe the technique and present experimental results based on a prototype implementation.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117058332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present an algorithm for finding a good Ashenhurst decomposition of a switching function. Most current methods for performing this type of decomposition are based on the Roth-Karp algorithm. The algorithm presented here is based on finding an optimal cut in a BDD. This algorithm differs from previous decomposition algorithms in that the cut determines the size and composition of the bound set and the free set. Other methods examine all possible bound sets of an arbitrary size. We have applied this method to decomposing functions into sets of k-variable functions. This is a required step when implementing a function using a lookup table (LUT) based FPGA. The results compare very favorably to existing implementations of Roth-Karp decomposition methods.
{"title":"A Method for Finding Good Ashenhurst Decompositions and Its Application to FPGA Synthesis","authors":"T. Stanion, C. Sechen","doi":"10.1145/217474.217507","DOIUrl":"https://doi.org/10.1145/217474.217507","url":null,"abstract":"In this paper, we present an algorithm for finding a good Ashenhurst decomposition of a switching function. Most current methods for performing this type of decomposition are based on the Roth-Karp algorithm. The algorithm presented here is based on finding an optimal cut in a BDD. This algorithm differs from previous decomposition algorithms in that the cut determines the size and composition of the bound set and the free set. Other methods examine all possible bound sets of an arbitrary size. We have applied this method to decomposing functions into sets of k-variable functions. This is a required step when implementing a function using a lookup table (LUT) based FPGA. The results compare very favorably to existing implementations of Roth-Karp decomposition methods.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126202777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Binary Moment Diagrams (BMDs) provide a canonical representations for linear functions similar to the way Binary Decision Diagrams (BDDs) represent Boolean functions. Within the class of linear functions, we can embed arbitrary functions from Boolean variables to integer values. BMDs can thus model the functionality of data path circuits operating over word-level data. Many important functions, including integermultiplication, that cannot be represented efficiently at the bit level with BDDs have simple representations at the word level with BMDs. Furthermore, BMDs can represent Boolean functions with around the same complexity as BDDs. We propose a hierarchical approach to verifying arithmetic circuits, where componentmodules are first shownto implement their word-level specifications. The overall circuit functionality is then verified by composing the component functions and comparing the result to the word-level circuit specification. Multipliers with word sizes of up to 256 bits have been verified by this technique.
{"title":"Verification of Arithmetic Circuits with Binary Moment Diagrams","authors":"R. Bryant, Yirng-An Chen","doi":"10.1145/217474.217583","DOIUrl":"https://doi.org/10.1145/217474.217583","url":null,"abstract":"Binary Moment Diagrams (BMDs) provide a canonical representations for linear functions similar to the way Binary Decision Diagrams (BDDs) represent Boolean functions. Within the class of linear functions, we can embed arbitrary functions from Boolean variables to integer values. BMDs can thus model the functionality of data path circuits operating over word-level data. Many important functions, including integermultiplication, that cannot be represented efficiently at the bit level with BDDs have simple representations at the word level with BMDs. Furthermore, BMDs can represent Boolean functions with around the same complexity as BDDs. We propose a hierarchical approach to verifying arithmetic circuits, where componentmodules are first shownto implement their word-level specifications. The overall circuit functionality is then verified by composing the component functions and comparing the result to the word-level circuit specification. Multipliers with word sizes of up to 256 bits have been verified by this technique.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126071589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we present a completemethodology for the design and validation of asynchronous circuits starting from a formal specificationmodel that roughly correspondsto a timing diagram. The methodology is presented in such a way that it is easy to embed in the current methodology for synchronous circuits. The different steps of the synthesis process will just be briefly touched upon. The main part of the paper concentrates on the simulation and validation of asynchronous circuits. It discusses where the designer needs validation and how it can be done. It also explains how this process can be automated and embedded in the complete methodology.
{"title":"A Design and Validation System for Asynchronous Circuits","authors":"P. Vanbekbergen, Albert R. Wang, K. Keutzer","doi":"10.1145/217474.217618","DOIUrl":"https://doi.org/10.1145/217474.217618","url":null,"abstract":"In this paper we present a completemethodology for the design and validation of asynchronous circuits starting from a formal specificationmodel that roughly correspondsto a timing diagram. The methodology is presented in such a way that it is easy to embed in the current methodology for synchronous circuits. The different steps of the synthesis process will just be briefly touched upon. The main part of the paper concentrates on the simulation and validation of asynchronous circuits. It discusses where the designer needs validation and how it can be done. It also explains how this process can be automated and embedded in the complete methodology.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"392 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131728488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Partitioned memory with bus interconnect architecture in its most general form consists of several functional units with associated memory accessible to the functional unit via local interconnect and global buses to communicate data values across from one functional unit to another. As can be expected, the time at which certain values are communicated affect the size of the local memories and the number of buses that are needed. We address the problem of scheduling communications in a bus architecture under memory constraints. We present here a network ow formulation for the problem and obtain an exact algorithm to schedule the communications, such that the constraint on the number of registers in each functional unit is satisfied. As an increasing number of architectures use multiple memories in addition to (or instead of) one central RAM, this work is especially interesting. Several authors have already studied this problem in related architectures, yet all use heuristic approaches to schedule the communications. Our technique is the first exact solution to the problem. Also, our graph theoretic formulation provides a clearer insight into the problem.
{"title":"Constrained Register Allocation in Bus Architectures","authors":"E. Frank, S. Raje, M. Sarrafzadeh","doi":"10.1145/217474.217525","DOIUrl":"https://doi.org/10.1145/217474.217525","url":null,"abstract":"Partitioned memory with bus interconnect architecture in its most general form consists of several functional units with associated memory accessible to the functional unit via local interconnect and global buses to communicate data values across from one functional unit to another. As can be expected, the time at which certain values are communicated affect the size of the local memories and the number of buses that are needed. We address the problem of scheduling communications in a bus architecture under memory constraints. We present here a network ow formulation for the problem and obtain an exact algorithm to schedule the communications, such that the constraint on the number of registers in each functional unit is satisfied. As an increasing number of architectures use multiple memories in addition to (or instead of) one central RAM, this work is especially interesting. Several authors have already studied this problem in related architectures, yet all use heuristic approaches to schedule the communications. Our technique is the first exact solution to the problem. Also, our graph theoretic formulation provides a clearer insight into the problem.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123962447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Elmore delay is an extremely popular delay metric, particularly for RC tree analysis. The widespread usage of this metric is mainly attributable to it being the most accurate delay measure that is a simple analytical function of the circuit parameters. The only drawbacks to this delay metric are the uncertainty as to whether it is an optimistic or a pessimistic estimate, and the restriction to step response delay estimation. In this paper, we prove that the Elmore delay is an absolute upper bound on the 50% delay of an RC tree response. Moreover, we prove that this bound holds for input signals other than steps, and that the actual delay asymptotically approaches the Elmore delay as the input signal rise time increases. A lower bound on the delay is also developed using the Elmore delay and the second moment of the impulse response. The utility of this bound is for understanding the accuracy and the limitations of the Elmore delay metric as we use it for design automation.
{"title":"The Elmore Delay as a Bound for RC Trees with Generalized Input Signals","authors":"Rohinish Gupta, B. Tutuianu, L. Pileggi","doi":"10.1145/217474.217556","DOIUrl":"https://doi.org/10.1145/217474.217556","url":null,"abstract":"The Elmore delay is an extremely popular delay metric, particularly for RC tree analysis. The widespread usage of this metric is mainly attributable to it being the most accurate delay measure that is a simple analytical function of the circuit parameters. The only drawbacks to this delay metric are the uncertainty as to whether it is an optimistic or a pessimistic estimate, and the restriction to step response delay estimation. In this paper, we prove that the Elmore delay is an absolute upper bound on the 50% delay of an RC tree response. Moreover, we prove that this bound holds for input signals other than steps, and that the actual delay asymptotically approaches the Elmore delay as the input signal rise time increases. A lower bound on the delay is also developed using the Elmore delay and the second moment of the impulse response. The utility of this bound is for understanding the accuracy and the limitations of the Elmore delay metric as we use it for design automation.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114787957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A spectral partitioning method uses the eigenvectors of a graph's adjacency or Laplacian matrix to construct a geometric representation (e.g., a linear ordering) which is then heuristically partitioned. We map each graph vertex to a vector in d-dimensional space, where d is the number of eigenvectors, such that these vectors constitute an instance of the vector partitioning problem. When all the eigenvectors are used, graph partitioning exactly reduces to vector partitioning. This result motivates a simple ordering heuristic that can be used to yield high-quality 2-way and multi-way partitionings. Our experiments suggest the vector partitioning perspective opens the door to new and effective heuristics.
{"title":"Spectral Partitioning: The More Eigenvectors, The Better","authors":"C. Alpert, So-Zen Yao","doi":"10.1145/217474.217529","DOIUrl":"https://doi.org/10.1145/217474.217529","url":null,"abstract":"A spectral partitioning method uses the eigenvectors of a graph's adjacency or Laplacian matrix to construct a geometric representation (e.g., a linear ordering) which is then heuristically partitioned. We map each graph vertex to a vector in d-dimensional space, where d is the number of eigenvectors, such that these vectors constitute an instance of the vector partitioning problem. When all the eigenvectors are used, graph partitioning exactly reduces to vector partitioning. This result motivates a simple ordering heuristic that can be used to yield high-quality 2-way and multi-way partitionings. Our experiments suggest the vector partitioning perspective opens the door to new and effective heuristics.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"184 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120895459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RLC transmission line synthesis is a difficult problem because minimal net delay cannot be achieved or accurately predicted unless: 1) a termination scheme is chosen and properly implemented, and 2) output driver transition rates are constrained at or below the net's capability. This paper describes a method to concurrently find, for any RLC net, an optimal termination value, a maximum source transition rate, and an approximate net delay using the first few response moments. When optimal termination is accomplished and transition rates do not exceed the capabilities of the net, the resulting delay metrics are an interesting extension of the popular Elmore delay metric for RC interconnects. The task of physical RLC interconnect design is facilitated by the ease with which these first few moments are calculated for generalized RLC lines.
{"title":"Transmission Line Synthesis","authors":"B. Krauter, Rohinish Gupta, J. Willis, L. Pileggi","doi":"10.1145/217474.217555","DOIUrl":"https://doi.org/10.1145/217474.217555","url":null,"abstract":"RLC transmission line synthesis is a difficult problem because minimal net delay cannot be achieved or accurately predicted unless: 1) a termination scheme is chosen and properly implemented, and 2) output driver transition rates are constrained at or below the net's capability. This paper describes a method to concurrently find, for any RLC net, an optimal termination value, a maximum source transition rate, and an approximate net delay using the first few response moments. When optimal termination is accomplished and transition rates do not exceed the capabilities of the net, the resulting delay metrics are an interesting extension of the popular Elmore delay metric for RC interconnects. The task of physical RLC interconnect design is facilitated by the ease with which these first few moments are calculated for generalized RLC lines.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116487522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}