Y. G. DeCastelo-Vide-e-Souza, M. Potkonjak, A. C. Parker
System level design and behavior transformations have been rapidly establishing themselves as design steps with the most inuential impact on final performance metrics, throughput and latency, of a design. In this paper we develop a formal ILP-based approach for throughput and latency optimization when algorithm-architecture matching, retiming, and pipelining are considered simultaneously. The effectiveness of the approach is demonstrated on several real-life examples.
{"title":"Optimal ILP-based Approach for Throughput Optimization Using Simultaneous Algorithm/Architecture Matching and Retiming","authors":"Y. G. DeCastelo-Vide-e-Souza, M. Potkonjak, A. C. Parker","doi":"10.1145/217474.217516","DOIUrl":"https://doi.org/10.1145/217474.217516","url":null,"abstract":"System level design and behavior transformations have been rapidly establishing themselves as design steps with the most inuential impact on final performance metrics, throughput and latency, of a design. In this paper we develop a formal ILP-based approach for throughput and latency optimization when algorithm-architecture matching, retiming, and pipelining are considered simultaneously. The effectiveness of the approach is demonstrated on several real-life examples.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114895419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A partial scan approach proposed recently selects scan signals without considering the availability of the ip-ops (FFs). Such an approach can greatly reduce the number of scan signals since maximum freedom is allowed in scan signal selection. To actually scan the selected signals, we, however, must make them FF-driving signals. In this paper, we study the problem of modifying and retiming a circuit to make a pre-selected set of scan signals FF-driving signals while preserving the set of cycles being broken. We present a new approach for solving this problem. Based on the new approach we design an efficient algorithm. Unlike a previous algorithm which inherently has no control over the area overhead incurred during the modification, our algorithm explicitly minimizes the area overhead. The algorithm has been implemented and encouraging results were obtained.
{"title":"Partial Scan with Pre-selected Scan Signals","authors":"P. Pan, C. Liu","doi":"10.1145/217474.217528","DOIUrl":"https://doi.org/10.1145/217474.217528","url":null,"abstract":"A partial scan approach proposed recently selects scan signals without considering the availability of the ip-ops (FFs). Such an approach can greatly reduce the number of scan signals since maximum freedom is allowed in scan signal selection. To actually scan the selected signals, we, however, must make them FF-driving signals. In this paper, we study the problem of modifying and retiming a circuit to make a pre-selected set of scan signals FF-driving signals while preserving the set of cycles being broken. We present a new approach for solving this problem. Based on the new approach we design an efficient algorithm. Unlike a previous algorithm which inherently has no control over the area overhead incurred during the modification, our algorithm explicitly minimizes the area overhead. The algorithm has been implemented and encouraging results were obtained.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128705690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We observe that the switching activity at a circuit node, also called the transition density, can be extremely sensitive to the circuit internal delays. As a result, slight delay variations can lead to several orders of magnitude changes in the node activity. This has important implications for CAD in that, if the transition density is estimated by simulation, then minor inaccuracies in the timing models can lead to very large errors in the estimated activity. As a solution, we propose an efficient technique for estimating an upper bound on the transition density at every node. While it is not always very tight, the upper bound is robust, in the sense that it is valid irrespective of delay variations and modeling errors. We will describe the technique and present experimental results based on a prototype implementation.
{"title":"Extreme Delay Sensitivity and the Worst-Case Switching Activity in VLSI Circuitsy","authors":"F. Najm, Michael Y. Zhang","doi":"10.1145/217474.217600","DOIUrl":"https://doi.org/10.1145/217474.217600","url":null,"abstract":"We observe that the switching activity at a circuit node, also called the transition density, can be extremely sensitive to the circuit internal delays. As a result, slight delay variations can lead to several orders of magnitude changes in the node activity. This has important implications for CAD in that, if the transition density is estimated by simulation, then minor inaccuracies in the timing models can lead to very large errors in the estimated activity. As a solution, we propose an efficient technique for estimating an upper bound on the transition density at every node. While it is not always very tight, the upper bound is robust, in the sense that it is valid irrespective of delay variations and modeling errors. We will describe the technique and present experimental results based on a prototype implementation.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117058332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present an algorithm for finding a good Ashenhurst decomposition of a switching function. Most current methods for performing this type of decomposition are based on the Roth-Karp algorithm. The algorithm presented here is based on finding an optimal cut in a BDD. This algorithm differs from previous decomposition algorithms in that the cut determines the size and composition of the bound set and the free set. Other methods examine all possible bound sets of an arbitrary size. We have applied this method to decomposing functions into sets of k-variable functions. This is a required step when implementing a function using a lookup table (LUT) based FPGA. The results compare very favorably to existing implementations of Roth-Karp decomposition methods.
{"title":"A Method for Finding Good Ashenhurst Decompositions and Its Application to FPGA Synthesis","authors":"T. Stanion, C. Sechen","doi":"10.1145/217474.217507","DOIUrl":"https://doi.org/10.1145/217474.217507","url":null,"abstract":"In this paper, we present an algorithm for finding a good Ashenhurst decomposition of a switching function. Most current methods for performing this type of decomposition are based on the Roth-Karp algorithm. The algorithm presented here is based on finding an optimal cut in a BDD. This algorithm differs from previous decomposition algorithms in that the cut determines the size and composition of the bound set and the free set. Other methods examine all possible bound sets of an arbitrary size. We have applied this method to decomposing functions into sets of k-variable functions. This is a required step when implementing a function using a lookup table (LUT) based FPGA. The results compare very favorably to existing implementations of Roth-Karp decomposition methods.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126202777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Binary Moment Diagrams (BMDs) provide a canonical representations for linear functions similar to the way Binary Decision Diagrams (BDDs) represent Boolean functions. Within the class of linear functions, we can embed arbitrary functions from Boolean variables to integer values. BMDs can thus model the functionality of data path circuits operating over word-level data. Many important functions, including integermultiplication, that cannot be represented efficiently at the bit level with BDDs have simple representations at the word level with BMDs. Furthermore, BMDs can represent Boolean functions with around the same complexity as BDDs. We propose a hierarchical approach to verifying arithmetic circuits, where componentmodules are first shownto implement their word-level specifications. The overall circuit functionality is then verified by composing the component functions and comparing the result to the word-level circuit specification. Multipliers with word sizes of up to 256 bits have been verified by this technique.
{"title":"Verification of Arithmetic Circuits with Binary Moment Diagrams","authors":"R. Bryant, Yirng-An Chen","doi":"10.1145/217474.217583","DOIUrl":"https://doi.org/10.1145/217474.217583","url":null,"abstract":"Binary Moment Diagrams (BMDs) provide a canonical representations for linear functions similar to the way Binary Decision Diagrams (BDDs) represent Boolean functions. Within the class of linear functions, we can embed arbitrary functions from Boolean variables to integer values. BMDs can thus model the functionality of data path circuits operating over word-level data. Many important functions, including integermultiplication, that cannot be represented efficiently at the bit level with BDDs have simple representations at the word level with BMDs. Furthermore, BMDs can represent Boolean functions with around the same complexity as BDDs. We propose a hierarchical approach to verifying arithmetic circuits, where componentmodules are first shownto implement their word-level specifications. The overall circuit functionality is then verified by composing the component functions and comparing the result to the word-level circuit specification. Multipliers with word sizes of up to 256 bits have been verified by this technique.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126071589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we present a completemethodology for the design and validation of asynchronous circuits starting from a formal specificationmodel that roughly correspondsto a timing diagram. The methodology is presented in such a way that it is easy to embed in the current methodology for synchronous circuits. The different steps of the synthesis process will just be briefly touched upon. The main part of the paper concentrates on the simulation and validation of asynchronous circuits. It discusses where the designer needs validation and how it can be done. It also explains how this process can be automated and embedded in the complete methodology.
{"title":"A Design and Validation System for Asynchronous Circuits","authors":"P. Vanbekbergen, Albert R. Wang, K. Keutzer","doi":"10.1145/217474.217618","DOIUrl":"https://doi.org/10.1145/217474.217618","url":null,"abstract":"In this paper we present a completemethodology for the design and validation of asynchronous circuits starting from a formal specificationmodel that roughly correspondsto a timing diagram. The methodology is presented in such a way that it is easy to embed in the current methodology for synchronous circuits. The different steps of the synthesis process will just be briefly touched upon. The main part of the paper concentrates on the simulation and validation of asynchronous circuits. It discusses where the designer needs validation and how it can be done. It also explains how this process can be automated and embedded in the complete methodology.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"392 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131728488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Partitioned memory with bus interconnect architecture in its most general form consists of several functional units with associated memory accessible to the functional unit via local interconnect and global buses to communicate data values across from one functional unit to another. As can be expected, the time at which certain values are communicated affect the size of the local memories and the number of buses that are needed. We address the problem of scheduling communications in a bus architecture under memory constraints. We present here a network ow formulation for the problem and obtain an exact algorithm to schedule the communications, such that the constraint on the number of registers in each functional unit is satisfied. As an increasing number of architectures use multiple memories in addition to (or instead of) one central RAM, this work is especially interesting. Several authors have already studied this problem in related architectures, yet all use heuristic approaches to schedule the communications. Our technique is the first exact solution to the problem. Also, our graph theoretic formulation provides a clearer insight into the problem.
{"title":"Constrained Register Allocation in Bus Architectures","authors":"E. Frank, S. Raje, M. Sarrafzadeh","doi":"10.1145/217474.217525","DOIUrl":"https://doi.org/10.1145/217474.217525","url":null,"abstract":"Partitioned memory with bus interconnect architecture in its most general form consists of several functional units with associated memory accessible to the functional unit via local interconnect and global buses to communicate data values across from one functional unit to another. As can be expected, the time at which certain values are communicated affect the size of the local memories and the number of buses that are needed. We address the problem of scheduling communications in a bus architecture under memory constraints. We present here a network ow formulation for the problem and obtain an exact algorithm to schedule the communications, such that the constraint on the number of registers in each functional unit is satisfied. As an increasing number of architectures use multiple memories in addition to (or instead of) one central RAM, this work is especially interesting. Several authors have already studied this problem in related architectures, yet all use heuristic approaches to schedule the communications. Our technique is the first exact solution to the problem. Also, our graph theoretic formulation provides a clearer insight into the problem.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123962447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A spectral partitioning method uses the eigenvectors of a graph's adjacency or Laplacian matrix to construct a geometric representation (e.g., a linear ordering) which is then heuristically partitioned. We map each graph vertex to a vector in d-dimensional space, where d is the number of eigenvectors, such that these vectors constitute an instance of the vector partitioning problem. When all the eigenvectors are used, graph partitioning exactly reduces to vector partitioning. This result motivates a simple ordering heuristic that can be used to yield high-quality 2-way and multi-way partitionings. Our experiments suggest the vector partitioning perspective opens the door to new and effective heuristics.
{"title":"Spectral Partitioning: The More Eigenvectors, The Better","authors":"C. Alpert, So-Zen Yao","doi":"10.1145/217474.217529","DOIUrl":"https://doi.org/10.1145/217474.217529","url":null,"abstract":"A spectral partitioning method uses the eigenvectors of a graph's adjacency or Laplacian matrix to construct a geometric representation (e.g., a linear ordering) which is then heuristically partitioned. We map each graph vertex to a vector in d-dimensional space, where d is the number of eigenvectors, such that these vectors constitute an instance of the vector partitioning problem. When all the eigenvectors are used, graph partitioning exactly reduces to vector partitioning. This result motivates a simple ordering heuristic that can be used to yield high-quality 2-way and multi-way partitionings. Our experiments suggest the vector partitioning perspective opens the door to new and effective heuristics.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"184 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120895459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present an automata model which concisely captures the constraints imposed by a data-path, such as bus hazards, register constraints, and control encoding limitations. A set of uniform base components for depicting general data-paths and techniques for systematic translation of such depictions into Boolean functions are described. Finally, this model is expanded to represent the limitations of generating as well as moving operands by incorporating data-flow graphs. The benefits of this representation are demonstrated by modeling a commercial DSP microprocessor.
{"title":"Symbolic Modeling and Evaluation of Data Paths","authors":"C. Monahan, F. Brewer","doi":"10.1145/217474.217560","DOIUrl":"https://doi.org/10.1145/217474.217560","url":null,"abstract":"We present an automata model which concisely captures the constraints imposed by a data-path, such as bus hazards, register constraints, and control encoding limitations. A set of uniform base components for depicting general data-paths and techniques for systematic translation of such depictions into Boolean functions are described. Finally, this model is expanded to represent the limitations of generating as well as moving operands by incorporating data-flow graphs. The benefits of this representation are demonstrated by modeling a commercial DSP microprocessor.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121250299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RLC transmission line synthesis is a difficult problem because minimal net delay cannot be achieved or accurately predicted unless: 1) a termination scheme is chosen and properly implemented, and 2) output driver transition rates are constrained at or below the net's capability. This paper describes a method to concurrently find, for any RLC net, an optimal termination value, a maximum source transition rate, and an approximate net delay using the first few response moments. When optimal termination is accomplished and transition rates do not exceed the capabilities of the net, the resulting delay metrics are an interesting extension of the popular Elmore delay metric for RC interconnects. The task of physical RLC interconnect design is facilitated by the ease with which these first few moments are calculated for generalized RLC lines.
{"title":"Transmission Line Synthesis","authors":"B. Krauter, Rohinish Gupta, J. Willis, L. Pileggi","doi":"10.1145/217474.217555","DOIUrl":"https://doi.org/10.1145/217474.217555","url":null,"abstract":"RLC transmission line synthesis is a difficult problem because minimal net delay cannot be achieved or accurately predicted unless: 1) a termination scheme is chosen and properly implemented, and 2) output driver transition rates are constrained at or below the net's capability. This paper describes a method to concurrently find, for any RLC net, an optimal termination value, a maximum source transition rate, and an approximate net delay using the first few response moments. When optimal termination is accomplished and transition rates do not exceed the capabilities of the net, the resulting delay metrics are an interesting extension of the popular Elmore delay metric for RC interconnects. The task of physical RLC interconnect design is facilitated by the ease with which these first few moments are calculated for generalized RLC lines.","PeriodicalId":422297,"journal":{"name":"32nd Design Automation Conference","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116487522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}