G. Ganapathy, Ram Narayan, Glenn Jorden, D. Fernandez, Ming Wang, J. Nishimura
The K5 microprocessor is a 4-million-transistor superscalar X86 microprocessor. Because the K5 is an original AMD design, verifying compatibility with the existing X86 architecture and software is crucial to its success in the marketplace. The X86 architecture has been evolving constantly over several years without any published specification. The primary mechanism for functional design verification of an X86 processor is simulation. The ability to execute a good sample set of the X86 software base on a model of the processor architecture before tapeout is key to achieving very high confidence in first silicon. The Quickturn Hardware Emulation system allows us to map a model of the design onto hardware resources and execute it at high speeds. In this paper we present the emulation methodology that was jointly developed for K5 and applied successfully to meet our functional verification goals.
{"title":"Hardware emulation for functional verification of K5","authors":"G. Ganapathy, Ram Narayan, Glenn Jorden, D. Fernandez, Ming Wang, J. Nishimura","doi":"10.1145/240518.240578","DOIUrl":"https://doi.org/10.1145/240518.240578","url":null,"abstract":"The K5 microprocessor is a 4 Million transistor superscalar, X86 microprocessor. The K5 microprocessor is an AMD original design, verifying compatibility with the existing X86 architecture and software is crucial to its success in the market place. The X86 architecture has been constantly evolving over several years without any published specification. The primary mechanism for functional design verification of an X86 processor is simulation. The ability to execute a good sample set of the X86 software base on a model of the processor architecture before tapeout is key to achieving very high confidence first silicon. The Quickturn Hardware Emulation system allows us to map a model of the design onto hardware resources and execute it at high speeds. In this paper we present the emulation methodology that was jointly developed for K5 and applied successfully to meet our functional verification goals.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126893974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a novel Boolean approach to LUT-based FPGA technology mapping targeting high performance. At the core of the approach, we have developed a powerful functional decomposition algorithm. The impact of decomposition is enhanced by a preceding collapsing step. To decompose functions for small depth and area, we present an iterative, BDD-based variable partitioning procedure. The procedure optimizes the variable partition for each bound set size by iteratively exchanging variables between bound set and free set, and finally selects a good bound set size. Our decomposition algorithm extracts common subfunctions of multiple-output functions and thus further reduces area and the maximum interconnect lengths. Experimental results show that our new algorithm produces circuits with significantly smaller depths than other performance-oriented mappers. This advantage also holds for the actual delays after placement and routing.
{"title":"A Boolean approach to performance-directed technology mapping for LUT-based FPGA designs","authors":"C. Legl, B. Wurth, K. Eckl","doi":"10.1109/DAC.1996.545669","DOIUrl":"https://doi.org/10.1109/DAC.1996.545669","url":null,"abstract":"This paper presents a novel Boolean approach to LUT-based FPGA technology mapping targeting high performance. At the core of the approach, we have developed a powerful functional decomposition algorithm. The impact of decomposition is enhanced by a preceding collapsing step. To decompose functions for small depth and area, we present an iterative, BDD-based variable partitioning procedure. The procedure optimizer the variable partition for each bound set size by iteratively exchanging variables between bound set and free set, and finally selects a good bound set size. Our decomposition algorithm extracts common subfunctions of multiple-output functions and thus further reduces area and the maximum interconnect lengths. Experimental results show that our new algorithm produces circuits with significantly smaller depths than other performance-oriented mappers. This advantage also holds for the actual delays after placement and routing.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"88 20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126314111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
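The variable partitioning idea in the abstract above can be illustrated with a small sketch. It is not the authors' BDD-based procedure: it enumerates truth tables instead of using BDDs, and the function names (`column_multiplicity`, `improve_bound_set`) and the example function are assumptions made for illustration. The quality measure used here is the classic column multiplicity of a bound set, i.e. the number of distinct cofactors over the free variables; fewer distinct columns means fewer decomposition functions are needed.

```python
from itertools import product


def column_multiplicity(f, n_vars, bound_set):
    """Count the distinct cofactors of f over all assignments to bound_set."""
    free_set = [i for i in range(n_vars) if i not in bound_set]
    columns = set()
    for bound_vals in product((0, 1), repeat=len(bound_set)):
        column = []
        for free_vals in product((0, 1), repeat=len(free_set)):
            x = [0] * n_vars
            for i, v in zip(bound_set, bound_vals):
                x[i] = v
            for i, v in zip(free_set, free_vals):
                x[i] = v
            column.append(f(x))
        columns.add(tuple(column))
    return len(columns)


def improve_bound_set(f, n_vars, bound_set):
    """Greedy exchange (hypothetical): swap one bound variable with one free
    variable whenever the swap lowers the column multiplicity."""
    bound = list(bound_set)
    best = column_multiplicity(f, n_vars, bound)
    improved = True
    while improved:
        improved = False
        for b in list(bound):
            for v in range(n_vars):
                if v in bound:
                    continue
                candidate = [v if u == b else u for u in bound]
                mu = column_multiplicity(f, n_vars, candidate)
                if mu < best:
                    bound, best, improved = candidate, mu, True
                    break
            if improved:
                break
    return sorted(bound), best


def example(x):
    # Illustrative function: f = (x0 XOR x1 XOR x2) AND x3
    return (x[0] ^ x[1] ^ x[2]) & x[3]


start = [0, 1, 3]
bound, mu = improve_bound_set(example, 4, start)
# Number of decomposition functions needed: ceil(log2(mu))
n_functions = (mu - 1).bit_length()
print(bound, mu, n_functions)
```

For this example the exchange moves from the bound set {x0, x1, x3} (three distinct columns, two decomposition functions) to {x0, x1, x2} (two distinct columns, one decomposition function).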
In this paper, we study the problem of decomposing gates in fanin-unbounded or K-bounded networks such that the K-input LUT mapping solutions computed by a depth-optimal mapper have minimum depth. We show (1) any decomposition leads to a smaller or equal mapping depth regardless of the decomposition algorithm used, and (2) the problem is NP-hard for unbounded networks when K ≥ 3 and remains NP-hard for K-bounded networks when K ≥ 5. We propose a gate decomposition algorithm, named DOGMA, which combines the level-driven node packing technique (Chortle-d) and the network-flow-based optimal labeling technique (FlowMap). Experimental results show that networks decomposed by DOGMA allow depth-optimal technology mappers to improve the mapping solutions by up to 11% in depth and up to 35% in area compared to the mapping results of networks decomposed by other existing decomposition algorithms.
{"title":"Structural gate decomposition for depth-optimal technology mapping in LUT-based FPGA design","authors":"J. Cong, Yean-Yow Hwang","doi":"10.1109/DAC.1996.545668","DOIUrl":"https://doi.org/10.1109/DAC.1996.545668","url":null,"abstract":"In this paper, we study the problem of decomposing gates in fanin-unbounded or K-bounded networks such that the K-input LUT mapping solutions computed by a depth-optimal mapper have minimum depth. We show (1) any decomposition leads to a smaller or equal mapping depth regardless the decomposition algorithm used, and (2) the problem is NP-hard for unbounded networks when K/spl ges/3 and remains NP-hard for K-bounded networks when K/spl ges/5. We propose a gate decomposition algorithm, named DOGMA, which combines level-driven node packing technique (Chortle-d) and the network flow based optimal labeling technique (FlowMap). Experimental results show that networks decomposed by DOGMA allow depth-optimal technology mappers to improve the mapping solutions by up to 11% in depth and up to 35% in area comparing to the mapping results of networks decomposed by other existing decomposition algorithms.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125089070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
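To see why the choice of gate decomposition matters for depth, consider the following minimal sketch. It is not DOGMA, Chortle-d, or FlowMap: it only decomposes one wide gate into 2-input gates under a unit-delay model, and the function names and input levels are assumptions for illustration. It compares a naive left-to-right chain against a greedy rule that always merges the two inputs with the lowest levels first.

```python
import heapq


def chain_depth(levels):
    """Root level when inputs are combined left to right in the given order."""
    current = levels[0]
    for lvl in levels[1:]:
        current = max(current, lvl) + 1
    return current


def level_aware_depth(levels):
    """Root level when the two lowest-level signals are merged first."""
    heap = list(levels)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # A 2-input gate sits one level above its latest input.
        heapq.heappush(heap, max(a, b) + 1)
    return heap[0]


# Hypothetical 5-input gate: one late input (level 3), four early inputs.
input_levels = [3, 0, 0, 0, 0]
print(chain_depth(input_levels), level_aware_depth(input_levels))
```

With these levels the chain places the late input at the bottom and reaches level 7, while the level-aware tree reaches level 4, because the early inputs are combined among themselves before meeting the late one.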
We describe a novel, formal verification technique for proving the correctness of a pipelined microprocessor that focuses specifically on pipeline control logic. We iteratively deconstruct a pipeline by merging adjacent pipeline stages, allowing for the verification to be done in several easier steps. We present an inductive proof methodology that verifies that pipeline behaviour is preserved as the pipeline depth is reduced via deconstruction; this inductive approach is less sensitive to pipeline depth and complexity than previous approaches. Invariants are used to simplify the proof, and datapath components are abstracted using validity checking with uninterpreted functions. We present experimental results from the formal verification of a DLX five-stage pipeline using our technique.
{"title":"A scalable formal verification methodology for pipelined microprocessors","authors":"J. Levitt, K. Olukotun","doi":"10.1109/DAC.1996.545638","DOIUrl":"https://doi.org/10.1109/DAC.1996.545638","url":null,"abstract":"We describe a novel, formal verification technique for proving the correctness of a pipelined microprocessor that focuses specifically on pipeline control logic. We iteratively deconstruct a pipeline by merging adjacent pipeline stages, allowing for the verification to be done in several easier steps. We present an inductive proof methodology that verifies that pipeline behaviour is preserved as the pipeline depth is reduced via deconstruction; this inductive approach is less sensitive to pipeline depth and complexity than previous approaches. Invariants are used to simplify the proof, and datapath components are abstracted using validity checking with uninterpreted functions. We present experimental results from the formal verification of a DLX five-stage pipeline using our technique.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128209251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient full-wave electromagnetic analysis tool would be useful in many aspects of engineering design. Development of integral-equation based tools has been hampered by the high computational complexity of dense matrix representations and difficulty in obtaining and utilizing the frequency-domain response. In this paper we demonstrate that an algorithm based on application of a novel model-order reduction scheme directly to the sparse model generated by a fast integral transform has significant advantages for frequency- and time-domain simulation.
{"title":"Efficient full-wave electromagnetic analysis via model-order reduction of fast integral transforms","authors":"Joel R. Philips, E. Chiprout, D. D. Ling","doi":"10.1109/DAC.1996.545605","DOIUrl":"https://doi.org/10.1109/DAC.1996.545605","url":null,"abstract":"An efficient full-wave electromagnetic analysis tool would be useful in many aspects of engineering design. Development of integral-equation based tools has been hampered by the high computational complexity of dense matrix representations and difficulty in obtaining and utilizing the frequency-domain response. In this paper we demonstrate that an algorithm based on application of a novel model-order reduction scheme directly to the sparse model generated by a fast integral transform has significant advantages for frequency- and time-domain simulation.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131647224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In order to optimize interconnect to avoid signal integrity problems, very fast and accurate 3-D capacitance extraction is essential. Fast algorithms, such as the multipole or precorrected Fast Fourier Transform (FFT) accelerated methods in programs like FASTCAP, must be combined with techniques to exploit the emerging cluster-of-workstation based parallel computers like the IBM SP2. In this paper, we examine parallelizing the precorrected FFT algorithm for 3-D capacitance extraction and present several algorithms for balancing workload and reducing communication time. Results from a prototype implementation on an eight processor IBM SP2 are presented for several test examples, and the largest of these examples achieves nearly linear parallel speed-up.
{"title":"A parallel precorrected FFT based capacitance extraction program for signal integrity analysis","authors":"N. Aluru, V. Nadkarni, James White","doi":"10.1145/240518.240587","DOIUrl":"https://doi.org/10.1145/240518.240587","url":null,"abstract":"In order to optimize interconnect to avoid signal integrity problems, very fast and accurate 3-D capacitance extraction is essential. Fast algorithms, such as the multipole or precorrected Fast Fourier Transform (FFT) accelerated methods in programs like FASTCAP, must be combined with techniques to exploit the emerging cluster-of-workstation based parallel computers like the IBM SP2. In this paper, we examine parallelizing the precorrected FFT algorithm for 3-D capacitance extraction and present several algorithms for balancing workload and reducing communication time. Results from a prototype implementation on an eight processor IBM SP2 are presented for several test examples, and the largest of these examples achieves nearly linear parallel speed-up.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134526975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a new optimization technique called architectural retiming which is able to improve the performance of many latency-constrained circuits. Architectural retiming achieves this by increasing the number of registers on the latency-constrained path while preserving the functionality and latency of the circuit. This is done using the concept of a negative register, which can be implemented using precomputation and prediction. We use the name architectural retiming since it both reschedules operations in time and modifies the structure of the circuit to preserve its functionality. We illustrate the use of architectural retiming on two realistic examples and present performance improvement results for a number of sample circuits.
{"title":"Architectural retiming: pipelining latency-constrained circuits","authors":"S. Hassoun, C. Ebeling","doi":"10.1145/240518.240652","DOIUrl":"https://doi.org/10.1145/240518.240652","url":null,"abstract":"This paper presents a new optimization technique called architectural retiming which is able to improve the performance of many latency-constrained circuits. Architectural retiming achieves this by increasing the number of registers on the latency-constrained path while preserving the functionality and latency of the circuit. This is done using the concept of a negative register, which can be implemented using precomputation and prediction. We use the name architectural retiming since it both reschedules operations in time and modifies the structure of the circuit to preserve its functionality. We illustrate the use of architectural retiming on two realistic examples and present performance improvement results for a number of sample circuits.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122786779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Approximation has been shown to be an effective method for reducing the time and space costs of solving various floorplan area minimization problems. In this paper, we present several approximation techniques for solving floorplan area minimization problems. These new techniques enable us to reduce both the time and space complexities of the previously best known approximation algorithms by more than a factor of n and n^2 for rectangular and L-shaped sub-floorplans, respectively (where n is the number of given implementations). The efficiency in the time and space complexities is critical to the applicability of such approximation techniques in floorplan area minimization algorithms. We also give a technique for enhancing the quality of approximation results.
{"title":"Efficient approximation algorithms for floorplan area minimization","authors":"D. Chen, X. Hu","doi":"10.1109/DAC.1996.545624","DOIUrl":"https://doi.org/10.1109/DAC.1996.545624","url":null,"abstract":"Approximation has been shown to be an effective method for reducing the time and space costs of solving various floorplan area minimization problems. In this paper, we present several approximation techniques for solving floorplan area minimization problems. These new techniques enable us to reduce both the time and space complexities of the previously best known approximation algorithms by more than a factor of n and n/sup 2/ for rectangular and L-shaped sub-floorplans, respectively (where n is the number of given implementations). The efficiency in the time and space complexities is critical to the applicability of such approximation techniques in floorplan area minimization algorithms. We also give a technique for enhancing the quality of approximation results.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"7 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123803683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
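As background for the floorplan abstract above, the sketch below shows the exact baseline that approximation techniques aim to make cheaper: each rectangular sub-floorplan carries a list of non-redundant (width, height) implementations, and two lists are combined across a cut. This is not the paper's approximation algorithm; it is a brute-force combination under the usual slicing-floorplan model, and the function names and sample shapes are assumptions for illustration.

```python
def prune(impls):
    """Keep only non-dominated (width, height) pairs, in increasing width."""
    result = []
    for w, h in sorted(set(impls)):
        # Keep a wider shape only if it is strictly shorter than the last kept one.
        if not result or h < result[-1][1]:
            result.append((w, h))
    return result


def combine_vertical_cut(left, right):
    """Blocks placed side by side: widths add, height is the maximum."""
    return prune([(wl + wr, max(hl, hr)) for wl, hl in left for wr, hr in right])


def min_area(impls):
    """Smallest bounding-box area among the implementations."""
    return min(w * h for w, h in impls)


# Hypothetical implementation lists for two sub-floorplans.
left = [(1, 4), (2, 2), (4, 1)]
right = [(1, 3), (3, 1)]
combined = combine_vertical_cut(left, right)
print(combined, min_area(combined))
```

Because the combined list can grow with every cut in the floorplan tree, the number of implementations n drives both time and space, which is where selecting a smaller representative subset of implementations becomes attractive.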
Technology mapping requires the unmapped logic network to be represented in terms of base functions, usually two-input NORs and inverters. Technology decomposition is the step that transforms arbitrary networks to this form. Typically, such decomposition schemes ignore the fact that certain circuit elements can be mapped more efficiently by treating them separately during decomposition. Multiplexers are one such category of circuit elements. They appear very naturally in circuits, in the form of datapath elements and as a result of synthesis of CASE statements in HDL specifications of control logic. Mapping them using multiplexers in technology libraries has many advantages. In this paper, we give an algorithm for optimally decomposing multiplexers, so as to minimize the delay of the network, and demonstrate its effectiveness in improving the quality of mapped circuits.
{"title":"Delay minimal decomposition of multiplexers in technology mapping","authors":"Shashidhar Thakur, D. F. Wong, S. Krishnamoorthy","doi":"10.1109/DAC.1996.545582","DOIUrl":"https://doi.org/10.1109/DAC.1996.545582","url":null,"abstract":"Technology mapping requires the unmapped logic network to be represented in terms of base functions, usually two-input NORs and inverters. Technology decomposition is the step that transforms arbitrary networks to this form. Typically, such decomposition schemes ignore the fact that certain circuit elements can be mapped more efficiently by treating them separately during decomposition. Multiplexers are one such category of circuit elements. They appear very naturally in circuits, in the form of datapath elements and as a result of synthesis of CASE statements in HDL specifications of control logic. Mapping them using multiplexers in technology libraries has many advantages. In this paper, we give an algorithm for optimally decomposing multiplexers, so as to minimize the delay of the network, and demonstrate its effectiveness in improving the quality of mapped circuits.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122039764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
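A simplified picture of why multiplexer decomposition affects delay is sketched below. It is not the algorithm from the paper: it assumes a 2^k:1 multiplexer built as a k-stage tree of 2:1 multiplexers, a single fixed delay per 2:1 multiplexer from any input to its output, and one common arrival time for all data inputs. All names and numbers are hypothetical. Under that model, a select signal used at stage s (stage 0 is next to the data inputs) passes through k - s multiplexers, so late-arriving selects are best placed near the output.

```python
from itertools import permutations


def output_arrival(select_arrivals, data_arrival=0.0, mux_delay=1.0):
    """Arrival time at the tree output; select_arrivals[s] drives stage s."""
    k = len(select_arrivals)
    worst = data_arrival + k * mux_delay  # data passes through all k stages
    for stage, t in enumerate(select_arrivals):
        worst = max(worst, t + (k - stage) * mux_delay)
    return worst


def order_selects(select_arrivals):
    """Earliest select at stage 0, latest select at the final stage.
    Reordering selects requires permuting the data inputs accordingly so
    that the multiplexer still computes the same function."""
    return sorted(select_arrivals)


arrivals = [5.0, 0.0, 2.0]
naive = output_arrival(arrivals)
ordered = output_arrival(order_selects(arrivals))
# Exhaustive check over all stage assignments for this small case.
best = min(output_arrival(list(p)) for p in permutations(arrivals))
print(naive, ordered, best)
```

In this example the given order yields an output arrival of 8.0, while the sorted order yields 6.0, which matches the best value found by trying every assignment of selects to stages.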
This paper presents a technique that transforms large and complex RC networks into much smaller physically realizable RC networks. These models reflect the transmission behavior of the initial network accurately for frequencies up to a user-defined maximal signal frequency. This technique has been incorporated in a layout-to-circuit extractor, using a scan-line approach. The method guarantees numerical stability and performs excellently in modeling RC interconnects.
{"title":"Extracting circuit models for large RC interconnections that are accurate up to a predefined signal frequency","authors":"P. Elias, N. V. D. Meijs","doi":"10.1109/DAC.1996.545675","DOIUrl":"https://doi.org/10.1109/DAC.1996.545675","url":null,"abstract":"This paper presents a technique that transforms large and complex RC networks into much smaller physically realizable RC networks. These models reflect the transmission behavior of the initial network accurately for frequencies up to a user-defined maximal signal frequency. This technique has been incorporated in a layout-to-circuit extractor, using a scan-line approach. The method guarantees numerical stability and performs excellently in modeling RC interconnects.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122912486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}