Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528811
M. Allen, W. Lewchuk, J. Coddington
The PowerPC 620 microprocessor introduces a new integrated secondary cache controller and system bus interface. The secondary cache interface is 128 bits wide, supports L2 sizes from 1 MB to 128 MB, is ECC protected, can transfer 2.0 GB/sec at 133 MHz, and supports an optional co-processor mode. The 620 bus is optimized for server-class systems requiring significant multiprocessing capability and supports the 64-bit PowerPC architecture with a 40-bit physical address bus and a separate 128-bit data bus. Address transfer rates of up to 33 M addresses/sec at 66 MHz are achieved by pipelining the address snoop response with the address bus. The address and data buses are explicitly tagged, allowing data transfers to be reordered with respect to the addresses. The data bus can transfer up to 1.0 GB/sec at 66 MHz. The bus protocol and the integrated L2 controller presented support the snoop-based MESI cache coherency protocol and direct cache-to-cache data transfers.
Title : A high performance bus and cache controller for PowerPC multiprocessing systems
Published in : Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors
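The peak rates quoted in the PowerPC 620 abstract follow from bus width multiplied by clock rate. A minimal sketch of that arithmetic, assuming one full-width transfer per clock and decimal gigabytes (both are assumptions, and the helper name `peak_bandwidth_gb_per_s` is hypothetical):

```python
def peak_bandwidth_gb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in GB/s, assuming one full-width transfer per clock."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 / 1e9


l2 = peak_bandwidth_gb_per_s(128, 133)  # 128-bit L2 interface at 133 MHz
bus = peak_bandwidth_gb_per_s(128, 66)  # 128-bit data bus at 66 MHz

# 33 M addresses/sec on a 66 MHz bus corresponds to one address every two clocks.
clocks_per_address = 66 / 33

print(f"L2 interface: {l2:.2f} GB/s, data bus: {bus:.2f} GB/s")
print(f"clocks per address: {clocks_per_address:.0f}")
```

Under these assumptions the results are about 2.13 GB/s and 1.06 GB/s, which agree with the rounded 2.0 GB/sec and 1.0 GB/sec figures in the abstract.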
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528840
Shashidhar Thakur, D. F. Wong
We address the technology mapping problem for lookup table FPGAs. The area minimization problem for mapping K-bounded networks, consisting of nodes with at most K inputs, using K-input lookup tables is known to be NP-complete for K ≥ 5. The complexity was unknown for K = 2, 3, and 4. The corresponding delay minimization problem (under the constant delay model) was solved in polynomial time by the flow-map algorithm, for arbitrary values of K. We study the class of K-bounded networks where all nodes have exactly K inputs. We call such networks K-exact. We give a characterization of mapping solutions for such networks. This leads to a polynomial time algorithm for computing the simultaneous area and delay minimum mapping for such networks using K-input lookup tables. We also show that the flow-map algorithm minimizes the area of the mapped network as well, for K-exact networks. We then show that for K = 2 the mapping solution for a 2-bounded network, minimizing the area and delay simultaneously, can be easily obtained from that of a 2-exact network derived from it by eliminating single-input nodes. Thus the area minimization problem for 2-input lookup tables can be solved in polynomial time, resolving an open problem.
Title : Simultaneous area and delay minimum K-LUT mapping for K-exact networks
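The K-bounded and K-exact definitions in the abstract can be illustrated with a small structural check. This is a sketch under stated assumptions, not the paper's algorithm: the dictionary encoding of a network and all function names are hypothetical, and the bypass step only rewires structure (it ignores the logic function of the removed single-input node).

```python
from typing import Dict, List

# Hypothetical encoding: each internal node maps to the list of its fanin names.
# Primary inputs are names that never appear as keys.
Network = Dict[str, List[str]]


def is_k_bounded(net: Network, k: int) -> bool:
    """True if every node has at most k inputs."""
    return all(len(fanins) <= k for fanins in net.values())


def is_k_exact(net: Network, k: int) -> bool:
    """True if every node has exactly k inputs."""
    return all(len(fanins) == k for fanins in net.values())


def bypass_single_input_nodes(net: Network) -> Network:
    """Drop single-input nodes and rewire their fanouts to their fanin."""

    def resolve(name: str) -> str:
        # Follow chains of single-input nodes back to a multi-input node
        # or a primary input (the network is assumed to be acyclic).
        while name in net and len(net[name]) == 1:
            name = net[name][0]
        return name

    return {
        node: [resolve(f) for f in fanins]
        for node, fanins in net.items()
        if len(fanins) != 1
    }


net = {"n1": ["a", "b"], "n2": ["n1"], "n3": ["n2", "c"]}
reduced = bypass_single_input_nodes(net)
print(is_k_bounded(net, 2), is_k_exact(net, 2), is_k_exact(reduced, 2))
```

Here `net` is 2-bounded but not 2-exact because `n2` has a single input; after the bypass, the remaining nodes each have exactly two inputs.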
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528919
Y. Hoskote, Dinos Moundanos, J. Abraham
Simulation is still the primary, although inadequate, resource for verifying the conformity of a design to its functional specification. Fortunately, most errors in the early stages of design involve only the control flow in the circuit. We define the functional coverage of a given sequence of verification vectors as the amount of control behavior exercised by them. We present a novel technique for automatically extracting the control flow of a design on the basis of the underlying mathematical model. Significantly, this extraction is independent of the circuit description style. The Extracted Control Flow Machine (ECFM) is then used for estimation of functional coverage and to provide information that will help the designer improve the quality of his or her tests.
Title : Automatic extraction of the control flow machine and application to evaluating coverage of verification vectors
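The abstract defines functional coverage as the amount of control behavior exercised by a vector sequence, without giving a formula. One plausible reading, used here purely as an assumption, is the fraction of transitions of a control state machine that the sequence visits. The toy machine, its state and input names, and the function `transition_coverage` are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (current state, input vector) -> next state
Transitions = Dict[Tuple[str, str], str]


def transition_coverage(fsm: Transitions, start: str, vectors: Iterable[str]) -> float:
    """Fraction of the machine's transitions exercised by the vector sequence."""
    visited = set()
    state = start
    for vec in vectors:
        key = (state, vec)
        if key not in fsm:
            raise ValueError(f"no transition from {state!r} on {vec!r}")
        visited.add(key)
        state = fsm[key]
    return len(visited) / len(fsm)


fsm = {
    ("IDLE", "req"): "BUSY",
    ("IDLE", "nop"): "IDLE",
    ("BUSY", "done"): "IDLE",
    ("BUSY", "nop"): "BUSY",
}
coverage = transition_coverage(fsm, "IDLE", ["req", "done", "req"])
print(f"transition coverage: {coverage:.2f}")
```

The sequence exercises two of the four transitions, so a designer could see that both `nop` transitions remain untested.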
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528908
K. Srivatsan, C. Chakrabarti, L. Lucke
In many applications, such as digital signal processing, data format converters are used to reformat the data transferred between processing modules. In VLSI implementations, these converters consume a large portion of the available resources. Various methods have been proposed to synthesize data format converter architectures while optimizing the number of registers used to store the data. In this paper, we present a new register allocation scheme which not only minimizes the number of registers, but also minimizes the power consumption in the data format converter. Low-power data format converters are synthesized by minimizing the transitions and interconnections between the registers used to store the data. We present both a heuristic and an integer linear programming formulation to solve the allocation problem. Our method shows significant improvement over previous techniques.
Title : Low power data format converter design using semi-static register allocation
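The abstract does not spell out its allocation scheme. As background on what "number of registers" means in this setting, a standard lower bound is the largest number of data values that are alive at the same time. The sketch below computes that bound from lifetimes; it is not the paper's heuristic or ILP formulation, and the half-open lifetime encoding and the name `min_registers` are assumptions.

```python
from typing import List, Tuple


def min_registers(lifetimes: List[Tuple[int, int]]) -> int:
    """Largest number of values alive at once.

    Each lifetime is a half-open interval [birth, death) in clock cycles.
    """
    events = []
    for birth, death in lifetimes:
        events.append((birth, 1))
        events.append((death, -1))
    live = peak = 0
    # Tuple ordering places -1 before +1 at the same cycle, so a register
    # freed at cycle t can be reused by a value born at cycle t.
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak


needed = min_registers([(0, 3), (1, 4), (3, 5), (4, 6)])
print(f"registers needed: {needed}")
```

For these four lifetimes at most two values overlap at any cycle, so two registers suffice.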
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528844
B. Hosticka
This paper discusses problems of integrated system design. It shows the current state of the art and where the deficits lie. Finally, recommendations for future development of design support are given.
Title : Toward integrated system design: a global perspective
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528800
Hong Hao, K. Bhabuthmal
This paper describes the SuperSPARC II clock controller. This controller allows the internal clock to be disabled during the chip's normal operation. Then any number of internal clock pulses can be issued in a controlled fashion. The clock can return to the free-running mode after being disabled. All clock control is done in a way that produces no glitches on the internal clock signal. The clock controller can be accessed through the IEEE 1149.1 interface, making it useful at the chip level and at the module or system level.
Title : Clock controller design in SuperSPARC II microprocessor
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528904
I. Radivojevic, F. Brewer
Optimization of hardware resources for conditional data-flow graph behavior is particularly important when conditional behavior occurs in cyclic loops and maximization of throughput is desired. In this paper, an exact and efficient conditional resource sharing analysis using a guard-based control representation is presented. The analysis is transparent to a scheduler implementation. The proposed technique systematically handles complex conditional resource sharing for cases when folded (software pipelined) loops include conditional behavior within the loop body.
Title : Analysis of conditional resource sharing using a guard-based control representation
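The basic idea behind conditional resource sharing is that two operations scheduled in the same time step can use one hardware unit if they can never both execute. A simplified sketch follows, assuming each guard is a conjunction of independent Boolean condition literals; the paper's guard representation is richer than this, and the `Guard` type and function names are hypothetical.

```python
from typing import Dict

# A guard is a conjunction of condition literals, e.g. {"c1": True, "c2": False}.
# An empty guard means the operation executes unconditionally.
Guard = Dict[str, bool]


def mutually_exclusive(g1: Guard, g2: Guard) -> bool:
    """True if the two conjunctions can never hold together.

    For conjunctions of independent conditions this happens exactly when
    some shared condition is required to take opposite values.
    """
    return any(name in g2 and g2[name] != value for name, value in g1.items())


def can_share_resource(guard_a: Guard, guard_b: Guard) -> bool:
    """Two same-step operations may share a unit only if their guards exclude each other."""
    return mutually_exclusive(guard_a, guard_b)


then_branch = {"c": True}
else_branch = {"c": False}
other_cond = {"d": True}

print(can_share_resource(then_branch, else_branch))  # opposite branches of one test
print(can_share_resource(then_branch, other_cond))   # unrelated conditions
```

Operations in the two branches of the same condition can share a unit, while operations guarded by unrelated conditions cannot, because both guards may be true at once.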
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528793
S. Campos, E. Clarke, W. Marrero, M. Minea
Symbolic model checking is a successful technique for checking properties of large finite-state systems. This method has been used to verify a number of real-world hardware designs; however, it is not able to determine timing or performance properties directly. Since these properties are extremely important in the design of high-performance systems and in time-critical applications, we have extended model checking techniques to produce timing information. Our results allow a more detailed analysis of a model than is possible with tools that simply determine whether a property is satisfied or not. We present algorithms that determine the exact bounds on the time interval between two specified events and the number of occurrences of another event in such an interval. To demonstrate how our method works, we have modelled the PCI local bus and analyzed its temporal behavior. The results demonstrate the usefulness of our technique in analyzing complex modern designs.
Title : Verifying the performance of the PCI local bus using symbolic techniques
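The lower bound on the time between two events can be pictured as a shortest-path question on the state graph. The sketch below uses an explicit-state breadth-first search as a stand-in; the paper works with symbolic techniques, which this does not reproduce. The graph, its state names, and the function `min_delay` are hypothetical and do not come from the PCI model.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def min_delay(graph: Dict[str, List[str]], start: Set[str], final: Set[str]) -> Optional[int]:
    """Fewest transitions from any start state to any final state.

    Returns None if no final state is reachable.
    """
    frontier = deque((s, 0) for s in start)
    seen = set(start)
    while frontier:
        state, steps = frontier.popleft()
        if state in final:
            return steps
        for nxt in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, steps + 1))
    return None


graph = {
    "request": ["arbitrate"],
    "arbitrate": ["wait", "grant"],
    "wait": ["grant"],
    "grant": [],
}
print(min_delay(graph, {"request"}, {"grant"}))
```

Computing the matching upper bound is harder in general, since cycles in the state graph can make the interval unbounded.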
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528823
A. Yousif, J. Gu
The test generation problem for combinational circuits is known to be NP-hard. Efficient techniques for test generation are essential in order to reduce the test generation time. In this paper, we present a new and efficient test generation system based on global computation techniques. We aim at reducing the test generation time by using concurrent search to find tests for more than one fault at a time, as opposed to the single target fault technique used by current test systems. In order to achieve our objective, a new model for test generation is presented. We present a formal definition for the new test generation model and an implementation for the test generation system. Experimental results using ISCAS'85 and ISCAS'89 benchmarks are also presented.
Title : Concurrent automatic test pattern generation algorithm for combinational circuits
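Targeting more than one fault at a time is attractive because a single input vector often detects several stuck-at faults. The sketch below only demonstrates that observation by fault simulation on a toy circuit; it is not the paper's concurrent search or its test generation model. The circuit `y = (a AND b) OR c`, the signal names, and the helper functions are hypothetical.

```python
from typing import Optional, Set, Tuple

Fault = Tuple[str, int]  # (signal name, stuck-at value)

ALL_FAULTS = [(sig, val) for sig in ("a", "b", "c", "n", "y") for val in (0, 1)]


def simulate(a: int, b: int, c: int, fault: Optional[Fault] = None) -> int:
    """Evaluate y = (a AND b) OR c, optionally with one stuck-at fault injected."""

    def force(name: str, value: int) -> int:
        # Override the signal value if the injected fault sits on this signal.
        return fault[1] if fault is not None and fault[0] == name else value

    a, b, c = force("a", a), force("b", b), force("c", c)
    n = force("n", a & b)  # internal AND-gate output
    return force("y", n | c)


def detected_faults(vector: Tuple[int, int, int]) -> Set[Fault]:
    """Faults whose faulty output differs from the fault-free output for this vector."""
    good = simulate(*vector)
    return {f for f in ALL_FAULTS if simulate(*vector, fault=f) != good}


print(sorted(detected_faults((1, 1, 0))))
```

The vector `(1, 1, 0)` detects the stuck-at-0 faults on `a`, `b`, `n`, and `y` in this circuit, so one test covers four of the ten modelled faults.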
Pub Date : 1995-10-02 DOI: 10.1109/ICCD.1995.528829
Jean-Paul Theis, L. Thiele
In this paper, we describe a new processor model called the Periodic Operation Model (POM) that is suitable for real-time image processing. First we analyze existing image processing systems in order to situate our approach. Starting from the processor architecture, we derive the corresponding algorithm class by means of a novel hardware description. Then we address the allocation and scheduling problem. We show that allocation and scheduling can be decoupled in the mapping process related to POM-processor arrays and outline the principles of an optimal mapping trajectory. We outline a novel ILP model for allocation of POM-processor arrays which takes into account array-topology and bus bandwidth constraints. Finally we discuss implementation aspects of the POM as well as applications in image processing. In particular, we show that POM-processor arrays can be integrated onto single chips, thereby achieving several GOPS of processing power per chip.
Title : POM: a processor model for image processing