Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528798
J. Wellman, E. Davidson
In this paper we propose a new execution-trace-driven simulation technique, called the Resource Conflict Methodology (RCM), for modeling and simulating computer systems early in the design cycle. RCM uses a simplified hardware element model in which hardware elements can easily be added or deleted. The user can therefore readily change the machine design under investigation and evaluate the resulting machine on a given workload. We describe the RCM model with reference to a family of superscalar processors and develop an RCM-based analysis program, called REAP, for this family of processors. Using REAP, we validate the method by comparing its RCM performance estimates with those of a traditional early-design-stage timer model.
Title: The resource conflict methodology for early-stage design space exploration of superscalar RISC processors. Venue: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. URL: https://doi.org/10.1109/ICCD.1995.528798
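The resource-conflict idea can be made concrete with a minimal trace-driven sketch in Python. It is not REAP. The `Instr` record, the `simulate` function, the `(copies, latency)` description of each hardware element and the in-order issue policy are all hypothetical choices made for illustration. In this sketch, adding or deleting a hardware element means changing one entry in the `units` dictionary, which loosely mirrors the ease of reconfiguration the abstract describes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    resource: str                 # hardware element class the instruction occupies (hypothetical)
    srcs: Tuple[str, ...] = ()    # source registers
    dst: Optional[str] = None     # destination register, if any


def simulate(trace: List[Instr], units: Dict[str, Tuple[int, int]], width: int) -> int:
    """Return the number of cycles needed to execute `trace`.

    `units` maps a resource name to (number of copies, latency). Each instruction
    holds one copy for `latency` cycles (no internal pipelining). Instructions issue
    in program order, at most `width` per cycle.
    """
    free_at = {name: [0] * count for name, (count, _) in units.items()}
    ready: Dict[str, int] = {}    # register -> cycle its value becomes available
    issued: Dict[int, int] = {}   # cycle -> instructions issued in that cycle
    last_issue = 0
    finish = 0
    for ins in trace:
        _, lat = units[ins.resource]
        copies = free_at[ins.resource]
        # Earliest cycle allowed by in-order issue and data dependences.
        t = max([last_issue] + [ready.get(r, 0) for r in ins.srcs])
        # Resource conflict: wait for the earliest-free copy of the element.
        k = min(range(len(copies)), key=copies.__getitem__)
        t = max(t, copies[k])
        # Issue-width conflict: slide forward until an issue slot is free.
        while issued.get(t, 0) >= width:
            t += 1
        issued[t] = issued.get(t, 0) + 1
        copies[k] = t + lat
        if ins.dst is not None:
            ready[ins.dst] = t + lat
        last_issue = t
        finish = max(finish, t + lat)
    return finish
```

With `{"alu": (1, 1)}` and an issue width of 2, two independent ALU instructions serialize on the single ALU and take 2 cycles. Adding a second copy with `{"alu": (2, 1)}` lets both instructions issue in the same cycle, so the trace finishes in 1 cycle.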
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528824
I. Pomeranz, S. Reddy
A test generation procedure to detect multiple state-table faults in finite-state machines is proposed. We discuss the importance of multiple state-table faults and their advantages as test generation objectives, which avoid the need for checking experiments. The proposed procedure is based on a new method for implicitly enumerating large numbers of multiple faults by using incompletely specified faulty machines. Experimental results demonstrate the effectiveness of implicit fault enumeration in detecting large numbers of multiple faults.
Title: Test generation for multiple state-table faults in finite-state machines. Venue: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. URL: https://doi.org/10.1109/ICCD.1995.528824
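Explicit fault simulation illustrates what it means to detect a state-table fault. A fault changes one or more entries of the state table. An input sequence detects the fault if the good and faulty machines, started from the same initial state, produce different output sequences. The sketch below assumes a Mealy-style table keyed by `(state, input)`. The helper names and the two-state example machine are hypothetical. The sketch enumerates faults explicitly, unlike the paper's implicit enumeration through incompletely specified faulty machines.

```python
from typing import Dict, List, Tuple

# (state, input) -> (next state, output)
Table = Dict[Tuple[str, int], Tuple[str, int]]


def run(table: Table, start: str, inputs: List[int]) -> List[int]:
    """Apply an input sequence from `start` and return the output sequence."""
    state, outputs = start, []
    for x in inputs:
        state, y = table[(state, x)]
        outputs.append(y)
    return outputs


def inject(table: Table, faults: Table) -> Table:
    """Build a faulty machine by overwriting one or more state-table entries."""
    faulty = dict(table)
    faulty.update(faults)
    return faulty


def detects(table: Table, faults: Table, start: str, inputs: List[int]) -> bool:
    """True if the input sequence distinguishes the good and faulty machines."""
    return run(table, start, inputs) != run(inject(table, faults), start, inputs)


# Hypothetical two-state example: output is 0 while in A and 1 while in B.
EXAMPLE: Table = {
    ("A", 0): ("A", 0), ("A", 1): ("B", 0),
    ("B", 0): ("B", 1), ("B", 1): ("A", 1),
}
```

A transfer fault that keeps the machine in `A` on input 1 (`{("A", 1): ("A", 0)}`) is detected by the sequence `[1, 0]` but not by `[0]`. A multiple fault is simply a dictionary with several overwritten entries.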
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528906
S. Katkoori, Nand Kumar, R. Vemuri
We present a profiling-based technique for power estimation. This technique is implemented in PDSS (Profile Driven Synthesis System) for the synthesis of low-power designs. First, each module in the module library is characterized for its average switching capacitance per input vector. The input description is then simulated with a user-specified set of input vectors to collect profile data for the various operators and carriers. The profile data, in conjunction with the pre-characterized module library, are used to estimate the total capacitance switched by each valid schedule produced by the PDSS scheduler. A valid schedule is one that satisfies the other constraints, such as area and delay. The schedule with the lowest switching-capacitance estimate is then synthesized down to the layout level. Results show an average deviation of 12% from the actual switching capacitance values at the layout level.
Title: High level profiling based low power synthesis technique. Venue: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. URL: https://doi.org/10.1109/ICCD.1995.528906
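The estimation step can be sketched as a weighted sum. Each operation's profiled number of input vectors is multiplied by the average switching capacitance per vector of the library module it is bound to. The sketch then keeps the cheapest schedule among those that meet the area and delay constraints. The library values, the dictionary layout and the function names below are hypothetical. The sketch also considers operators only, not the carriers that PDSS also profiles.

```python
from typing import Dict, List

# Hypothetical pre-characterized library: average switched capacitance (pF) per input vector.
LIBRARY_PF: Dict[str, float] = {"add_ripple": 0.8, "add_cla": 1.3, "mul_array": 6.0}


def schedule_capacitance(binding: Dict[str, str], profile: Dict[str, int]) -> float:
    """Estimate total switched capacitance for one schedule.

    binding: operation -> module type chosen by this schedule
    profile: operation -> number of input vectors it processed during simulation
    """
    return sum(profile[op] * LIBRARY_PF[module] for op, module in binding.items())


def pick_schedule(schedules: List[dict], profile: Dict[str, int],
                  max_area: float, max_delay: float) -> dict:
    """Among schedules meeting the area and delay constraints, return the lowest-capacitance one."""
    valid = [s for s in schedules if s["area"] <= max_area and s["delay"] <= max_delay]
    return min(valid, key=lambda s: schedule_capacitance(s["binding"], profile))
```

In this sketch, a schedule that binds an addition to a slower ripple-carry adder can win on estimated capacitance. It is only selected if it still meets the delay bound.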
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528910
H. Srinivas, K. Parhi
This paper presents the architecture and implementation of a full-custom 1.2 micron CMOS VLSI chip. The chip executes a shared division/square root algorithm operating on the 23-bit mantissas of single-precision IEEE 754 floating-point numbers. The division and square root algorithms used in this implementation are radix-2, signed-digit, digit-by-digit schemes. Both algorithms select each quotient/root digit using the two most-significant digits of the partial remainder. They are therefore faster than similar previously proposed radix-2 shared division/square root schemes. According to simulations, the chip runs at a clock rate of about 66 MHz at 5.0 V. It requires 29 cycles per divide or square root operation, counted from the time the operands are applied to its input pins.
Title: A floating point radix 2 shared division/square root chip. Venue: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. URL: https://doi.org/10.1109/ICCD.1995.528910
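A textbook radix-2 SRT division recurrence illustrates the digit-by-digit scheme: `w[j+1] = 2*w[j] - q[j+1]*d`, with quotient digits `q` in {-1, 0, 1}. This is a generic software sketch on exact fractions, not the chip's algorithm. It assumes a normalized divisor 1/2 <= d < 1 and |x| <= d. It covers division only and omits the shared square-root recurrence. It also compares the full remainder against ±1/2, whereas the chip's selection logic inspects only the two most-significant digits.

```python
from fractions import Fraction
from typing import List, Tuple


def srt_divide(x: Fraction, d: Fraction, n: int) -> Tuple[List[int], Fraction]:
    """Radix-2 SRT division producing n signed quotient digits.

    Assumes 1/2 <= d < 1 and |x| <= d, which keeps every partial remainder
    w within [-d, d]. Returns (digits, value) where value approximates x / d.
    """
    assert Fraction(1, 2) <= d < 1 and abs(x) <= d
    w = x
    digits: List[int] = []
    for _ in range(n):
        two_w = 2 * w
        # Digit selection: compare the shifted remainder with +1/2 and -1/2.
        # Hardware would examine only a few leading digits of two_w.
        if two_w >= Fraction(1, 2):
            q = 1
        elif two_w < Fraction(-1, 2):
            q = -1
        else:
            q = 0
        w = two_w - q * d           # next partial remainder
        digits.append(q)
    # Quotient value: sum of q_j * 2^-j for j = 1..n.
    value = sum((Fraction(q, 2 ** (j + 1)) for j, q in enumerate(digits)), Fraction(0))
    return digits, value
```

After n iterations, `x = d*Q + w[n]*2**-n` with `|w[n]| <= d`, so the accumulated quotient Q is within 2^-n of x/d.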
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528834
F. Beeftink, A. V. Genderen, N. V. D. Meijs
In this paper, we describe how we have combined the advantages of various methods for device recognition and modeling in a layout-to-circuit extractor called Space. The result is a program that, for different technologies, can quickly translate a large layout into an equivalent network. The network includes the parasitics of the interconnect layout and can be simulated directly by various simulation packages, such as Spice. Experimental results confirm the efficiency and accuracy of the extractor. These properties enable fast and reliable layout verification for both MOS and bipolar/BiCMOS technologies.
Title: Accurate and efficient layout-to-circuit extraction for high-speed MOS and bipolar/BiCMOS integrated circuits. Venue: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. URL: https://doi.org/10.1109/ICCD.1995.528834
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528935
R. Krishnamurthy, R. Sridhar
This paper presents the implementation of a high-speed morphological image processor using CMOS wave-pipelining. We have developed a modular and expandable architecture for gray-scale and binary morphological operators, based on wave-pipelined transmission gate logic. Using this architecture, we implemented binary dilation and erosion units with a 3×3 (two-dimensional) structuring element, together with a two-stage morphological skeleton transform filter, in 1.2 μm CMOS technology. The operating frequency is 333 MHz, which exceeds the speeds reported in the literature for this functionality. Simulation results indicate a speed-up of 4 to 5 times over non-pipelined processor implementations. Compared with conventionally pipelined architectures, the wave-pipelined implementation also significantly reduces latency and hardware complexity.
Title: A CMOS wave-pipelined image processor for real-time morphology. Venue: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. URL: https://doi.org/10.1109/ICCD.1995.528935
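Binary dilation and erosion with a 3×3 structuring element can be stated in a few lines of software. The sketch below is a plain serial model, not the wave-pipelined transmission-gate hardware. It treats pixels outside the image as background (0). It also assumes a symmetric structuring element, such as the full 3×3 square, so the reflection in the formal definition of dilation can be ignored. The function names are hypothetical.

```python
from typing import Callable, Iterable, List

Image = List[List[int]]  # rows of 0/1 pixels


def _neighborhood_op(img: Image, se: Image, combine: Callable[[Iterable[int]], bool]) -> Image:
    """Apply `combine` (any/all) over the pixels selected by a 3x3 structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if se[dr + 1][dc + 1]:
                        rr, cc = r + dr, c + dc
                        # Pixels outside the image count as background (0).
                        vals.append(img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0)
            out[r][c] = int(combine(vals))
    return out


def dilate(img: Image, se: Image) -> Image:
    """Binary dilation: a pixel is set if any selected neighbor is set."""
    return _neighborhood_op(img, se, any)


def erode(img: Image, se: Image) -> Image:
    """Binary erosion: a pixel is set only if all selected neighbors are set."""
    return _neighborhood_op(img, se, all)
```

Dilating a single foreground pixel with the full 3×3 square produces a 3×3 block. Eroding that block with the same element recovers the single pixel.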
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528783
B. Wah, Arthur Ieumwananonthachai, Shu Yao, T. Yu
In this paper, we discuss a new approach to generalizing heuristic methods (HMs) to new test cases of an application, and the conditions under which such generalization is possible. Generalization is difficult when the performance values of HMs are characterized by multiple statistical distributions across subsets of an application's test cases. We define a new measure called the probability of win and propose three methods to evaluate it: interval analysis, maximum likelihood estimation, and Bayesian analysis. We show experimental results on new HMs found for blind equalization and branch-and-bound search.
Title: Statistical generalization: theory and applications. Venue: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. URL: https://doi.org/10.1109/ICCD.1995.528783
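The sketch below is a rough illustration of a probability-of-win style measure. For each heuristic, it averages over all competitors the estimated probability that the heuristic's true mean performance exceeds the competitor's. It uses a normal approximation to the sample means. This estimator is an assumption chosen for brevity. It is not claimed to match the paper's interval-analysis, maximum-likelihood or Bayesian methods. The function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_win(samples: List[List[float]]) -> List[float]:
    """Average pairwise probability that each heuristic's mean beats another's.

    samples[i] holds observed performance values of heuristic i (higher is better).
    Requires at least two heuristics and at least two values per heuristic.
    Assumes the sample means are approximately normally distributed.
    """
    stats = [(mean(s), stdev(s) ** 2 / len(s)) for s in samples]  # (mean, variance of mean)
    m = len(stats)
    wins = []
    for i, (mu_i, var_i) in enumerate(stats):
        total = 0.0
        for j, (mu_j, var_j) in enumerate(stats):
            if i == j:
                continue
            spread = math.sqrt(var_i + var_j)
            if spread == 0.0:
                # Degenerate case with no sampling noise: compare means directly.
                total += 1.0 if mu_i > mu_j else (0.5 if mu_i == mu_j else 0.0)
            else:
                total += normal_cdf((mu_i - mu_j) / spread)
        wins.append(total / (m - 1))
    return wins
```

With only two heuristics, the two values sum to 1. A heuristic whose samples are consistently higher gets a value close to 1.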
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528933
B. Grayson, S. Shaikh, S. Szygenda
Basic data of the kind presented here on fault and design error simulation have not previously been reported. Experiments are performed on c-sim, a gate-level concurrent simulator developed at the University of Texas at Austin. Three types of statistics are considered: event-based statistics, gate evaluation statistics and memory requirements. These statistics matter to design verification researchers and engineers for several reasons. For example, they help simulator developers tune or optimize their concurrent simulators. They also meet the growing need for experimental data on design error simulation. Most importantly, they guide hardware accelerator designers in evaluating and comparing design options.
Title: Statistics on concurrent fault and design error simulation. Venue: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. URL: https://doi.org/10.1109/ICCD.1995.528933
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528835
Jin-Tai Yan
In this paper, based on assumptions about the geometrical topology of a floorplan graph and the precedence relations in a channel precedence graph, cuts are further classified into S-cuts, redundant L-cuts, balanced L-cuts, non-minimal L-cuts, non-critical L-cuts and critical L-cuts. An efficient cut-based algorithm for minimizing the number of L-shaped channels is proposed. Its time complexity is proved to be O(n), where n is the number of line segments in the floorplan graph. Finally, several examples were run with Dai's algorithm, Cai's algorithm and the proposed algorithm. The experimental results show that, when straight and L-shaped channels are defined to assign a safe routing order, the proposed algorithm defines fewer L-shaped channels than Dai's and Cai's algorithms.
Title: An efficient cut-based algorithm on minimizing the number of L-shaped channels for safe routing ordering. Venue: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. URL: https://doi.org/10.1109/ICCD.1995.528835
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528815
Vinod Narayananan, D. LaPotin, Rajesh K. Gupta, G. Vijayan
With increasing chip complexity and the need to reduce design time, early analysis is becoming increasingly important in the design of performance-critical CMOS chips. As clock rates rise rapidly, interconnect delay consumes an appreciable portion of the chip cycle time, and the floorplan significantly affects chip performance. This paper describes a system for early floorplan analysis of large designs. The floorplanner is intended for the early stages of system design, where it optimizes performance, area and wireability targets before detailed implementation decisions are made. Most floorplanners that claim to optimize timing work on only a subset of paths during floorplanning. One novel feature of our floorplanner is that it performs static timing analysis during floorplan optimization instead of working on a subset of the paths. The floorplanner provides both interactive and automatic floorplanning capabilities. The paper describes these capabilities and the underlying algorithms, as well as our experience in using the tool.
Title: PEPPER-a timing driven early floorplanner. Venue: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. URL: https://doi.org/10.1109/ICCD.1995.528815
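Static timing analysis over all paths reduces to propagating the latest arrival time through a directed acyclic timing graph in topological order. The sketch below assumes a block-level graph with an intrinsic delay per block. It also assumes a wire delay proportional to the Manhattan distance between block positions, so moving a block in the floorplan changes the computed critical delay. These modeling choices and the function names are hypothetical; they are not PEPPER's actual delay model.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# name -> (x, y, intrinsic delay); positions come from the current floorplan
Blocks = Dict[str, Tuple[float, float, float]]


def arrival_times(blocks: Blocks, nets: List[Tuple[str, str]],
                  delay_per_unit: float) -> Dict[str, float]:
    """Latest signal arrival time at each block input, over all paths (topological order)."""
    succ = defaultdict(list)
    indeg = {b: 0 for b in blocks}
    for u, v in nets:
        succ[u].append(v)
        indeg[v] += 1
    arrival = {b: 0.0 for b in blocks}
    queue = deque(b for b, k in indeg.items() if k == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        ux, uy, u_delay = blocks[u]
        for v in succ[u]:
            vx, vy, _ = blocks[v]
            wire = delay_per_unit * (abs(ux - vx) + abs(uy - vy))  # Manhattan-distance wire delay
            arrival[v] = max(arrival[v], arrival[u] + u_delay + wire)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(blocks):
        raise ValueError("timing graph contains a cycle")
    return arrival


def critical_delay(blocks: Blocks, nets: List[Tuple[str, str]], delay_per_unit: float) -> float:
    """Longest path delay: latest arrival plus the block's own delay, maximized over blocks."""
    arrival = arrival_times(blocks, nets, delay_per_unit)
    return max(arrival[b] + blocks[b][2] for b in blocks)
```

As an example, take four blocks wired as A→B→D and A→C→D. The blocks sit at (0,0), (3,0), (0,4) and (3,4), with intrinsic delays 1, 2, 1 and 1 and 0.5 units of delay per unit distance. The longest path is A→B→D, with a total delay of 7.5.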