Pub Date: 1995-10-02 DOI: 10.1109/ICCD.1995.528910
H. Srinivas, K. Parhi
This paper presents the architecture and implementation of a full-custom 1.2 micron CMOS VLSI chip that executes a shared division/square root algorithm operating on the mantissas (23 bits long) of single-precision IEEE 754 standard floating point numbers. The division and square root algorithms used in this implementation are radix-2, signed-digit, digit-by-digit schemes. Both algorithms select each quotient/root digit using the two most-significant digits of the partial remainder and are hence faster than similar previously proposed radix-2 shared division/square root schemes. The chip runs at a clock rate of about 66 MHz at 5.0 V (from simulations) and requires 29 cycles per divide/square root operation from the time the operands are provided at its pin inputs.
Title: A floating point radix 2 shared division/square root chip. In: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. https://doi.org/10.1109/ICCD.1995.528910
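The digit-by-digit division named in the abstract above can be illustrated with a minimal radix-2 sketch using the signed digit set {-1, 0, 1}. This is not the chip's datapath: the function name `srt2_divide`, the iteration count, and the selection rule (an exact comparison of the shifted remainder against ±1/2, rather than an inspection of its two most-significant signed digits) are assumptions made for illustration.

```python
from fractions import Fraction

def srt2_divide(x, d, n=26):
    """Approximate x / d for mantissas 1 <= x, d < 2 (hypothetical helper).

    Recurrence: w[j+1] = 2*w[j] - q[j+1]*d, with q[j+1] in {-1, 0, 1}.
    """
    x, d = Fraction(x), Fraction(d)
    assert 1 <= x < 2 and 1 <= d < 2
    dn = d / 2            # divisor scaled into [1/2, 1)
    w = x / 4             # initial remainder, so that |w| <= dn holds
    q = Fraction(0)
    half = Fraction(1, 2)
    for j in range(1, n + 1):
        w2 = 2 * w
        # Digit selection on the full shifted remainder (assumption).
        if w2 >= half:
            digit = 1
        elif w2 < -half:
            digit = -1
        else:
            digit = 0
        w = w2 - digit * dn            # |w| <= dn is preserved
        q += Fraction(digit, 2 ** j)   # accumulate the signed digit
    return 2 * q          # undo the scaling: (x/4) / (d/2) = (x/d) / 2

approx = srt2_divide(Fraction(3, 2), Fraction(5, 4))
print(float(approx))
```

After `n` iterations the result differs from the true quotient by at most 2^(1-n), because the final remainder stays bounded by the scaled divisor.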
Pub Date: 1995-10-02 DOI: 10.1109/ICCD.1995.528906
S. Katkoori, Nand Kumar, R. Vemuri
We present a profiling-based technique for power estimation. This technique is implemented in the PDSS (Profile Driven Synthesis System) for the synthesis of low-power designs. Initially, each module in the module library is characterized for the average switching capacitance per input vector. The input description is simulated using a user-specified set of input vectors to collect the profile data for various operators and carriers. The profile data, in conjunction with the pre-characterized module library, is used to estimate the total capacitance switched by each of the valid schedules produced by the PDSS scheduler. A valid schedule is one which satisfies other constraints such as area and delay. The schedule with the least switching capacitance estimate is further synthesized to the layout level. Results show an average deviation of 12% compared with the actual switching capacitance values at the layout level.
Title: High level profiling based low power synthesis technique. In: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. https://doi.org/10.1109/ICCD.1995.528906
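The estimation step described in the abstract above can be sketched as a weighted sum: the profiled activation count of each operator times the pre-characterized average switching capacitance of the module bound to it. The data layout, the function names, and all numbers below are hypothetical; PDSS's actual data structures are not given in the abstract.

```python
def schedule_capacitance(schedule, profile, library):
    """Estimated switched capacitance of one schedule.

    schedule: operator name -> module name bound to it
    profile:  operator name -> activations seen while simulating the input vectors
    library:  module name -> average switching capacitance per input vector (fF)
    """
    return sum(profile[op] * library[module] for op, module in schedule.items())

def pick_lowest(schedules, profile, library):
    """Return the valid schedule with the smallest capacitance estimate."""
    return min(schedules, key=lambda s: schedule_capacitance(s, profile, library))

# Made-up characterization and profile data, for illustration only.
library = {"ripple_adder": 40, "cla_adder": 65, "array_mult": 300}
profile = {"add1": 100, "mul1": 20}
schedules = [
    {"add1": "ripple_adder", "mul1": "array_mult"},
    {"add1": "cla_adder", "mul1": "array_mult"},
]
best = pick_lowest(schedules, profile, library)
print(best, schedule_capacitance(best, profile, library))
```

Only schedules that already satisfy the area and delay constraints would be passed in, so the selection itself reduces to a minimum over the estimates.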
Pub Date: 1995-10-02 DOI: 10.1109/ICCD.1995.528907
Chuan-Yu Wang, K. Roy
With demands for reliability and further integration, reducing power consumption has become a critical concern in today's processor design. Among the techniques for minimizing power consumption and improving system reliability, reducing the switching activity of CMOS circuits is a promising area to explore. Motivated by this, we propose two optimization schemes which can be incorporated into the synthesis of a processor's control unit to lower power dissipation. The first, a low-power decoding scheme, uses graph embedding and logic minimization techniques to refine the decoding structure in the control unit. For control units in nanoprogrammed or microprogrammed architectures, the second scheme optimally assigns ZERO or ONE to the don't-care bits distributed in the nanocontrol memory or control memory, significantly reducing switching activity within the control unit and/or on the path from the control unit to the data processing unit. To achieve these two goals efficiently, we have used pseudo-Boolean programming to optimize the synthesis parameters. Based on a subset of the 8086 instruction set, experimental results show that a 15.8 percent improvement is obtained by properly encoding instruction opcodes, and a 4.9 to 16.6 percent improvement can be obtained from an optimal assignment of don't-care bits.
Title: Control unit synthesis targeting low-power processors. In: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. https://doi.org/10.1109/ICCD.1995.528907
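To make the don't-care assignment idea concrete, here is a small sketch that fills `X` bits in a sequence of control words so that as few bits as possible toggle between consecutive words. It is a stand-in, not the paper's method: the paper uses pseudo-Boolean programming, whereas this sketch uses a per-bit greedy fill and assumes the words are read strictly in the listed order. The names `fill_dont_cares` and `transitions` are hypothetical.

```python
def fill_dont_cares(words):
    """Replace each 'X' with the nearest earlier specified bit in the same column.

    words: equal-length strings over '0', '1', 'X', in read order (assumption).
    Leading X's in a column take the first specified value of that column.
    """
    width = len(words[0])
    columns = []
    for b in range(width):
        col = [w[b] for w in words]
        prev = next((c for c in col if c != "X"), "0")
        filled = []
        for c in col:
            if c == "X":
                c = prev          # copy the previous value: no toggle here
            prev = c
            filled.append(c)
        columns.append(filled)
    return ["".join(columns[b][i] for b in range(width)) for i in range(len(words))]

def transitions(words):
    """Count bit toggles between consecutive words."""
    return sum(a != b for w1, w2 in zip(words, words[1:]) for a, b in zip(w1, w2))

words = ["1X0", "X10", "0XX", "11X"]
filled = fill_dont_cares(words)
naive = [w.replace("X", "0") for w in words]
print(filled, transitions(filled), transitions(naive))
```

For a fixed linear read order, copying the previous value leaves only the toggles forced by the specified bits; a real control store is addressed in program-dependent order, which is why an optimization formulation is needed in practice.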
Pub Date: 1995-10-02 DOI: 10.1109/ICCD.1995.528937
A. Ejnioui, N. Ranganathan
The objective of tree matching is to find the set of nodes at which a pattern tree matches a subject tree. Several sequential and parallel algorithms have been proposed in the literature for this compute-bound problem. Most of the parallel algorithms are based on the theoretical PRAM model of computation. In this paper, we propose two efficient parallel algorithms for tree pattern matching based on the linear systolic array model. The algorithms can be mapped onto any SIMD machine. They require O(n+m) time to perform the matching using either n or m processors, where n is the size of the subject tree and m is the size of the pattern tree. The algorithms represent a significant improvement over existing ones in terms of implementation.
Title: Systolic algorithms for tree pattern matching. In: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. https://doi.org/10.1109/ICCD.1995.528937
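The problem statement in the abstract above (find every node of the subject tree at which the pattern matches) can be pinned down with a naive sequential reference. This is not the systolic algorithm: it runs in O(nm) time on one processor. The tree encoding as `(label, children)` tuples, the `"?"` wildcard leaf, and the function names are assumptions for illustration.

```python
WILDCARD = "?"  # hypothetical marker: a pattern leaf that matches any subtree

def matches_at(subject, pattern):
    """True if the pattern matches with its root placed on this subject node."""
    p_label, p_kids = pattern
    if p_label == WILDCARD:
        return True
    s_label, s_kids = subject
    if s_label != p_label or len(s_kids) != len(p_kids):
        return False
    return all(matches_at(s, p) for s, p in zip(s_kids, p_kids))

def match_nodes(subject, pattern, path=()):
    """Return the paths (child indices from the root) of all matching nodes."""
    found = [path] if matches_at(subject, pattern) else []
    for i, kid in enumerate(subject[1]):
        found += match_nodes(kid, pattern, path + (i,))
    return found

subject = ("f", [("g", [("a", []), ("b", [])]),
                 ("g", [("a", []), ("c", [])])])
pattern = ("g", [("a", []), (WILDCARD, [])])
print(match_nodes(subject, pattern))
```

A parallel version would be checked against a reference like this one, since both must report the same set of nodes.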
Pub Date: 1995-10-02 DOI: 10.1109/ICCD.1995.528834
F. Beeftink, A. V. Genderen, N. V. D. Meijs
In this paper, we describe how we have exploited the advantages of various methods for device recognition and modeling in a layout-to-circuit extractor, called Space. Hence, we have obtained a program that, for different technologies, can quickly translate a large layout into an equivalent network. The network includes layout parasitics of the interconnects and can directly be simulated by various simulation packages, such as Spice. The efficiency and accuracy of the extractor are confirmed by experimental results and enable a fast and reliable layout verification for both MOS and bipolar/BiCMOS technologies.
Title: Accurate and efficient layout-to-circuit extraction for high-speed MOS and bipolar/BiCMOS integrated circuits. In: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. https://doi.org/10.1109/ICCD.1995.528834
Pub Date: 1995-10-02 DOI: 10.1109/ICCD.1995.528935
R. Krishnamurthy, R. Sridhar
This paper presents the implementation of a high-speed morphological image processor using CMOS wave-pipelining. A modular and expandable architecture, based on wave-pipelined transmission gate logic, has been developed for gray-scale and binary morphological operators. Using this architecture, 3×3 (2-dimensional) structuring element binary dilation and erosion units, and a two-stage morphological skeleton transform filter have been implemented in CMOS 1.2 μm technology. The operating frequency is 333 MHz, which exceeds the speeds reported in literature for this functionality. Simulation results indicate a speed-up of 4-5 compared to non-pipelined processor implementations. The wave-pipelined implementation also offers a significant reduction in latency and hardware complexity compared to regular pipelined architectures.
Title: A CMOS wave-pipelined image processor for real-time morphology. In: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. https://doi.org/10.1109/ICCD.1995.528935
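As a software reference for the operators the chip implements, the sketch below performs binary dilation and erosion with a 3×3 structuring element. It assumes an all-ones structuring element and treats pixels outside the image as 0; neither choice is stated in the abstract, and the function names are hypothetical.

```python
def _window(img, r, c):
    """3x3 neighbourhood of (r, c); pixels outside the image count as 0 (assumption)."""
    rows, cols = len(img), len(img[0])
    return [img[i][j] if 0 <= i < rows and 0 <= j < cols else 0
            for i in range(r - 1, r + 2) for j in range(c - 1, c + 2)]

def dilate(img):
    """Output pixel is 1 if any pixel under the 3x3 element is 1."""
    return [[int(any(_window(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]

def erode(img):
    """Output pixel is 1 only if every pixel under the 3x3 element is 1."""
    return [[int(all(_window(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]

block = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
dot = erode(block)        # the 3x3 block shrinks to its centre pixel
for row in dilate(dot):   # dilating the centre pixel restores the block
    print(row)
```

Each output pixel depends only on a fixed nine-pixel window, which is the kind of regular, feed-forward logic that suits pipelined hardware.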
Pub Date: 1995-10-02 DOI: 10.1109/ICCD.1995.528783
B. Wah, Arthur Ieumwananonthachai, Shu Yao, T. Yu
In this paper, we discuss a new approach to generalize heuristic methods (HMs) to new test cases of an application, and conditions under which such generalization is possible. Generalization is difficult when performance values of HMs are characterized by multiple statistical distributions across subsets of test cases of an application. We define a new measure called probability of win and propose three methods to evaluate it: interval analysis, maximum likelihood estimate, and Bayesian analysis. We show experimental results on new HMs found for blind equalization and branch-and-bound search.
Title: Statistical generalization: theory and applications. In: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. https://doi.org/10.1109/ICCD.1995.528783
Pub Date: 1995-10-02 DOI: 10.1109/ICCD.1995.528933
B. Grayson, S. Shaikh, S. Szygenda
Basic data of the nature presented here on fault and design error simulation processes have not been previously reported. Experiments are performed on c-sim, a gate level concurrent simulator developed at the University of Texas at Austin. Three types of statistics are considered: event based statistics, gate evaluation statistics and memory requirements. These statistics are important for design verification researchers and engineers for numerous reasons. For example, they help simulator developers tune up or optimize their concurrent simulators. They also fulfill the increasing need for experimental data concerning design error simulation. Most importantly, these statistics provide guidance to hardware accelerator designers in evaluating and comparing various design options.
Title: Statistics on concurrent fault and design error simulation. In: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. https://doi.org/10.1109/ICCD.1995.528933
Pub Date: 1995-10-02 DOI: 10.1109/ICCD.1995.528835
Jin-Tai Yan
In this paper, based on the assumptions of the geometrical topology in a floorplan graph and the precedence relations in a channel precedence graph, the cuts are further classified into S-cuts, redundant L-cuts, balanced L-cuts, non-minimal L-cuts, non-critical L-cuts and critical L-cuts. An efficient cut-based algorithm for minimizing the number of L-shaped channels is proposed. The time complexity of the algorithm is proved to be O(n), where n is the number of line segments in a floorplan graph. Finally, several examples have been run through Dai's and Cai's algorithms and the proposed algorithm. The experimental results show that the proposed algorithm defines fewer L-shaped channels than Dai's and Cai's algorithms when defining straight and L-shaped channels for the assignment of a safe routing ordering.
Title: An efficient cut-based algorithm on minimizing the number of L-shaped channels for safe routing ordering. In: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. https://doi.org/10.1109/ICCD.1995.528835
Pub Date: 1995-10-02 DOI: 10.1109/ICCD.1995.528815
Vinod Narayananan, D. LaPotin, Rajesh K. Gupta, G. Vijayan
With increasing chip complexities and the requirement to reduce design time, early analysis is becoming increasingly important in the design of performance critical CMOS chips. As clock rates increase rapidly, interconnect delay consumes an appreciable portion of the chip cycle time, and the floorplan of the chip significantly affects its performance. This paper describes a system for early floorplan analysis of large designs. The floorplanner is designed to be used in the early stages of system design, to optimize performance, area and wireability targets before detailed implementation decisions are made. Most floorplanners which claim to optimize timing work only on a subset of paths during the floorplanning process. One novel feature of our floorplanner is that it performs static timing analysis during the floorplan optimization process, instead of working on a subset of the paths. The floorplanner incorporates various interactive and automatic floorplanning capabilities. The paper describes the floorplanning capabilities and algorithms as well as our experiences in using the tool.
Title: PEPPER-a timing driven early floorplanner. In: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors. https://doi.org/10.1109/ICCD.1995.528815
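The contrast the abstract draws (static timing analysis over the whole design rather than a chosen subset of paths) can be illustrated with a minimal arrival-time propagation over a timing graph. This is a generic sketch, not PEPPER's implementation: the graph encoding, the function name, and the delays are hypothetical, and in a floorplanner the edge delays would be re-estimated from block positions after each move.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def arrival_times(edges, start_times=None):
    """Latest arrival time at every node of an acyclic timing graph.

    edges: {(u, v): delay} with non-negative delays (hypothetical encoding)
    start_times: optional {node: arrival time} for primary inputs, default 0
    """
    start_times = start_times or {}
    preds = {}
    for (u, v), delay in edges.items():
        preds.setdefault(v, {})[u] = delay
        preds.setdefault(u, {})
    arrival = {}
    # static_order() yields each node after all of its predecessors.
    for node in TopologicalSorter({n: set(p) for n, p in preds.items()}).static_order():
        candidates = [arrival[u] + d for u, d in preds[node].items()]
        arrival[node] = max([start_times.get(node, 0)] + candidates)
    return arrival

edges = {("in", "a"): 2, ("in", "b"): 5, ("a", "out"): 4, ("b", "out"): 3}
arrival = arrival_times(edges)
cycle_time = 10                       # made-up timing target
print(arrival["out"], cycle_time - arrival["out"])  # worst arrival and its slack
```

One pass in topological order covers every path implicitly, so the cost grows with the number of edges rather than the number of paths.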