Pub Date : 1998-02-19 DOI: 10.1109/GLSV.1998.665323
M. Franklin, Prithvi Prabhu
In this paper, we present methods for improving the performance of self-timed computation blocks. The Hybrid Completion method permits the design of a spectrum of completion circuits, ranging from those based on pure bounded delays to those based on full complementary circuit development. This is achieved by using only a subset of the computation block's outputs to generate the overall completion signal, which eliminates the extra circuitry for the completion signals of the other outputs. The computation block's delay may also be reduced, since fewer signals are required to generate the overall completion signal. The approach seeks to combine the area efficiency of the bounded-delay approach with the operand-based delay sensitivity of the full complementary approach.
{"title":"Performance optimization of self-timed circuits","authors":"M. Franklin, Prithvi Prabhu","doi":"10.1109/GLSV.1998.665323","DOIUrl":"https://doi.org/10.1109/GLSV.1998.665323","journal":{"name":"Proceedings of the 8th Great Lakes Symposium on VLSI (Cat. No.98TB100222)"},"publicationDate":"1998-02-19"}
Pub Date : 1998-02-19 DOI: 10.1109/GLSV.1998.665219
E. Senn, B. Zavidovique
This paper describes the implementation of a self-timed asynchronous router in a parallel machine. The heterogeneous architecture of the machine is outlined, and the need for asynchronous operation and the motivation for asynchronous network control are explained. The specification and VLSI design of the router are presented along with its measured performance.
{"title":"A self timed asynchronous router for an heterogeneous parallel machine","authors":"E. Senn, B. Zavidovique","doi":"10.1109/GLSV.1998.665219","DOIUrl":"https://doi.org/10.1109/GLSV.1998.665219","journal":{"name":"Proceedings of the 8th Great Lakes Symposium on VLSI (Cat. No.98TB100222)"},"publicationDate":"1998-02-19"}
Pub Date : 1998-02-19 DOI: 10.1109/GLSV.1998.665243
H. Soeleman, D. Somasekhar, K. Roy
This paper describes a test method that relies on direct observation of supply current (I_DD) waveforms. The method can supplement the standard I_DDQ test method and can easily be applied to dynamic and low-V_DD, low-V_T CMOS circuits. It detects faults that I_DDQ test methods may miss, and it is sensitive enough to detect potential faults that do not manifest themselves as functional errors. A simple built-in current sensor is proposed to observe the current waveforms safely without significantly changing them; it proves adequate for verifying the feasibility of I_DD waveform analysis.
{"title":"I_DD waveforms analysis for testing of domino and low voltage static CMOS circuits","authors":"H. Soeleman, D. Somasekhar, K. Roy","doi":"10.1109/GLSV.1998.665243","DOIUrl":"https://doi.org/10.1109/GLSV.1998.665243","journal":{"name":"Proceedings of the 8th Great Lakes Symposium on VLSI (Cat. No.98TB100222)"},"publicationDate":"1998-02-19"}
Pub Date : 1998-02-19 DOI: 10.1109/GLSV.1998.665299
S. M. Sait, H. Youssef, M. M. Zahra
In this paper we address the problem of optimizing mixed CMOS/BiCMOS circuits. The problem is formulated as a constrained combinatorial optimization problem and solved using a tabu search algorithm. Only gates on the critical sensitizable paths are considered for optimization. This strategy yields a sizable improvement in circuit speed with a minimal increase in overall circuit capacitance. Compared to earlier approaches, the presented technique produces circuits with a remarkable increase in speed (greater than 20%) for a very small increase in overall circuit capacitance (less than 3%).
{"title":"Tabu search based circuit optimization","authors":"S. M. Sait, H. Youssef, M. M. Zahra","doi":"10.1109/GLSV.1998.665299","DOIUrl":"https://doi.org/10.1109/GLSV.1998.665299","journal":{"name":"Proceedings of the 8th Great Lakes Symposium on VLSI (Cat. No.98TB100222)"},"publicationDate":"1998-02-19"}
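The tabu search formulation in the abstract above can be illustrated with a minimal, generic sketch. This is not the authors' algorithm: the gate data (`SAVINGS`, `CAP_COST`), the capacitance budget, the penalty weight, and the one-bit-flip neighborhood are all hypothetical choices made for illustration. Each bit marks whether a gate on the critical path is replaced by a faster, higher-capacitance variant.

```python
from collections import deque

# Hypothetical toy data: four gates on one critical path.
BASE_DELAY = 20             # path delay with no gate replaced (arbitrary units)
SAVINGS = [4, 3, 5, 1]      # delay saved if gate i is replaced
CAP_COST = [2, 1, 4, 1]     # capacitance added if gate i is replaced
CAP_BUDGET = 4              # maximum allowed capacitance increase
PENALTY = 100               # cost per unit of budget violation


def example_cost(solution):
    """Path delay plus a penalty for exceeding the capacitance budget."""
    delay = BASE_DELAY - sum(s for s, x in zip(SAVINGS, solution) if x)
    cap = sum(c for c, x in zip(CAP_COST, solution) if x)
    return delay + PENALTY * max(0, cap - CAP_BUDGET)


def tabu_search(cost, n_bits, iterations=50, tenure=2):
    """Minimize cost over bit vectors using single-bit-flip moves."""
    current = [0] * n_bits
    best, best_cost = current[:], cost(current)
    tabu = deque(maxlen=tenure)  # indices flipped in the last `tenure` moves
    for _ in range(iterations):
        candidates = []
        for i in range(n_bits):
            neighbor = current[:]
            neighbor[i] ^= 1
            c = cost(neighbor)
            # Aspiration: a tabu move is allowed if it beats the best so far.
            if i not in tabu or c < best_cost:
                candidates.append((c, i, neighbor))
        if not candidates:
            break
        # Take the best admissible move even if it worsens the cost;
        # this is what lets the search escape local minima.
        c, i, current = min(candidates, key=lambda t: (t[0], t[1]))
        tabu.append(i)
        if c < best_cost:
            best, best_cost = current[:], c
    return best, best_cost
```

On this toy instance, `tabu_search(example_cost, 4)` first settles on replacing only gate 2 (cost 15), then passes through several penalized solutions before reaching `[1, 1, 0, 1]` with cost 12, which a greedy descent would not find.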
Pub Date : 1998-02-19 DOI: 10.1109/GLSV.1998.665289
L. Benini, G. Micheli, A. Lioy, E. Macii, G. Odasso, M. Poncino
In high-performance systems, variable-latency units are often employed to improve the average throughput when the worst-case delay exceeds the cycle time. Although such units have traditionally been hand-designed, recent results have shown that variable-latency units can be automatically generated. Unfortunately, the existing synthesis procedure has limited applicability due to its computational complexity. In this work, we define and study an optimization problem, timed supersetting, whose solution is at the kernel of the procedure for automatic generation of variable-latency units. We contribute a new algorithm for solving timed supersetting in the most difficult case, that is, when the timing behaviour of the circuits is expressed through an accurate delay model. The proposed solution overcomes the complexity limitation of previous approaches, and its robustness is experimentally demonstrated by obtaining high-throughput, variable-latency implementations for all the largest circuits in the ISCAS'85 and ISCAS'89 benchmark suites.
{"title":"Timed supersetting and the synthesis of large telescopic units","authors":"L. Benini, G. Micheli, A. Lioy, E. Macii, G. Odasso, M. Poncino","doi":"10.1109/GLSV.1998.665289","DOIUrl":"https://doi.org/10.1109/GLSV.1998.665289","journal":{"name":"Proceedings of the 8th Great Lakes Symposium on VLSI (Cat. No.98TB100222)"},"publicationDate":"1998-02-19"}
Pub Date : 1998-02-19 DOI: 10.1109/GLSV.1998.665190
B. Kapoor
We provide data and insight into how the choice of cache parameters affects the memory power consumption of video algorithms. We use memory traces generated by running typical MPEG-2 motion estimation algorithms to simulate a large number of cache configurations. The cache simulation data is then combined with on-chip and off-chip memory power models to compute memory power consumption. The paper focuses on two issues. First, we provide a detailed study of how varying cache size, block size, and associativity affects memory power consumption; the configurations of particular interest are those that optimize power under certain constraints. Second, we study the role of process technology in these experiments, in particular how moving to a more advanced process technology for the on-chip cache affects the optimal points of operation with respect to memory power consumption.
{"title":"Low power memory architectures for video applications","authors":"B. Kapoor","doi":"10.1109/GLSV.1998.665190","DOIUrl":"https://doi.org/10.1109/GLSV.1998.665190","journal":{"name":"Proceedings of the 8th Great Lakes Symposium on VLSI (Cat. No.98TB100222)"},"publicationDate":"1998-02-19"}
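The trace-driven flow described in the abstract above (simulate a cache configuration on a memory trace, then feed hit and miss counts into a power model) might be sketched as follows. The LRU replacement policy, the function names, and the two-parameter energy model are assumptions made for illustration; the paper's actual simulator and power models are not reproduced here. In particular, a realistic model would make the per-access cache energy depend on cache size, block size, associativity, and process technology.

```python
from collections import OrderedDict


def simulate_cache(trace, cache_size, block_size, associativity):
    """Count hits and misses for a set-associative cache with LRU replacement.

    trace is an iterable of byte addresses; sizes are in bytes.
    """
    num_sets = cache_size // (block_size * associativity)
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in trace:
        block = addr // block_size
        ways = sets[block % num_sets]
        if block in ways:
            hits += 1
            ways.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            if len(ways) >= associativity:
                ways.popitem(last=False)   # evict least recently used block
            ways[block] = True
    return hits, misses


def memory_energy(hits, misses, e_cache_access, e_offchip_access):
    """Hypothetical model: every access reads the cache, each miss also goes off-chip."""
    return (hits + misses) * e_cache_access + misses * e_offchip_access
```

Sweeping `cache_size`, `block_size`, and `associativity` over a trace and comparing `memory_energy` values is the kind of experiment the abstract describes. For the short trace `[0, 4, 8, 0, 64, 0]` with a 64-byte cache and 16-byte blocks, a direct-mapped cache gives 3 hits and 3 misses, while a 2-way cache gives 4 hits and 2 misses.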
Pub Date : 1998-02-19 DOI: 10.1109/GLSV.1998.665221
Yuke Wang, Xiaoyu Song, E. Aboulhamid
This paper proposes three new residue-to-binary converters using 2n-bit or n-bit adders for the three-moduli residue number system of the form (2^n-1, 2^n, 2^n+1). The 2n-bit adder based converter is faster and requires about half of the hardware required by previous methods. For n-bit adder based implementations, one new converter is twice as fast as the previous method while using a similar amount of hardware, and another new converter improves both speed and area.
{"title":"Residue to binary number converters for (2^n-1, 2^n, 2^n+1)","authors":"Yuke Wang, Xiaoyu Song, E. Aboulhamid","doi":"10.1109/GLSV.1998.665221","DOIUrl":"https://doi.org/10.1109/GLSV.1998.665221","journal":{"name":"Proceedings of the 8th Great Lakes Symposium on VLSI (Cat. No.98TB100222)"},"publicationDate":"1998-02-19"}
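As a reference point for what a residue-to-binary converter computes, the sketch below reconstructs an integer from its residues modulo (2^n-1, 2^n, 2^n+1) using the textbook Chinese Remainder Theorem. It is a functional model only and does not reflect the adder-based architectures proposed in the paper; the function names are illustrative, and `pow(a, -1, m)` requires Python 3.8 or later.

```python
def rns_moduli(n):
    """Return the three pairwise-coprime moduli (2^n - 1, 2^n, 2^n + 1)."""
    return (2**n - 1, 2**n, 2**n + 1)


def to_residues(x, n):
    """Binary-to-residue conversion: reduce x modulo each of the three moduli."""
    return tuple(x % m for m in rns_moduli(n))


def to_binary(residues, n):
    """Residue-to-binary conversion via the Chinese Remainder Theorem (n >= 2)."""
    moduli = rns_moduli(n)
    dynamic_range = moduli[0] * moduli[1] * moduli[2]   # 2^n * (2^(2n) - 1)
    x = 0
    for r, m in zip(residues, moduli):
        partial = dynamic_range // m        # product of the other two moduli
        inverse = pow(partial, -1, m)       # multiplicative inverse modulo m
        x += r * partial * inverse
    return x % dynamic_range
```

For n = 3 the moduli are (7, 8, 9) and the dynamic range is 504, so 100 is represented by the residues (2, 4, 1) and `to_binary((2, 4, 1), 3)` recovers 100.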
Pub Date : 1998-02-19 DOI: 10.1109/GLSV.1998.665342
D. Hertweck, Mihaela Nica, Sangeon Park, C. Purdy
Because so many important problems arising in VLSI design are NP-hard, VLSI algorithms must employ randomization techniques or heuristics. Thus the process of analyzing a new algorithm or of comparing two algorithms is at present an experimental one. Consequently, progress in VLSI algorithm development must be based on references to standard benchmarks. Yet an examination of the literature on specific problems, such as graph partitioning, shows that such standardization is not yet a reality. Here we describe a system, Circuitbase, which we are developing to address the standardization problem. Circuitbase will combine the extensive graph manipulation routines of Knuth's Stanford GraphBase package with actual circuit examples from the Benchmark Archives at CBL, standard routines for generating random examples of circuits, and standard methods for algorithm analysis. We describe Circuitbase versions of example behavioral, structural, and physical views of a VLSI circuit and discuss how Circuitbase can support modern VLSI design environments.
{"title":"Standard data representations for VLSI algorithm development","authors":"D. Hertweck, Mihaela Nica, Sangeon Park, C. Purdy","doi":"10.1109/GLSV.1998.665342","DOIUrl":"https://doi.org/10.1109/GLSV.1998.665342","journal":{"name":"Proceedings of the 8th Great Lakes Symposium on VLSI (Cat. No.98TB100222)"},"publicationDate":"1998-02-19"}
Pub Date : 1998-02-19 DOI: 10.1109/GLSV.1998.665222
Inseop Lee, W. Jenkins
This paper presents the design details of an experimental ASIC for an all-digital adaptive equalizer. In this design, the LMS algorithm is chosen because of its simplicity. The adaptive equalizer design, which is based on an RNS architecture, consists of an RNS multiplier, an RNS adder, an RNS filter, a binary-to-residue converter, a residue-to-binary converter, and an update algorithm. The design is verified by a high level hardware simulation tool. The designs of all these units are discussed in this paper.
{"title":"The design of residue number system arithmetic units for a VLSI adaptive equalizer","authors":"Inseop Lee, W. Jenkins","doi":"10.1109/GLSV.1998.665222","DOIUrl":"https://doi.org/10.1109/GLSV.1998.665222","journal":{"name":"Proceedings of the 8th Great Lakes Symposium on VLSI (Cat. No.98TB100222)"},"publicationDate":"1998-02-19"}
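The LMS update that the abstract above selects for its simplicity can be shown in a few lines. The sketch below is a floating-point model of a trained LMS equalizer and makes no attempt to reproduce the paper's RNS arithmetic units; the two-tap channel (1, 0.4), the step size `mu`, the tap count, and the function names are assumptions chosen for the demonstration.

```python
import random


def lms_equalizer(received, training, num_taps=8, mu=0.02):
    """Adapt FIR equalizer taps with the LMS rule w <- w + mu * e * x."""
    w = [0.0] * num_taps
    x = [0.0] * num_taps              # tapped delay line, newest sample first
    errors = []
    for r, d in zip(received, training):
        x = [r] + x[:-1]              # shift the new received sample in
        y = sum(wi * xi for wi, xi in zip(w, x))   # equalizer output
        e = d - y                     # error against the known training symbol
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
        errors.append(e)
    return w, errors


def demo(num_symbols=3000, seed=0):
    """Equalize random +/-1 symbols sent through a hypothetical channel 1 + 0.4 z^-1."""
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(num_symbols)]
    delayed = [0.0] + symbols[:-1]
    received = [s + 0.4 * p for s, p in zip(symbols, delayed)]
    return lms_equalizer(received, symbols)
```

Because the assumed channel is minimum phase, the taps returned by `demo()` approach the truncated inverse 1, -0.4, 0.16, ... and the squared error shrinks as adaptation proceeds. Each update needs only multiplications and additions, which is the kind of workload RNS multipliers and adders handle.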
Pub Date : 1998-02-19 DOI: 10.1109/GLSV.1998.665192
A. Fahim, M. Khellah, M. Elmasry
A new approach to modeling the decoding hierarchy in a hierarchical word line (HWL) SRAM architecture using integer linear programming (ILP) is introduced. Using this approach, the HWL architecture is shown to be inadequate for very large SRAM sizes. As an alternative, a new low-power, high-speed SRAM architecture is described. This architecture is shown to have fairly constant speed and power dissipation for sizes ranging from 32 kb to 4 Mb. Low power is achieved by a voltage boosting technique that does not require a two-step voltage and by a new method of tristating memory cells during a write operation. The SRAM was implemented in a 0.35 μm CMOS technology and operates at 150 MHz while dissipating only 10 mW.
{"title":"A low-power high-performance embedded SRAM macrocell","authors":"A. Fahim, M. Khellah, M. Elmasry","doi":"10.1109/GLSV.1998.665192","DOIUrl":"https://doi.org/10.1109/GLSV.1998.665192","journal":{"name":"Proceedings of the 8th Great Lakes Symposium on VLSI (Cat. No.98TB100222)"},"publicationDate":"1998-02-19"}