Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528802
A. Devgan
The accuracy of a transient simulator depends critically on its device models, and device model evaluation is often a bottleneck in transient simulation performance. This paper presents comprehensive modeling techniques to compute Fast-to-evaluate and Accurate Simplified Transistor (FAST) models for aggressive MOS technologies. These FAST models accurately capture the static and dynamic behavior of the transistor, and lend themselves to efficient transient simulation. Use of FAST models in the timing simulator AGES leads to speedups of 1000× or more over traditional circuit simulators with little or no loss in circuit timing accuracy.
Title: Accurate device modeling techniques for efficient timing simulation of integrated circuits. Venue: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528807
Indradeep Ghosh, A. Raghunathan, N. Jha
Most behavioral synthesis and design for testability techniques target subsequent gate-level sequential test generation, which is frequently incapable of handling complex controller/data path circuits with large data path bit-widths. Hierarchical testing attempts to counter the complexity of test generation by exploiting information from multiple levels of the design hierarchy. We present techniques that add minimal test hardware to the given register-transfer level (RTL) design obtained through behavioral synthesis in order to ensure that all the embedded modules in the circuit are hierarchically testable. An important by-product of our DFT procedure is a system-level test set that is guaranteed to deliver pre-computed module test sets to each module in the RTL circuit. This eliminates the need to apply gate-level sequential test generation to the controller/data path. We performed extensive experiments with several complex data path/controller circuits synthesized by two different high level synthesis systems which do not target testability.
Title: Design for hierarchical testability of RTL circuits obtained by behavioral synthesis.
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528818
T. Kam, T. Villa, R. Brayton, A. Sangiovanni-Vincentelli
This paper addresses state minimization problems of different classes of non-deterministic finite state machines (NDFSMs). We present a theoretical solution to the problem of exact state minimization of general NDFSMs, based on the proposal of generalized compatibles. This gives an algorithmic frame to explore behaviors contained in a general NDFSM. Then we describe a fully implicit algorithm for state minimization of pseudo non-deterministic FSMs (PNDFSMs). The results of our implementation are reported and shown to be superior to a previous explicit formulation. We could solve exactly all but one problem of a published benchmark, while an explicit program could complete approximately one half of the examples, and in those cases with longer run times.
Title: Implicit state minimization of non-deterministic FSMs.
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528826
R. Yung, Neil C. Wilhelm
VLIW, multi-context, or windowed-register architectures may require one hundred or more processor registers. It can be difficult to design a register file with so many registers that meets processor cycle time requirements. We propose to resolve this problem by taking advantage of register values that are bypassed within a processor's pipeline, and supplementing the bypassed values with values supplied by a small register cache. If the register cache is sufficiently small then it can be designed to meet a fast target cycle time. We call this combination of bypassing and register caching the register scoreboard and cache. We develop a simple performance model and show by simulations that it can be effective for windowed-register architectures.
Title: Caching processor general registers.
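The combination of bypassing and register caching described in the abstract above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `RegisterCacheModel`, the LRU replacement policy, the write-through update of the backing register file, and the `bypass_depth` parameter are all hypothetical choices made for illustration, not details taken from the paper.

```python
from collections import OrderedDict, deque


class RegisterCacheModel:
    """Hypothetical model: operands come from the bypass network first,
    then from a small register cache, and only then from the large file."""

    def __init__(self, cache_size, bypass_depth):
        self.cache = OrderedDict()                 # register -> value, LRU order
        self.cache_size = cache_size
        self.bypass = deque(maxlen=bypass_depth)   # most recent (register, value) results
        self.backing = {}                          # large, slow register file
        self.stats = {"bypass": 0, "cache": 0, "miss": 0}

    def _fill(self, reg, value):
        # Insert or refresh an entry and evict the least recently used one.
        self.cache[reg] = value
        self.cache.move_to_end(reg)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)

    def write(self, reg, value):
        # Assumed write-through: the result goes to the backing file,
        # the bypass network, and the register cache.
        self.backing[reg] = value
        self.bypass.append((reg, value))
        self._fill(reg, value)

    def read(self, reg):
        # 1) Newest matching result still in flight on the bypass network.
        for r, v in reversed(self.bypass):
            if r == reg:
                self.stats["bypass"] += 1
                return v
        # 2) Small register cache.
        if reg in self.cache:
            self.stats["cache"] += 1
            self.cache.move_to_end(reg)
            return self.cache[reg]
        # 3) Miss: fetch from the large register file and refill the cache.
        self.stats["miss"] += 1
        value = self.backing.get(reg, 0)
        self._fill(reg, value)
        return value


model = RegisterCacheModel(cache_size=2, bypass_depth=1)
for reg, value in (("r1", 10), ("r2", 20), ("r3", 30)):
    model.write(reg, value)
values = [model.read("r3"), model.read("r2"), model.read("r1")]
print(values, model.stats)
```

In this toy run the read of `r3` is served by the bypass, `r2` by the cache, and `r1` misses because it was evicted. A performance model in this spirit would weight each outcome by its access cost.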
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528924
Menghui Zheng, A. Albicki
A low-power multiplication algorithm and its VLSI architecture using a mixed number representation are proposed. The reduced switching activity and low power dissipation are achieved through the Sign-Magnitude (SM) notation for the multiplicand and through a novel design of the Redundant Binary (RB) adder and Booth decoder. The high-speed operation is achieved through the Carry-Propagation-Free (CPF) accumulation of the Partial Products (PP) by using the RB notation. Analysis showed that the switching activity in the PP generation process can be reduced on average by 90%. Compared to the same type of multipliers, the proposed design dissipates much less power and is 18% faster on average.
Title: Low power and high speed multiplication design through mixed number representations.
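As a rough illustration of why sign-magnitude notation can reduce switching activity, the following Python sketch counts bit toggles for a short sequence of small values that alternate in sign, once encoded in two's complement and once in sign-magnitude. The sample sequence, the 8-bit width, and all function names are assumptions made for this example; it does not model the multiplier, RB adder, or Booth decoder described in the abstract above.

```python
def twos_complement(value, width):
    """Encode a signed integer as a width-bit two's-complement word."""
    return value & ((1 << width) - 1)


def sign_magnitude(value, width):
    """Encode a signed integer as a sign bit followed by its magnitude.
    Assumes abs(value) < 2 ** (width - 1)."""
    sign = 1 if value < 0 else 0
    return (sign << (width - 1)) | abs(value)


def toggles(words):
    """Count the bits that change between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


WIDTH = 8
samples = [1, -1, 2, -2, 1, -1]   # small values alternating in sign (assumed input)

tc_toggles = toggles([twos_complement(v, WIDTH) for v in samples])
sm_toggles = toggles([sign_magnitude(v, WIDTH) for v in samples])
print(tc_toggles, sm_toggles)     # prints "35 9"
```

For this input a sign change flips most bits in two's complement but essentially only the sign bit in sign-magnitude, which is the intuition behind using SM notation for the multiplicand.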
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528938
H. Vaishnav, M. Pedram
We present a cost function which can be used to minimize the routing contribution of a circuit during logic synthesis. Instead of estimating the absolute routing cost of a net, this function captures the relative routing costs of nets based on the number of terminals on the nets. Unlike the routing cost functions proposed earlier, the proposed cost function does not require layout parameters or any tuning of variables to achieve an acceptable estimate of the routing cost. The usefulness of the proposed routing cost is verified by minimizing it during logic extraction in logic synthesis, leading to an average improvement of 10% in routing area and 8% in chip area at no performance loss.
Title: Logic extraction based on normalized netlengths.
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528837
B. Stott, Dave Johnson, V. Akella
To lend additional insight into the reality of self-timed design, this paper proposes a large-scale, application-specific, asynchronous design: a CCITT-compatible asynchronous DCT/IDCT processor. The prototype DCT/IDCT processor uses two-phase transition signaling and a bounded-delay approach to implement a modified version of Sutherland's micropipeline. The layout of the core processor was designed using standard cell and custom techniques to integrate 150,000 transistors in a 2 μm SCMOS technology. This investigation presents the prototype DCT/IDCT processor design and the resulting measures of speed, power, and area.
Title: Asynchronous 2-D discrete cosine transform core processor.
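The two-phase transition signaling mentioned in the abstract above can be sketched as a tiny behavioral model: every transition on the request wire, rising or falling, announces new data, and every transition on the acknowledge wire frees the channel again. The class `TwoPhaseChannel` and its method names are hypothetical, and the model ignores delays and the bounded-delay assumptions of the actual design.

```python
class TwoPhaseChannel:
    """Hypothetical single-place channel using two-phase (transition) signaling.
    The channel is empty when req == ack and holds data when req != ack."""

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None

    def can_send(self):
        return self.req == self.ack

    def can_receive(self):
        return self.req != self.ack

    def send(self, value):
        if not self.can_send():
            raise RuntimeError("previous item not yet acknowledged")
        self.data = value      # data must be valid before the request event
        self.req ^= 1          # any transition on req is a request event

    def receive(self):
        if not self.can_receive():
            raise RuntimeError("no pending request")
        value = self.data
        self.ack ^= 1          # any transition on ack is an acknowledge event
        return value


channel = TwoPhaseChannel()
received = []
for item in ("a", "b", "c"):
    channel.send(item)
    received.append(channel.receive())
print(received, channel.req, channel.ack)
```

Unlike four-phase signaling, the wires never return to zero between transfers, so each transfer costs one transition per wire.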
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528932
Steven Parkes, P. Banerjee, J. Patel
Fault simulation for sequential circuits is among the most compute-intensive tasks in the integrated circuit design process. In the quest for rapid design turnaround, parallelization has been proposed to speed up fault simulation. We introduce ProperPROOFS, a parallel extension of the PROOFS fault simulation package. ProperPROOFS exploits parallelism based on fault partitioning, incorporating static and dynamic partitioning schemes and a new asynchronous and distributed method of fault redistribution. We present results for circuits in the ISCAS-89 benchmark set across several parallel architectures. A detailed evaluation of results provides new insight into the use of fault partitioning to parallelize high-performance serial fault simulation applications.
Title: A parallel algorithm for fault simulation based on PROOFS.
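The general idea of fault partitioning from the abstract above can be sketched as follows. This is a generic illustration only: the round-robin static split and the "idle worker takes half of the longest queue" redistribution rule are assumptions chosen for simplicity, and the function names are hypothetical. It does not reproduce the ProperPROOFS schemes.

```python
def static_partition(faults, workers):
    """Assumed static scheme: deal faults to workers round-robin."""
    return [faults[i::workers] for i in range(workers)]


def redistribute(queues):
    """Assumed dynamic scheme: each idle worker takes the back half of the
    currently longest queue. Queues are modified in place."""
    for idle in queues:
        if idle:
            continue
        donor = max(queues, key=len)
        if len(donor) < 2:
            break                      # nothing worth splitting
        half = len(donor) // 2
        idle.extend(donor[half:])
        del donor[half:]
    return queues


fault_list = list(range(10))           # stand-ins for fault identifiers
queues = static_partition(fault_list, 3)
print(queues)

# Suppose worker 1 has finished its faults while worker 0 still has four left.
pending = redistribute([[0, 3, 6, 9], [], [2, 5]])
print(pending)
```

A static split is cheap but can leave workers idle when faults differ in simulation cost, which is the usual motivation for redistributing faults at run time.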
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528784
C. Truzzi, E. Beyne, E. Ringoot, J. Peeters
This paper describes the analysis of the propagation of digital signals on a thin-film multichip module (MCM) substrate populated with CMOS integrated circuits. Timing analyses and circuit simulations were performed during the design of an MCM consisting of 4 bare 0.7-μm CMOS ASICs (100 pins, 64 mm², standard cell technology) transmitting signals at 200 Mbit/s on a 5-layer thin-film substrate (1-by-1 inch, 2 interconnection layers). This paper mainly addresses two problems related to the design of microsystems where trade-offs must be found between high-frequency and high-density requirements: 1) an accurate description of the chip-to-chip propagation of the signals, including the combined influence of active devices (drivers and receivers) and coupled, lossy interconnection lines; 2) an accurate overview of the way parameters from different domains (geometrical, electrical and technological) interact with each other and jointly affect signal propagation. It is shown how the results of such analyses can help resolve trade-offs between different requirements and support decisions during the system design phase.
Title: Signal propagation in high-speed MCM circuits.
Pub Date: 1995-10-02; DOI: 10.1109/ICCD.1995.528839
Yao-Wen Chang, D. F. Wong, Chak-Kuen Wong
Switch modules are the most important component of the routing resources in FPGAs and FPICs. The quality of switch modules greatly affects FPGA/FPIC routing solutions. The switch-module design problem was studied by K. Zhu et al. (1993). In order to analyze the routability of designed switch modules, a heuristic algorithm based on network-flow techniques was proposed. In this paper, we mathematically show that the network-flow based algorithm has provably good performance with the bounds 5 and 5/4 away from the optima for two types of switch modules, respectively. Based on the analyses, we developed a new method for designing switch modules. Experimental results show that our designed switch modules significantly improve routability, compared with those by K. Zhu et al. Extensive experiments also show that the network-flow based algorithm is highly accurate and runs very efficiently.
Title: Design and analysis of FPGA/FPIC switch modules.