Pub Date: 1995-10-02 | DOI: 10.1109/ICCD.1995.528909
H. Yamada, T. Hotta, T. Nishiyama, F. Murabayashi, T. Yamauchi, H. Sawamoto
One-bit pre-shifting before the alignment shift, normalization with an anticipated leading '1' bit, and pre-rounding techniques have been developed for a floating-point arithmetic logic unit (ALU). In addition, carry-select addition and pre-rounding techniques have been developed for a floating-point multiplier. A noise-tolerant precharge (NTP) circuit was designed and applied to the ALU and multiplier. These techniques reduced the delay time of the critical path by 24%. Each unit was fabricated in 0.3 μm, 2.5 V, four-layer-metal CMOS technology and achieved a two-cycle latency at 150 MHz.
Title: A 13.3ns double-precision floating-point ALU and multiplier
Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors
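The 13.3 ns figure in the title follows from the two-cycle latency at 150 MHz stated in the abstract. A minimal sketch of that arithmetic (the function name `latency_ns` is illustrative and does not come from the paper):

```python
def latency_ns(cycles: int, clock_mhz: float) -> float:
    """Latency in nanoseconds for a given cycle count and clock frequency."""
    cycle_time_ns = 1_000.0 / clock_mhz  # a 1 MHz clock has a 1000 ns period
    return cycles * cycle_time_ns


# Two cycles at 150 MHz, as reported in the abstract.
print(round(latency_ns(2, 150.0), 1))
```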
Pub Date: 1995-10-02 | DOI: 10.1109/ICCD.1995.528804
J. Kumar, N. Strader, J. Freeman, Michael Miller
Large-scale hardware logic emulation using software-configurable hardware provides a new means to significantly improve verification of complex integrated circuits such as today's advanced microprocessors. The essence of hardware logic emulation is the provision of a hardware prototype of the circuit being designed. Such a hardware prototype can execute both pseudo-random verification vectors and software application programs up to six orders of magnitude faster than conventional software logic simulators. Trillions of verification vectors can be run on the emulation model in only a few weeks, compared to the prior best practice of running only billions of verification vectors in many months. Application of hardware logic emulation requires a sound design methodology with an HDL model (RTL or at least gate-level) and an unlimited source of vectors or software applications intended to exercise the design in a target system.
Title: Emulation verification of the Motorola 68060
Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors
Pub Date: 1995-10-02 | DOI: 10.1109/ICCD.1995.528943
C. Ykman-Couvreur, Bill Lin
This paper presents a new, efficient state assignment framework for synthesizing asynchronous state graphs. The framework operates purely at the state graph level and is applicable to a broad class of behaviors. In this paper, we focus on applying the framework to the complete state coding problem. The method has been automated and applied to a large set of asynchronous circuits. It achieves significant improvements in terms of both circuit area and computation time.
Title: Efficient state assignment framework for asynchronous state graphs
Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors
Pub Date: 1995-10-02 | DOI: 10.1109/ICCD.1995.528914
S. Dutta, W. Wolf, A. Wolfe
This paper addresses the design of memory-system architectures for video signal processors. The memory subsystem is the bottleneck of most video computing systems and demands a careful analysis of the design tradeoffs related to area, cycle time, and utilization. We emphasize the need to consider technological and circuit-level issues during the design of a system architecture, particularly that of a video processor, and present a method whereby the conceptual organization of the memory architecture can be evaluated before a detailed design is undertaken. Our analysis suggests that the organization of an efficient memory hierarchy for video signal processors is different from the register-cache based hierarchy of general-purpose programmable microprocessors.
Title: VLSI issues in memory-system design for video signal processors
Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors
Pub Date: 1995-10-02 | DOI: 10.1109/ICCD.1995.528838
C. Wey, Haiyan Wang, Cheng-Ping Wang
This paper presents a self-timed converter circuit which converts an n-digit redundant binary number to an (n+1)-bit binary number. Self-timed refers to the fact that the conversion is problem-dependent and requires variable conversion time to complete the operation. The propagation delay of the proposed converter circuit does not increase with the number of digits to be converted; instead, it is determined by the maximum number of consecutive 0's in that number. This study shows that the statistical upper bound of the average maximum number of consecutive 0's is log₃ n, or 3.78 for 64 digits. This implies that the proposed self-timed circuit can be approximately 17 times faster than the ripple-type converter. Thus the proposed converter is well-suited to high-speed, long-word digital arithmetic processors.
Title: A self-timed redundant-binary number to binary number converter for digital arithmetic processors
Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors
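The conversion and the speed-up estimate in the abstract above can be checked with a small software model. This is only a functional sketch, not the self-timed circuit itself. It assumes digits drawn from {-1, 0, 1} listed most-significant first, and it assumes the (n+1)-bit result is two's complement, which the abstract does not state. The name `rb_to_binary` is illustrative.

```python
import math


def rb_to_binary(digits):
    """Convert redundant-binary digits (MSB first, each in {-1, 0, 1})
    to an integer value and an (n+1)-bit two's-complement bit string."""
    n = len(digits)
    value = 0
    for d in digits:
        if d not in (-1, 0, 1):
            raise ValueError("redundant-binary digits must be -1, 0 or 1")
        value = 2 * value + d
    # The value lies in [-(2**n - 1), 2**n - 1], so n+1 bits always suffice.
    bits = format(value & ((1 << (n + 1)) - 1), f"0{n + 1}b")
    return value, bits


print(rb_to_binary([1, 0, -1, 1]))   # 8 - 2 + 1 = 7
print(rb_to_binary([-1, 0, 0, 1]))   # -8 + 1 = -7

# Bound quoted in the abstract: log base 3 of n for n = 64 digits.
bound = math.log(64, 3)
# Rough ratio of a 64-digit ripple chain to that bound.
ratio = 64 / bound
print(bound, ratio)
```

Under this reading, the "approximately 17 times" figure is the ratio of the 64-digit ripple length to the log₃ 64 bound.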
Pub Date: 1995-10-02 | DOI: 10.1109/ICCD.1995.528816
Jin-Tai Yan
In this paper, we first propose a k-way connection-oriented net model, the chain net model, to generalize the cut analysis for k-way circuit partitioning and to reduce the number of edges needed to represent a multiple-pin net when a hypergraph is transformed into an edge-weighted graph. Furthermore, based on the techniques of fuzzy c-means clustering, we develop fuzzy c-means graph clustering to obtain k groups of fuzzy memberships for the vertices in the mapped graph according to the global information of all the net connections. Finally, using the area information of each cell in the circuit netlist, these k groups of fuzzy memberships lead to a cut-driven or balance-driven k-way circuit partitioning. The k-way circuit partitioning has been implemented and tested on MCNC circuit benchmarks, and the experimental results show that the proposed approach generates effective results on both the partitioning cut and the partitioning balance for these benchmarks.
Title: Connection-oriented net model and fuzzy clustering techniques for K-way circuit partitioning
Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors
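For readers unfamiliar with fuzzy c-means, the following is a minimal sketch of the standard algorithm on one-dimensional points. It is not the graph-clustering variant proposed in the paper, whose details are not given in the abstract. The function names, the deterministic initialisation and the sample data are all assumptions made for illustration.

```python
def memberships(points, centers, m=2.0):
    """Standard FCM membership update: u[i][j] is the degree to which
    point j belongs to cluster i; each column sums to 1."""
    exponent = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for j, x in enumerate(points):
        dist = [abs(x - c) for c in centers]
        hits = [i for i, d in enumerate(dist) if d == 0.0]
        if hits:  # point coincides with a center: share membership among them
            for i in hits:
                u[i][j] = 1.0 / len(hits)
            continue
        for i, d_i in enumerate(dist):
            u[i][j] = 1.0 / sum((d_i / d_l) ** exponent for d_l in dist)
    return u


def fuzzy_c_means(points, k, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    lo, hi = min(points), max(points)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]  # deterministic start
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = [
            sum(u[i][j] ** m * x for j, x in enumerate(points)) / sum(w ** m for w in u[i])
            for i in range(k)
        ]
    return centers, memberships(points, centers, m)


points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers, u = fuzzy_c_means(points, k=2)
# Harden the fuzzy memberships by assigning each point to its strongest cluster.
hard = [max(range(2), key=lambda i: u[i][j]) for j in range(len(points))]
print(centers, hard)
```

In a partitioner, the hardening step would also have to respect cell areas to keep the parts balanced; the simple arg-max used here ignores that constraint.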
Pub Date: 1995-10-02 | DOI: 10.1109/ICCD.1995.528803
J. Adams, J. Miller, D. E. Thomas
This paper presents a technique for back-annotating the results of high-level synthesis into the source description to produce a timing-accurate behavioral simulation model. The resulting simulation model exhibits the same cycle-by-cycle behavior as a register-transfer level model, but can be simulated in a fraction of the time. This idea has analogies both to software profiling and to back-annotation at lower levels of hardware design. Experimental results demonstrate that the annotated behavioral simulation models run two to three orders of magnitude faster than register-transfer level simulation models, and only about an order of magnitude slower than behavioral models with no timing information.
Title: Execution-time profiling for multiple-process behavioral synthesis
Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors
Pub Date: 1995-10-02 | DOI: 10.1109/ICCD.1995.528819
G. Cabodi, S. Quer, P. Camurati
Computing equivalence classes for finite state machines (FSMs) has several applications to synthesis and verification problems, like state minimization, automata reduction, and logic optimization with don't cares. Symbolic traversal techniques are applicable to medium-small circuits. This paper extends their use to large FSMs by means of cofactor-based enhancements to the state-of-the-art approaches and of underestimations of equivalence classes. The key to success is pruning the search space by constraining it. Experimental results on some of the larger ISCAS'89 and MCNC circuits show its applicability.
Title: Extending equivalence class computation to large FSMs
Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors
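As background for the equivalence-class problem described above, here is a small explicit-state sketch using classical Moore-style partition refinement. The paper works with symbolic traversal and cofactor-based enhancements, which this sketch does not attempt to reproduce. The machine, the state names and the function name are hypothetical.

```python
def equivalence_classes(states, inputs, delta, output):
    """Partition refinement on a Moore machine.
    delta[(state, input)] gives the next state; output[state] the observable output."""
    block = {s: output[s] for s in states}  # initial partition: group by output
    while True:
        # A state's signature is its own block plus the blocks of its successors.
        sig = {s: (block[s], tuple(block[delta[(s, a)]] for a in inputs)) for s in states}
        ids = {}
        new_block = {s: ids.setdefault(sig[s], len(ids)) for s in states}
        # Refinement only ever splits blocks, so an unchanged count means stability.
        if len(set(new_block.values())) == len(set(block.values())):
            break
        block = new_block
    classes = {}
    for s in states:
        classes.setdefault(new_block[s], []).append(s)
    return sorted(sorted(c) for c in classes.values())


states = ["A", "B", "C", "D"]
inputs = [0, 1]
delta = {
    ("A", 0): "B", ("A", 1): "C",
    ("B", 0): "A", ("B", 1): "C",
    ("C", 0): "A", ("C", 1): "C",
    ("D", 0): "D", ("D", 1): "D",
}
output = {"A": 0, "B": 0, "C": 1, "D": 0}
print(equivalence_classes(states, inputs, delta, output))
```

Here A and B end up in the same class, while D is split off because input 1 never leads it to the state with output 1.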
Pub Date: 1995-10-02 | DOI: 10.1109/ICCD.1995.528928
G. Swamy, R. Brayton, V. Singhal
Computing the set of reachable states of a finite state machine is an important component of many problems in the synthesis and formal verification of digital systems. The design process is usually iterative, and the designer may modify the system and recompute information many times. Because current methods for reachability analysis are not incremental, reachability is recomputed each time the designer modifies the system. Unfortunately, the representation of the reachable states that is currently used in synthesis and verification is inherently non-updatable (O. Coudert and J.C. Madre, 1990). We solve this problem by presenting alternative ways to represent the reachable set, and incremental algorithms that can update the new representation each time the designer changes the system. The incremental algorithms use the reachable set computed at a previous iteration, together with information about the changes to the system, to update it rather than compute the reachable set from the beginning. This yields computational savings, as the results demonstrate.
Title: Incremental methods for FSM traversal
Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors
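The idea of reusing a previously computed reachable set can be illustrated on an explicit transition graph. This is a sketch under strong assumptions: it uses plain Python sets rather than the representations proposed in the paper, and it handles only added transitions, not removed ones. All names are hypothetical.

```python
from collections import deque


def extend_reachable(transitions, reached, frontier):
    """Return `reached` extended by everything reachable from `frontier`."""
    reached = set(reached)
    queue = deque(s for s in frontier if s not in reached)
    reached.update(queue)
    while queue:
        s = queue.popleft()
        for t in transitions.get(s, ()):
            if t not in reached:
                reached.add(t)
                queue.append(t)
    return reached


def reachable(transitions, initial):
    """Full reachability computation from the initial states."""
    return extend_reachable(transitions, set(), initial)


def add_transitions(transitions, reached, new_edges):
    """Incremental update: only targets of new edges that leave an
    already-reachable state can enlarge the reachable set."""
    frontier = []
    for s, t in new_edges:
        transitions.setdefault(s, set()).add(t)
        if s in reached and t not in reached:
            frontier.append(t)
    return extend_reachable(transitions, reached, frontier)


transitions = {"s0": {"s1"}, "s1": {"s0"}, "s2": {"s3"}}
before = reachable(transitions, ["s0"])
after = add_transitions(transitions, before, [("s1", "s2")])
print(sorted(before), sorted(after))
```

The incremental call explores only `s2` and `s3`, yet produces the same set as recomputing from `s0` on the modified graph.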
Pub Date: 1995-10-02 | DOI: 10.1109/ICCD.1995.528931
M. Amin, B. Vinnakota
Fault simulation is a compute-intensive problem. Data-parallel simulation on multiple processors is one method to reduce fault simulation time. We discuss a novel technique to partition the fault set for data-parallel fault simulation. When applied statically, the technique scales well for up to eight processors. The fault set partitioning technique is simple, can itself be parallelized, and is easy to implement. Therefore, the technique can be used on a low-cost parallel resource, such as a network of workstations.
Title: Data parallel fault simulation
Published in: Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors
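The general shape of data-parallel fault simulation with a static fault-set partition can be sketched as follows. The abstract does not describe the paper's partitioning heuristic, so a plain round-robin split stands in for it here. The toy circuit `y = (a AND b) OR c`, the single stuck-at fault model and all function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

SIGNALS = ("a", "b", "c", "y")


def circuit(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c with an optional stuck-at fault (signal, value)."""
    def pin(name, value):
        return fault[1] if fault and fault[0] == name else value
    a, b, c = pin("a", a), pin("b", b), pin("c", c)
    return pin("y", (a & b) | c)


def partition(faults, parts):
    """Static round-robin split of the fault list (a stand-in heuristic)."""
    return [faults[i::parts] for i in range(parts)]


def simulate(fault_subset, vectors):
    """Return the faults in the subset that some vector distinguishes from the good circuit."""
    return {f for f in fault_subset
            if any(circuit(*v, fault=f) != circuit(*v) for v in vectors)}


def parallel_fault_sim(faults, vectors, workers):
    """Each worker simulates its own fault subset against the full vector set."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda part: simulate(part, vectors), partition(faults, workers))
        return set().union(*results)


faults = [(s, v) for s in SIGNALS for v in (0, 1)]
vectors = list(product((0, 1), repeat=3))
print(sorted(parallel_fault_sim(faults, vectors, workers=4)))
```

Threads are used only to show the structure. On a network of workstations, each subset would go to a separate process or machine, and only the detected-fault sets would need to be merged.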