Deep sub-micron processing technologies have enabled the implementation of new application-specific embedded architectures that integrate multiple software programmable processors (e.g. DSPs, microcontrollers) and dedicated hardware components together onto a single cost-efficient IC. These application-specific architectures are emerging as a key design solution to today's microelectronics design problems, which are being driven by emerging applications in the areas of wireless communication, broadband networking, and multimedia computing. However the construction of these customized heterogeneous multiprocessor architectures, while ensuring that the hardware and software parts communicate correctly, is a tremendously difficult and highly error proned task with little or no tool support. In this paper, we present a solution to this embedded architecture co-synthesis problem based on an orchestrated combination of architectural strategies, parameterized libraries, and software tool support.
{"title":"Constructing application-specific heterogeneous embedded architectures from custom HW/SW applications","authors":"S. Vercauteren, Bill Lin, H. Man","doi":"10.1145/240518.240617","DOIUrl":"https://doi.org/10.1145/240518.240617","url":null,"abstract":"Deep sub-micron processing technologies have enabled the implementation of new application-specific embedded architectures that integrate multiple software programmable processors (e.g. DSPs, microcontrollers) and dedicated hardware components together onto a single cost-efficient IC. These application-specific architectures are emerging as a key design solution to today's microelectronics design problems, which are being driven by emerging applications in the areas of wireless communication, broadband networking, and multimedia computing. However the construction of these customized heterogeneous multiprocessor architectures, while ensuring that the hardware and software parts communicate correctly, is a tremendously difficult and highly error proned task with little or no tool support. In this paper, we present a solution to this embedded architecture co-synthesis problem based on an orchestrated combination of architectural strategies, parameterized libraries, and software tool support.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"214 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133818480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Over the past several years there has been a great deal of interest in the design of mixed hardware/software systems, sometimes referred to as hardware/software co-design or hardware/software co-synthesis. However, although many new design methodologies have taken the name hardware/software co-design, they often do not seem to share much in common with one another This partly due to the fact that the problem itself has so many dimensions. This tutorial describes a set of criteria that can be used to compare differing approaches to hardware/software co-design. These criteria are used in the discussion of a number of published hardware/software co-design techniques to illustrate how a wide range of approaches can be viewed within a single framework.
{"title":"The design of mixed hardware/software systems","authors":"J. Adams, D. E. Thomas","doi":"10.1145/240518.240616","DOIUrl":"https://doi.org/10.1145/240518.240616","url":null,"abstract":"Over the past several years there has been a great deal of interest in the design of mixed hardware/software systems, sometimes referred to as hardware/software co-design or hardware/software co-synthesis. However, although many new design methodologies have taken the name hardware/software co-design, they often do not seem to share much in common with one another This partly due to the fact that the problem itself has so many dimensions. This tutorial describes a set of criteria that can be used to compare differing approaches to hardware/software co-design. These criteria are used in the discussion of a number of published hardware/software co-design techniques to illustrate how a wide range of approaches can be viewed within a single framework.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116662045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To reduce the number of synthesis and layout iterations, we present a new delay optimization technique, which inserts buffers based on back-annotated detailed routing information. During optimization, inserted buffers are assumed to be placed on the appropriate location of original wires so as to calculate accurate wire RC delay. With forward annotated location information of inserted buffers, the layout system attempts to preserve patterns of original wires using the ECO technique. Our experimental results show that this technique combined with the conventional gate sizing technique achieves up to 41.2% delay reduction after the initial layout.
{"title":"Post-layout optimization for deep submicron design","authors":"Koichi Sato, M. Kawarabayashi, H. Emura, N. Maeda","doi":"10.1109/DAC.1996.545671","DOIUrl":"https://doi.org/10.1109/DAC.1996.545671","url":null,"abstract":"To reduce the number of synthesis and layout iterations, we present a new delay optimization technique, which inserts buffers based on back-annotated detailed routing information. During optimization, inserted buffers are assumed to be placed on the appropriate location of original wires so as to calculate accurate wire RC delay. With forward annotated location information of inserted buffers, the layout system attempts to preserve patterns of original wires using the ECO technique. Our experimental results show that this technique combined with the conventional gate sizing technique achieves up to 41.2% delay reduction after the initial layout.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114789048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents an efficient method for the timing verification of concurrent systems, modeled as labeled Timed Petri nets. The verification problems we consider require us to analyze the system's reachable behaviors under the specified time delays. Our geometric timing analysis algorithm improves over existing ones by enumerating the state space only partially. The algorithm relies on a concept called pre-mature firing and a new, extended notion of clocks with a negative age. We have tested the fully automated procedure on a number of examples. Experimental results obtained on highly concurrent Petri nets with more than 6000 nodes and 10/sup 210/ reachable states show that the proposed method can drastically reduce computational cost.
{"title":"Efficient partial enumeration for timing analysis of asynchronous systems","authors":"E. Verlind, G. D. Jong, Bill Lin","doi":"10.1109/DAC.1996.545545","DOIUrl":"https://doi.org/10.1109/DAC.1996.545545","url":null,"abstract":"This paper presents an efficient method for the timing verification of concurrent systems, modeled as labeled Timed Petri nets. The verification problems we consider require us to analyze the system's reachable behaviors under the specified time delays. Our geometric timing analysis algorithm improves over existing ones by enumerating the state space only partially. The algorithm relies on a concept called pre-mature firing and a new, extended notion of clocks with a negative age. We have tested the fully automated procedure on a number of examples. Experimental results obtained on highly concurrent Petri nets with more than 6000 nodes and 10/sup 210/ reachable states show that the proposed method can drastically reduce computational cost.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125901356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents our experience on domain-specific high-level modeling and synthesis for Fujitsu ATM switch design. We propose a high-level design methodology using VHDL, where ATM switch architectural features are considered during behavior modeling, and a high-level synthesis compiler, MEBS, is prototyped to synthesize the behavior model down to a gate-level implementation. Since the specific ATM switch architecture is incorporated into both modeling and syntheses phases, a high-quality design is efficiently derived. The synthesis results show that given the design constraints, the proposed high-level design methodology can produce a gate-level implementation by MEBS with about 15% area reduction in shorter design cycle when compared with manual design.
{"title":"Domain-specific high-level modeling and synthesis for ATM switch design using VHDL","authors":"M. Lee, Y. Hsu, Ben Chen, M. Fujita","doi":"10.1109/DAC.1996.545643","DOIUrl":"https://doi.org/10.1109/DAC.1996.545643","url":null,"abstract":"This paper presents our experience on domain-specific high-level modeling and synthesis for Fujitsu ATM switch design. We propose a high-level design methodology using VHDL, where ATM switch architectural features are considered during behavior modeling, and a high-level synthesis compiler, MEBS, is prototyped to synthesize the behavior model down to a gate-level implementation. Since the specific ATM switch architecture is incorporated into both modeling and syntheses phases, a high-quality design is efficiently derived. The synthesis results show that given the design constraints, the proposed high-level design methodology can produce a gate-level implementation by MEBS with about 15% area reduction in shorter design cycle when compared with manual design.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126385682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Sanghavi, R. Ranjan, R. Brayton, A. Sangiovanni-Vincentelli
The success of binary decision diagram (BDD) based algorithms for verification depend on the availability of a high performance package to manipulate very large BDDs. State-of-the-art BDD packages, based on the conventional depth-first technique, limit the size of the BDDs due to a disorderly memory access patterns that results in unacceptably high elapsed time when the BDD size exceeds the main memory capacity. We present a high performance BDD package that enables manipulation of very large BDDs by using an iterative breadth-first technique directed towards localizing the memory accesses to exploit the memory system hierarchy. The new memory-oriented performance features of this package are: 1) an architecture independent customized memory management scheme, 2) the ability to issue multiple independent BDD operations (superscalarity), and 3) the ability to perform multiple BDD operations even when the operands of some BDD operations are the result of some other operations yet to be completed (pipelining). A comprehensive set of BDD manipulation algorithms are implemented using the above techniques. Unlike the breadth-first algorithms presented in the literature, the new package is faster than the state-of-the-art BDD package by a factor of up to 15, even for the BDD sizes that fit within the main memory. For BDD sizes that do not fit within the main memory, a performance improvement of up to a factor of 100 can be achieved.
{"title":"High performance BDD package by exploiting memory hierarchy","authors":"J. Sanghavi, R. Ranjan, R. Brayton, A. Sangiovanni-Vincentelli","doi":"10.1109/DAC.1996.545652","DOIUrl":"https://doi.org/10.1109/DAC.1996.545652","url":null,"abstract":"The success of binary decision diagram (BDD) based algorithms for verification depend on the availability of a high performance package to manipulate very large BDDs. State-of-the-art BDD packages, based on the conventional depth-first technique, limit the size of the BDDs due to a disorderly memory access patterns that results in unacceptably high elapsed time when the BDD size exceeds the main memory capacity. We present a high performance BDD package that enables manipulation of very large BDDs by using an iterative breadth-first technique directed towards localizing the memory accesses to exploit the memory system hierarchy. The new memory-oriented performance features of this package are: 1) an architecture independent customized memory management scheme, 2) the ability to issue multiple independent BDD operations (superscalarity), and 3) the ability to perform multiple BDD operations even when the operands of some BDD operations are the result of some other operations yet to be completed (pipelining). A comprehensive set of BDD manipulation algorithms are implemented using the above techniques. Unlike the breadth-first algorithms presented in the literature, the new package is faster than the state-of-the-art BDD package by a factor of up to 15, even for the BDD sizes that fit within the main memory. For BDD sizes that do not fit within the main memory, a performance improvement of up to a factor of 100 can be achieved.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128205969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we examine the problem of reducing the power consumption of a technology mapped circuit under timing constraints. Consider a cell library that contains multiple implementations (cells) of the same Boolean function. We first present an exact algorithm for the problem when a complete library is given in a complete library, "all" implementations of each cell are present. We then propose an efficient algorithm for the problem if the provided library is not complete.
{"title":"An exact algorithm for low power library-specific gate re-sizing","authors":"De-Sheng Chen, M. Sarrafzadeh","doi":"10.1109/DAC.1996.545678","DOIUrl":"https://doi.org/10.1109/DAC.1996.545678","url":null,"abstract":"In this paper we examine the problem of reducing the power consumption of a technology mapped circuit under timing constraints. Consider a cell library that contains multiple implementations (cells) of the same Boolean function. We first present an exact algorithm for the problem when a complete library is given in a complete library, \"all\" implementations of each cell are present. We then propose an efficient algorithm for the problem if the provided library is not complete.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130408965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We reduce the state explosion problem in automatic verification of finite-state systems by automatically collapsing subgraphs of the state graph into abstract states. The key idea of the method is to identify state generation rules that can be inverted. It can be used for verification of deadlock-freedom, error and invariant checking and stuttering-invariant CTL model checking.
{"title":"State reduction using reversible rules","authors":"C. N. Ip, D. Dill","doi":"10.1145/240518.240625","DOIUrl":"https://doi.org/10.1145/240518.240625","url":null,"abstract":"We reduce the state explosion problem in automatic verification of finite-state systems by automatically collapsing subgraphs of the state graph into abstract states. The key idea of the method is to identify state generation rules that can be inverted. It can be used for verification of deadlock-freedom, error and invariant checking and stuttering-invariant CTL model checking.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130466412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new technique for obtaining a DC operating point of large, hard-to-solve MOS circuits is reported in this paper. Based on homotopy, the technique relies on the provable global convergence of arclength continuation and uses a novel method for embedding the continuation parameter into MOS devices. The new embedding circumvents inefficiencies and numerical failures that limit the practical applicability of previous simpler embeddings. Use of the technique in a production environment has led to the routine solution of large, previously hard-to-solve circuits.
{"title":"Homotopy techniques for obtaining a DC solution of large-scale MOS circuits","authors":"J. Roychowdhury, R. Melville","doi":"10.1109/DAC.1996.545588","DOIUrl":"https://doi.org/10.1109/DAC.1996.545588","url":null,"abstract":"A new technique for obtaining a DC operating point of large, hard-to-solve MOS circuits is reported in this paper. Based on homotopy, the technique relies on the provable global convergence of arclength continuation and uses a novel method for embedding the continuation parameter into MOS devices. The new embedding circumvents inefficiencies and numerical failures that limit the practical applicability of previous simpler embeddings. Use of the technique in a production environment has led to the routine solution of large, previously hard-to-solve circuits.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114991616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present design-for-low-power techniques based on glitch reduction for register-transfer level circuits. We analyze the generation and propagation of glitches in both the control and data path parts of the circuit. Based on the analysis, we develop techniques that attempt to reduce glitching power consumption by minimizing generation and propagation of glitches in the RTL circuit. Our techniques include restructuring multiplexer networks (to enhance data correlations, eliminate glitchy control signals, and reduce glitches on data signals), clocking control signals, and inserting selective rising/falling delays. Our techniques are suited to control-flow intensive designs,where glitches generated at control signals have a significant impact on the circuit's power consumption, and multiplexers and registers often account for a major portion of the total power. Application of the proposed techniques to several examples shows significant power savings, with negligible area and delay overheads.
{"title":"Glitch analysis and reduction in register transfer level power optimization","authors":"A. Raghunathan, S. Dey, N. Jha","doi":"10.1109/DAC.1996.545596","DOIUrl":"https://doi.org/10.1109/DAC.1996.545596","url":null,"abstract":"We present design-for-low-power techniques based on glitch reduction for register-transfer level circuits. We analyze the generation and propagation of glitches in both the control and data path parts of the circuit. Based on the analysis, we develop techniques that attempt to reduce glitching power consumption by minimizing generation and propagation of glitches in the RTL circuit. Our techniques include restructuring multiplexer networks (to enhance data correlations, eliminate glitchy control signals, and reduce glitches on data signals), clocking control signals, and inserting selective rising/falling delays. Our techniques are suited to control-flow intensive designs,where glitches generated at control signals have a significant impact on the circuit's power consumption, and multiplexers and registers often account for a major portion of the total power. Application of the proposed techniques to several examples shows significant power savings, with negligible area and delay overheads.","PeriodicalId":152966,"journal":{"name":"33rd Design Automation Conference Proceedings, 1996","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125751955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}