A configurable logic architecture for dynamic hardware/software partitioning
Pub Date: 2004-02-16, DOI: 10.1109/DATE.2004.1268892
Roman L. Lysecky, F. Vahid
In previous work, we showed the benefits and feasibility of having a processor dynamically partition its executing software such that critical software kernels are transparently partitioned to execute as a hardware coprocessor on configurable logic - an approach we call warp processing. The configurable logic place and route step is the most computationally intensive part of such hardware/software partitioning, normally running for many minutes or hours on powerful desktop processors. In contrast, dynamic partitioning requires place and route to execute in just seconds and on a lean embedded processor. We have therefore designed a configurable logic architecture specifically for dynamic hardware/software partitioning. Through experiments with popular benchmarks, we show that by specifically focusing on the goal of software kernel speedup when designing the FPGA architecture, rather than on the more general goal of ASIC prototyping, we can perform place and route for our architecture 50 times faster, using 10,000 times less data memory, and 1,000 times less code memory, than popular commercial tools mapping to commercial configurable logic. Yet, we show that we obtain speedups (2x on average, and as much as 4x) and energy savings (33% on average, and up to 74%) when partitioning even just one loop, which are comparable to commercial tools and fabrics. Thus, our configurable logic architecture represents a good candidate for platforms that will support dynamic hardware/software partitioning, and enables ultra-fast desktop tools for hardware/software partitioning, and even for fast configurable logic design in general.
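To make the "lean place and route" point concrete, here is a minimal sketch of a lightweight greedy placer over a small grid fabric that minimizes Manhattan wirelength. The fabric size, netlist and cost model are hypothetical, and this illustrates only the general idea of a lean placer, not the authors' on-chip tool.

// Illustrative sketch only: a lightweight greedy placer of the general kind a
// lean, on-chip place-and-route flow might use on a simplified grid fabric.
// The netlist, fabric dimensions, and cost model are hypothetical.
#include <cstdio>
#include <cstdlib>
#include <vector>

struct Net { int a, b; };                 // two-pin net between cells a and b

int wirelength(const std::vector<int>& x, const std::vector<int>& y,
               const std::vector<Net>& nets) {
    int cost = 0;
    for (const Net& n : nets)             // Manhattan wirelength of each net
        cost += std::abs(x[n.a] - x[n.b]) + std::abs(y[n.a] - y[n.b]);
    return cost;
}

bool occupied(const std::vector<int>& x, const std::vector<int>& y,
              int skip, int col, int row) {
    for (size_t c = 0; c < x.size(); ++c)
        if ((int)c != skip && x[c] == col && y[c] == row) return true;
    return false;
}

int main() {
    const int rows = 4, cols = 4;
    std::vector<int> x = {0, 1, 2, 3};    // initial column of each cell
    std::vector<int> y = {0, 0, 0, 0};    // initial row of each cell
    std::vector<Net> nets = {{0, 1}, {1, 2}, {2, 3}, {3, 0}};

    int best = wirelength(x, y, nets);
    for (int pass = 0; pass < 3; ++pass)              // a few greedy passes
        for (size_t c = 0; c < x.size(); ++c)
            for (int row = 0; row < rows; ++row)
                for (int col = 0; col < cols; ++col) {
                    if (occupied(x, y, (int)c, col, row)) continue;
                    int ox = x[c], oy = y[c];
                    x[c] = col; y[c] = row;           // try the move
                    int cost = wirelength(x, y, nets);
                    if (cost < best) best = cost;     // keep improving moves
                    else { x[c] = ox; y[c] = oy; }    // otherwise undo
                }
    std::printf("final wirelength: %d\n", best);
    return 0;
}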
{"title":"A configurable logic architecture for dynamic hardware/software partitioning","authors":"Roman L. Lysecky, F. Vahid","doi":"10.1109/DATE.2004.1268892","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268892","url":null,"abstract":"In previous work, we showed the benefits and feasibility of having a processor dynamically partition its executing software such that critical software kernels are transparently partitioned to execute as a hardware coprocessor on configurable logic - an approach we call warp processing. The configurable logic place and route step is the most computationally intensive part of such hardware/software partitioning, normally running for many minutes or hours on powerful desktop processors. In contrast, dynamic partitioning requires place and route to execute in just seconds and on a lean embedded processor. We have therefore designed a configurable logic architecture specifically for dynamic hardware/software partitioning. Through experiments with popular benchmarks, we show that by specifically focusing on the goal of software kernel speedup when designing the FPGA architecture, rather than on the more general goal of ASIC prototyping, we can perform place and route for our architecture 50 times faster, using 10,000 times less data memory, and 1,000 times less code memory, than popular commercial tools mapping to commercial configurable logic. Yet, we show that we obtain speedups (2x on average, and as much as 4x) and energy savings (33% on average, and up to 74%) when partitioning even just one loop, which are comparable to commercial tools and fabrics. Thus, our configurable logic architecture represents a good candidate for platforms that will support dynamic hardware/software partitioning, and enables ultra-fast desktop tools for hardware/software partitioning, and even for fast configurable logic design in general.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134479535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A scalable ODC-based algorithm for RTL insertion of gated clocks
Pub Date: 2004-02-16, DOI: 10.1109/DATE.2004.1268895
P. Babighian, L. Benini, E. Macii
This paper describes a new automatic clock-gating extraction technique that works at the RT level. The key features of our approach are: (i) seamless integration with existing industrial design flows and commercial tools; (ii) high scalability to deal with large circuits; (iii) improved quality of results with respect to available commercial tools; (iv) smaller and well-controlled overhead in speed and area. Experimental results on a set of industrial RTL designs demonstrate the viability and practical impact of our approach.
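As a concrete picture of what clock gating buys at the register level, the sketch below counts clock events for an ungated register versus one gated by the always-safe "value unchanged" condition; ODC-based extraction, as in the paper, strengthens such enables further with downstream observability conditions, which are not modeled here. The traffic pattern is hypothetical.

// Illustrative sketch only: counts how many clock events a register sees with
// and without a simple "value unchanged" gating condition. The ODC-based
// analysis of the paper derives stronger enables; it is not modeled here.
#include <cstdio>

int main() {
    int q = 0;                          // register contents
    long ungatedClocks = 0, gatedClocks = 0;

    for (int cycle = 0; cycle < 100; ++cycle) {
        int d = cycle / 10;             // next-state value: changes every 10 cycles

        ++ungatedClocks;                // ungated design: clocked every cycle

        // Gated design: clock only when the stored value would actually change.
        bool enable = (d != q);
        if (enable) { q = d; ++gatedClocks; }
    }
    std::printf("clock events: ungated = %ld, gated = %ld\n",
                ungatedClocks, gatedClocks);
    return 0;
}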
{"title":"A scalable ODC-based algorithm for RTL insertion of gated clocks","authors":"P. Babighian, L. Benini, E. Macii","doi":"10.1109/DATE.2004.1268895","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268895","url":null,"abstract":"This paper describes a new automatic clock-gating extraction working at the RT-level. The key features of our approach are: (i) seamless merging with existing industrial design flows and commercial tools; (ii) high scalability to deal with large circuits; (iii) improved quality of results with respect to available commercial tools; (iv) smaller and well-controlled overhead in speed and area. Experimental results, on a set of industrial RTL designs, demonstrate the viability and practical impact of our approach.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131909033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SCORE: SPICE compatible reluctance extraction
Pub Date: 2004-02-16, DOI: 10.1109/DATE.2004.1269014
Rong Jiang, C. C. Chen
Presently, the reluctance matrix K cannot be applied directly because doing so would require modifying mainstream analysis tools. In this paper, we propose a reluctance realization algorithm (RRA) that directly converts reluctances into circuit elements compatible with general simulation engines such as SPICE. Reluctance realization is applicable to arbitrary circuit topologies, and no accuracy penalty is incurred in the realization process. Since the stability of the converted circuit largely depends on the stability of the reluctance matrix, we present an efficient improved recursive bisection cutting algorithm (IRBCA) to obtain stability-guaranteed reluctance matrices, and we integrate IRBCA and RRA into a SPICE-compatible reluctance extraction tool, SCORE.
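For reference, the standard reluctance relations assumed in this summary are (notation generic, not necessarily the paper's):

\[
K = L^{-1}, \qquad V = L\,\frac{dI}{dt} \;\Longleftrightarrow\; \frac{dI}{dt} = K\,V,
\]

so a realized circuit reproduces the K-based branch equations with ordinary SPICE elements. Stability of such a circuit is tied to K remaining symmetric positive definite after sparsification (for instance, symmetric and diagonally dominant with a positive diagonal), which is the kind of property a stability-guaranteed cutting step must preserve.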
{"title":"SCORE: SPICE compatible reluctance extraction","authors":"Rong Jiang, C. C. Chen","doi":"10.1109/DATE.2004.1269014","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269014","url":null,"abstract":"Presently, a necessary modification to mainstream analysis tools prevents the direct application of reluctance k. In this paper, we propose a reluctance realization algorithm (RRA) by directly converting reluctances to circuit elements compatible with general simulation engines, such as SPICE. Reluctance realization is applicable to arbitrary circuit topology and no accuracy penalty is involved in the realization process. Since the stability of the converted circuit largely depends on the stability of the reluctance matrix, we present an efficient improved recursive bisection cutting algorithm (IRBCA) to obtain stability-guaranteed reluctance matrices, and integrate IRBCA and RRA into a SPICE compatible reluctance extraction tool, SCORE.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133774247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Value-conscious cache: simple technique for reducing cache access power
Pub Date: 2004-02-16, DOI: 10.1109/DATE.2004.1268821
Yen-Jen Chang, Chia-Lin Yang, F. Lai
Most microprocessors employ on-chip caches to bridge the performance gap between the processor and main memory. However, cache accesses usually contribute significantly to the total power consumption of the chip. Based on the observation that an overwhelming majority of the cache access bits are '0', in this paper we propose a value-conscious (VC) cache to reduce the average cache power consumption during an access. Unlike the conventional cache with a differential-bitline implementation, the VC cache is a single-bitline design. Depending on the accessed bit value, the VC cache can dynamically prevent the bitline from being discharged, so that the power dissipated in accessing a '0' is much less than the power dissipated in accessing a '1'. The VC cache is a circuit-level technique, which is software independent and orthogonal to architecture-level low-power techniques. Experimental results based on SPEC2000 and MediaBench traces show that, without compromising either performance or stability, by exploiting the prevalence of '0' bits in the accessed data the VC cache reduces the average cache read and write power by about 18% to 22% and 36% to 40%, respectively.
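As a back-of-the-envelope view of why zero-dominant data helps, the average per-bit access energy of a scheme like this can be written as a mix of a cheap '0' case and an expensive '1' case (symbols generic, not taken from the paper):

\[
E_{\mathrm{avg}} = p_0\,E_0 + (1 - p_0)\,E_1, \qquad \mathrm{savings} = 1 - \frac{E_{\mathrm{avg}}}{E_{\mathrm{conv}}},
\]

where $p_0$ is the fraction of '0' bits in the accessed data, $E_0 \ll E_1$ because the bitline is not discharged for a '0', and $E_{\mathrm{conv}}$ is the per-bit energy of a conventional differential-bitline access.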
{"title":"Value-conscious cache: simple technique for reducing cache access power","authors":"Yen-Jen Chang, Chia-Lin Yang, F. Lai","doi":"10.1109/DATE.2004.1268821","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268821","url":null,"abstract":"Most microprocessors employ the on-chip caches to bridge the performance gap between the processor and main memory. However, the cache accesses usually contribute significantly to the total power consumption of the chip. Based on the observation that an overwhelming majority of the cache access bits are '0', in this paper we propose a value-conscious (VC) cache to reduce the average cache power consumption during an access. Unlike the conventional cache with differential-bitline implementation, the VC cache is a single-bitline design. Depending on the access bit value, the VC cache can dynamically prevent the bitline from being discharged such that the power dissipated in accessing '0' is much less than the power dissipated in accessing '1'. The implementation of the VC cache is a circuit-level technique, which is software independent and orthogonal to other low power techniques at architecture-level. The experimental results based on the SPEC2000 and MediaBench traces show that without compromise of both performance and stability, by exploiting the prevalence of '0' bits in access data the VC cache can reduce the average cache read and write power by about 18%/spl sim/22% and 36%/spl sim/40%, respectively.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124166686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synthesis of embedded systemC design: a case study of digital neural networks
Pub Date: 2004-02-16, DOI: 10.1109/DATE.2004.1269239
D. Lettnin, A. Braun, M. Bogdan, J. Gerlach, W. Rosenstiel
This work presents a complete system-on-silicon design flow using the SystemC system specification language. In this study, SystemC is used to design a multilayer perceptron neural network, which is applied to an electrocardiogram pattern-recognition system. The objective of this work is to exemplify the synthesis of systems that integrate RTL and behavioral descriptions. To achieve this, a preprocessing methodology was used to optimize the three main constraints of hardware neural network (HNN) design: accuracy, space and processing speed. This allows a complex HNN to be implemented on a single field-programmable gate array (FPGA). High-level SystemC synthesis allows the straightforward translation of the system level into the hardware level, avoiding an error-prone and time-consuming translation into another hardware description language.
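For illustration, a minimal SystemC neuron of the kind an MLP hardware description might be assembled from is sketched below; the port widths, integer weights and hard-limit activation are hypothetical and do not reproduce the paper's ECG design.

// Illustrative sketch only: one perceptron neuron as a combinational SystemC
// module with hypothetical fixed-point weights and a hard-limit activation.
#include <systemc.h>
#include <iostream>

SC_MODULE(Neuron) {
    sc_in<sc_int<16> >  in0, in1, in2;   // three fixed-point inputs
    sc_out<sc_int<16> > out;             // neuron output

    void compute() {
        // Hypothetical integer weights and bias.
        const int w0 = 64, w1 = -32, w2 = 128, bias = 16;
        sc_int<32> acc = w0 * in0.read() + w1 * in1.read()
                       + w2 * in2.read() + bias;
        // Hard-limit activation: simple and cheap to synthesize.
        out.write(acc > 0 ? sc_int<16>(1) : sc_int<16>(0));
    }

    SC_CTOR(Neuron) {
        SC_METHOD(compute);
        sensitive << in0 << in1 << in2;
    }
};

int sc_main(int, char*[]) {
    sc_signal<sc_int<16> > a, b, c, y;
    Neuron n("neuron");
    n.in0(a); n.in1(b); n.in2(c); n.out(y);

    a.write(10); b.write(20); c.write(5);
    sc_start(1, SC_NS);                  // evaluate the combinational method
    std::cout << "neuron output: " << y.read() << std::endl;
    return 0;
}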
{"title":"Synthesis of embedded systemC design: a case study of digital neural networks","authors":"D. Lettnin, A. Braun, M. Bogdan, J. Gerlach, W. Rosenstiel","doi":"10.1109/DATE.2004.1269239","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269239","url":null,"abstract":"This work presents the whole system-on-silicon design flow using systemC system specification language. In this study, systemC is used to design a multilayer perceptron neural network, which is applied to an electrocardiogram pattern recognition system. The objective of this work is to exemplify the synthesis of RTL-and behavioral integrated systems. To achieve this, a preprocessing methodology was used to optimize the three main constraints of hardware neural network (HNN) design: accuracy, space and processing speed. This allows a complex HNN to be implemented on a single field programmable gate array (FPGA). The high level systemC synthesis allows the straightforward translation of system level into hardware level, avoiding the error prone and the time consuming translation into another hardware description language.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124558769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of an object-oriented hardware design methodology for automotive applications
Pub Date: 2004-02-16, DOI: 10.1109/DATE.2004.1269247
N. Bannow, Karsten Haug
In this paper we present results from using the new object-oriented design approach OSSS (ODETTE system synthesis subset). The methodology and tools of the ODETTE (object-oriented co-design and functional test techniques) project have been developed within the IST programme of the European Commission. The main focus of OSSS lies in hardware design and synthesis capability. The strategy is based on an extension of the synthesizable subset of standard SystemC. The approach supports truly object-oriented, synthesizable design features such as classes, inheritance, templates, polymorphism and global object access. OSSS therefore promises high efficiency in terms of the capability to handle complex designs, shorter development time, improved code quality and faster time to market. Standard SystemC, by contrast, is also based on C++ constructs, but no object-oriented constructs are yet available for a synthesizable system description. We have evaluated OSSS on an automotive design example, chosen as a component that is part of all video projects: a camera's exposure control unit (ExpoCU). The first main goal achieved is a synthesizable design, through the automatic generation of an FPGA netlist from an OSSS description. Furthermore, we show that the methodology fulfills industrial requirements such as usability for complex system development, integration of existing IP, improved code quality and reduced development effort. The comparison is made against an existing VHDL-based design flow. We especially focus on implementation and testability by comparing the new object-oriented synthesis approach with a standard VHDL flow, with emphasis on synthesizability. OSSS and similar methodologies show a large potential for handling new generations of complex HW/SW systems. Moreover, the gap between increasing design complexity and available methodologies continues to widen and needs to be closed by new solutions such as OSSS.
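To illustrate the kind of constructs at stake, the generic C++ sketch below uses inheritance and polymorphism for a hypothetical exposure-control policy; it is plain C++ for illustration only and does not use actual OSSS library syntax.

// Illustrative sketch only: a plain C++ class hierarchy showing the style of
// inheritance and polymorphism that an object-oriented synthesis subset such
// as OSSS aims to make synthesizable. The strategies below are hypothetical.
#include <cstdio>

class ExposureStrategy {                 // abstract base: one control policy
public:
    virtual int nextExposure(int brightness) const = 0;
    virtual ~ExposureStrategy() {}
};

class ProportionalStrategy : public ExposureStrategy {
public:
    int nextExposure(int brightness) const override {
        return 128 - brightness / 2;     // hypothetical proportional control
    }
};

class BangBangStrategy : public ExposureStrategy {
public:
    int nextExposure(int brightness) const override {
        return brightness > 128 ? 0 : 255;   // hypothetical on/off control
    }
};

int main() {
    ProportionalStrategy p;
    BangBangStrategy b;
    const ExposureStrategy* policies[] = { &p, &b };
    for (const ExposureStrategy* s : policies)       // polymorphic dispatch
        std::printf("exposure = %d\n", s->nextExposure(100));
    return 0;
}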
{"title":"Evaluation of an object-oriented hardware design methodology for automotive applications","authors":"N. Bannow, Karsten Haug","doi":"10.1109/DATE.2004.1269247","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269247","url":null,"abstract":"In this paper we present results in using the new object-oriented design approach OSSS (ODETTE system synthesis subset). The methodology and tools of the ODETTE (object-oriented co-design and functional test techniques) project have been developed within the context of the IST programme of the European Commission. Main focus of OSSS lies in the field of hardware design and in synthesis capability. The strategy is based on an extension of the synthesizable subset of standard systemC. The approach supports real object-oriented and synthesizable design features like classes, inheritance, templates, polymorphism and global object access. Therefore OSSS promises high efficiency in sense of capability to handle complex designs, faster development time, improved code quality and faster time to market. In contrast, standard systemC is also based on C++ constructs, but no object-oriented constructs are available yet for a synthesizable system description. We have evaluated OSSS on an automotive design example. It was chosen for the implementation of a component that is part of all video projects: A camera's exposure control unit (ExpoCU). The first main goal that was achieved is a synthesizable design by the automatic generation of an FPGA netlist from an OSSS description. Furthermore we have also proved the methodology to fulfill industrial requirements such as usability for complex system development, integration of existing IP, improved code quality and decreased development effort. Comparison will be done against existing VHDL based design flow. We especially focus on the implementation and testability by comparing the new object-oriented synthesis approach with a standard VHDL flow by laying emphasis on synthesizability. OSSS and equivalent kinds of methodology show a large potential to handle new generations of complex HW-SW systems. Moreover the gap between increasing design complexity and available methodologies already now gets bigger and bigger and thus needs to be closed by new solutions such as OSSS.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114728860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Formal verification coverage: are the RTL-properties covering the design's architectural intent?
Pub Date: 2004-02-16, DOI: 10.1109/DATE.2004.1268922
P. Basu, Sayantan Das, P. Dasgupta, P. Chakrabarti, C. Mohan, L. Fix
It is essential to formally ascertain whether the RTL validation effort effectively guarantees correctness with respect to the design's architectural intent. The design's architectural intent can be expressed as formal properties. However, due to the capacity limitations of formal verification, these architectural-properties cannot be verified directly on the RTL. As a result, a set of lower-level RTL-properties is developed and verified against the RTL. In this paper we present: (1) a method for checking whether the RTL-properties cover the architectural-properties, that is, whether verifying the RTL-properties guarantees the correctness of the design's architectural intent; and (2) a method to identify the coverage holes in terms of the architectural-properties (or their sub-properties) that are not covered.
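One way to state the coverage question formally (generic notation, not necessarily the paper's formulation): given RTL-properties $R_1, \dots, R_n$ and an architectural-property $A$, the RTL-properties cover $A$ when

\[
R_1 \wedge R_2 \wedge \cdots \wedge R_n \;\models\; A,
\]

i.e., every behaviour satisfying all the RTL-properties also satisfies $A$. A coverage hole exists exactly when $R_1 \wedge \cdots \wedge R_n \wedge \neg A$ is satisfiable, and its satisfying behaviours (or the sub-properties of $A$ they violate) characterize what remains uncovered.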
{"title":"Formal verification coverage: are the RTL-properties covering the design's architectural intent?","authors":"P. Basu, Sayantan Das, P. Dasgupta, P. Chakrabarti, C. Mohan, L. Fix","doi":"10.1109/DATE.2004.1268922","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268922","url":null,"abstract":"It is essential to formally ascertain whether the RTL validation effort effectively guarantees the correctness with respect to the design's architectural intent. The design's architectural intent can be expressed in formal properties. However, due to the capacity limitation of formal verification, these architectural-properties cannot be directly verified on the RTL. As a result, a set of lower level RTL-properties are developed and verified against the RTL. In this paper we present: (1) a method for checking whether the RTL-properties are covering the architectural-properties, that is, whether verifying the RTL-properties guarantee the correctness of the design's architectural intent; and (2) a method to identify the coverage holes in terms of the architectural properties (or their sub-properties) that are not covered.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114938629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sensitivity-based modeling and methodology for full-chip substrate noise analysis
Pub Date: 2004-02-16, DOI: 10.1109/DATE.2004.1268912
R. Murgai, S. Reddy, T. Miyoshi, T. Horie, M. Tahoori
Substrate noise (SN) is an important problem in mixed-signal designs. With increasing design complexity, it is not possible to simulate for SN with a detailed SPICE model that uses an accurate model for each transistor. In this paper, we propose a sensitivity analysis- and static timing analysis-based methodology to derive a reduced model that computes the worst case substrate noise in the design. The reduced model contains only passive components, which are very few, and is very quick to simulate. The main feature of our methodology is that, unlike previous approaches, it is independent of input patterns and does not need to simulate for millions of clock cycles. This lets us apply it to a full-chip design in reasonable CPU time. We validate our reduced model on several benchmark circuits against a detailed and highly accurate reference model. On average, the reduced model is within 16.4% of the reference model and is up to 38 times faster. Finally, we apply our methodology to a mixed-signal switch chip design consisting of 8 million gates and show that it finishes in 17 minutes.
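A generic sensitivity-superposition picture of the kind such reduced models build on (symbols illustrative, not the paper's):

\[
V_{\mathrm{noise}}^{\max}(j) \;\le\; \sum_{i\,:\,W_i \cap T \neq \emptyset} S_{ji}\, I_i^{\max},
\]

where $S_{ji}$ is the sensitivity of observation node $j$ to noise injector $i$, $I_i^{\max}$ is that injector's peak current, and the sum is restricted to injectors whose switching windows $W_i$, obtained from static timing analysis, can overlap the analysis interval $T$.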
{"title":"Sensitivity-based modeling and methodology for full-chip substrate noise analysis","authors":"R. Murgai, S. Reddy, T. Miyoshi, T. Horie, M. Tahoori","doi":"10.1109/DATE.2004.1268912","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268912","url":null,"abstract":"Substrate noise (SN) is an important problem in mixed-signal designs. With increasing design complexity, it is not possible to simulate for SN with a detailed SPICE model that uses an accurate model for each transistor. In this paper, we propose a sensitivity analysis- and static timing analysis-based methodology to derive a reduced model that computes the worst case substrate noise in the design. The reduced model contains only passive components, which are very few, and is very quick to simulate. The main feature of our methodology is that, unlike previous approaches, it is independent of input patterns and does not need to simulate for millions of clock cycles. This lets us apply it to a full-chip design in reasonable CPU time. We validate our reduced model on several benchmark circuits against a detailed and highly accurate reference model. On average, the reduced model is within 16.4% of the reference model and is up to 38 times faster. Finally, we apply our methodology to a mixed-signal switch chip design consisting of 8 million gates and show that it finishes in 17 minutes.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123639239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Platform based on open-source cores for industrial applications
Pub Date: 2004-02-16, DOI: 10.1109/DATE.2004.1269026
M. Bolado, H. Posadas, J. Castillo, P. Huerta, P. Sánchez, C. Sánchez, H. Fouren, Francisco Blasco
The latest version of the International Technology Roadmap for Semiconductors predicts that design reuse will be essential in the near future to cope with constantly increasing design complexity. The concept comes from software engineering, in which reuse is a fundamental technology. In order to provide libraries and applications for reuse in software development, several open-source initiatives (e.g. Linux, gcc, X, mysql) have appeared during the last decades. The basic idea is to distribute the library or application source code (normally free of charge) and allow any developer to use, modify, debug and improve it. Several initiatives have tried to port this idea to hardware development. The main goal of this paper is to develop a synthesizable platform, described in SystemC, from an open architecture. The platform includes a CPU (OpenRISC) and some basic peripherals. A set of software development tools (compiler, assembler, debugger) and an RTOS (eCos) have also been developed. This work enables the evaluation of the advantages and disadvantages of the open-source model in electronic system design.
{"title":"Platform based on open-source cores for industrial applications","authors":"M. Bolado, H. Posadas, J. Castillo, P. Huerta, P. Sánchez, C. Sánchez, H. Fouren, Francisco Blasco","doi":"10.1109/DATE.2004.1269026","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269026","url":null,"abstract":"The latest version of the international technology roadmap for semiconductors predicts that design reuse will be essential in the near future to face the constantly increasing design complexity. The concept comes from software engineering in which reuse is a fundamental technology. In order to provide libraries and applications to reuse in software development, some open-source initiatives (e.g. Linux, gcc, X, mysql) have appeared during the last decades. The basic idea is to distribute the library or application source code (normally free-of-charge) and allow any developer to use, modify, debug and improve it. Several initiatives have tried to port this idea to hardware development. The main goal of this paper is to develop a synthesizable platform described in SystemC from an open architecture. The platform includes a CPU (OpenRISC) and some basic peripherals. A set of software development tools (compiler, assembler, debugger) and RTOS (eCos) has also been developed. This work enables the evaluation of the advantages and disadvantages of the open-source model in electronic system design.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122033963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An interconnect channel design methodology for high performance integrated circuits
Pub Date: 2004-02-16, DOI: 10.1109/DATE.2004.1269045
V. Chandra, A. Xu, H. Schmit, L. Pileggi
On-chip communication is becoming a bottleneck for high-performance designs. Conventional interconnect design methodology does not account for architectures and/or communication schemes that require storage buffers (first-in first-out queues, or FIFOs) in the interconnect channel. For example, FIFOs and flow control are needed for network-on-chip designs, high-performance ASICs and multiple-clock-domain designs. These IC implementation architectures require an efficient methodology to determine the size of the FIFOs in the channel, since the FIFO sizes affect system performance. In this work we devise a methodology to size the FIFOs in an interconnect channel containing one or more FIFOs connected in series. We show that the sizing of the FIFOs in the channel is a function of system parameters such as the data production and consumption rates, data burstiness and the number of channel stages, and we also quantify their effect on performance. For a single-clock design, we have developed an efficient algorithm that reduces the search space for the optimal sizing of the FIFOs in the channel.
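As a minimal illustration of why the parameters listed above drive FIFO depth, the sketch below measures the peak occupancy of a single FIFO between a hypothetical bursty producer and a steady consumer; it is a simple occupancy experiment, not the paper's sizing algorithm.

// Illustrative sketch only: a cycle-based experiment measuring the peak
// occupancy of one FIFO between a bursty producer and a steady consumer,
// i.e. the depth needed to absorb the burst without stalling the producer.
// The rates and burst pattern are hypothetical.
#include <cstdio>
#include <algorithm>

int main() {
    const int cycles       = 1000;
    const int burstLen     = 8;    // producer writes 8 words back to back...
    const int burstGap     = 24;   // ...then idles for 24 cycles
    const int consumeEvery = 4;    // consumer drains one word every 4 cycles

    int occupancy = 0, peak = 0;
    for (int t = 0; t < cycles; ++t) {
        if (t % (burstLen + burstGap) < burstLen) ++occupancy;   // produce
        if (t % consumeEvery == 0 && occupancy > 0) --occupancy; // consume
        peak = std::max(peak, occupancy);
    }
    std::printf("peak FIFO occupancy (required depth): %d\n", peak);
    return 0;
}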
{"title":"An interconnect channel design methodology for high performance integrated circuits","authors":"V. Chandra, A. Xu, H. Schmit, L. Pileggi","doi":"10.1109/DATE.2004.1269045","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269045","url":null,"abstract":"On-chip communication is becoming a bottleneck for high performance designs. Conventional interconnect design methodology does not account for architectures and/or communication schemes that require storage buffers (first-in-first-out queues or FIFOs) in the interconnect channel. For example, FIFOs and flow-control are needed for Network-on-Chip, high performance ASICs and multiple clock domain designs. These IC implementation architectures require an efficient methodology to determine the size of the FIFOs in the channel since the FIFO sizes affect system performance. In this work we devised a methodology to size the FIFOs in an interconnect channel containing one or more FIFOs connected in series. We show that the sizing of the FIFOs in the channel is a function of system parameters such as data production rate and consumption rate, data burstiness, number of channel stages etc. and we also quantify their effect on performance. For a single clock design, we have developed an efficient algorithm which reduces the search space for the optimal sizing of the FIFOs in the channel.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124724905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}