Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1269200
A. Panato, S. V. Silva, F. Wagner, M. Johann, R. Reis, S. Bampi
This work investigates the use of very deep pipelines for implementing circuits in FPGAs, where each pipeline stage is limited to a single FPGA logic element (LE). The architecture and VHDL design of a parameterized integer array multiplier is presented and also an IEEE 754 compliant 32-bit floating-point multiplier. We show how to write VHDL cells that implement such approach, and how the array multiplier architecture was adapted. Synthesis and simulation were performed for Altera Apex20KE devices, although the VHDL code should be portable to other devices. For this family, a 16 bit integer multiplier achieves a frequency of 266 MHz, while the floating point unit reaches 235 MHz, performing 235 MFLOPS in an FPGA. Additional cells are inserted to synchronize data, what imposes significant area penalties. This and other considerations to apply the technique in real designs are also addressed.
{"title":"Design of very deep pipelined multipliers for FPGAs","authors":"A. Panato, S. V. Silva, F. Wagner, M. Johann, R. Reis, S. Bampi","doi":"10.1109/DATE.2004.1269200","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269200","url":null,"abstract":"This work investigates the use of very deep pipelines for implementing circuits in FPGAs, where each pipeline stage is limited to a single FPGA logic element (LE). The architecture and VHDL design of a parameterized integer array multiplier is presented and also an IEEE 754 compliant 32-bit floating-point multiplier. We show how to write VHDL cells that implement such approach, and how the array multiplier architecture was adapted. Synthesis and simulation were performed for Altera Apex20KE devices, although the VHDL code should be portable to other devices. For this family, a 16 bit integer multiplier achieves a frequency of 266 MHz, while the floating point unit reaches 235 MHz, performing 235 MFLOPS in an FPGA. Additional cells are inserted to synchronize data, what imposes significant area penalties. This and other considerations to apply the technique in real designs are also addressed.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115596506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1269064
S. Khawam, S. Baloch, A. Pai, Imran Ahmed, N. Aydin, T. Arslan, F. Westall
Mobile video processing as defined in standards like MPEG-4 and H.263 contains a number of timeconsuming computations that cannot be efficiently executed on current hardware architectures. The authors recently introduced a reconfigurable SoC platform that permits a low-power, high-throughput and flexible implementation of the motion estimation and DCT algorithms. The computations are done using domainspecific reconfigurable arrays that have demonstrated up to 75% reduction in power consumption when compared to generic FPGA architecture, which makes them suitable for portable devices. This paper presents and compares different configurations of the arrays to efficiently implementing DCT and motion estimation algorithms. A number of algorithms are mapped into the various reconfigurable fabrics demonstrating the flexibility of the new reconfigurable SoC architecture and its ability to support a number of implementations having different performance characteristics.
{"title":"Efficient implementations of mobile video computations on domain-specific reconfigurable arrays","authors":"S. Khawam, S. Baloch, A. Pai, Imran Ahmed, N. Aydin, T. Arslan, F. Westall","doi":"10.1109/DATE.2004.1269064","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269064","url":null,"abstract":"Mobile video processing as defined in standards like MPEG-4 and H.263 contains a number of timeconsuming computations that cannot be efficiently executed on current hardware architectures. The authors recently introduced a reconfigurable SoC platform that permits a low-power, high-throughput and flexible implementation of the motion estimation and DCT algorithms. The computations are done using domainspecific reconfigurable arrays that have demonstrated up to 75% reduction in power consumption when compared to generic FPGA architecture, which makes them suitable for portable devices. This paper presents and compares different configurations of the arrays to efficiently implementing DCT and motion estimation algorithms. A number of algorithms are mapped into the various reconfigurable fabrics demonstrating the flexibility of the new reconfigurable SoC architecture and its ability to support a number of implementations having different performance characteristics.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114711740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1269002
S. Murali, G. Micheli
We address the design of complex monolithic systems, where processing cores generate and consume a varying and large amount of data, thus bringing the communication links to the edge of congestion. Typical applications are in the area of multi-media processing. We consider a mesh-based networks on chip (NoC) architecture, and we explore the assignment of cores to mesh cross-points so that the traffic on links satisfies bandwidth constraints. A single-path deterministic routing between the cores places high bandwidth demands on the links. The bandwidth requirements can be significantly reduced by splitting the traffic between the cores across multiple paths. In this paper, we present NMAP, a fast algorithm that maps the cores onto a mesh NoC architecture under bandwidth constraints, minimizing the average communication delay. The NMAP algorithm is presented for both single minimum-path routing and split-traffic routing. The algorithm is applied to a benchmark DSP design and the resulting NoC is built and simulated at cycle accurate level in SystemC using macros from the /spl times/pipes library. Also, experiments with six video processing applications show significant savings in bandwidth and communication cost for NMAP algorithm when compared to existing algorithms.
{"title":"Bandwidth-constrained mapping of cores onto NoC architectures","authors":"S. Murali, G. Micheli","doi":"10.1109/DATE.2004.1269002","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269002","url":null,"abstract":"We address the design of complex monolithic systems, where processing cores generate and consume a varying and large amount of data, thus bringing the communication links to the edge of congestion. Typical applications are in the area of multi-media processing. We consider a mesh-based networks on chip (NoC) architecture, and we explore the assignment of cores to mesh cross-points so that the traffic on links satisfies bandwidth constraints. A single-path deterministic routing between the cores places high bandwidth demands on the links. The bandwidth requirements can be significantly reduced by splitting the traffic between the cores across multiple paths. In this paper, we present NMAP, a fast algorithm that maps the cores onto a mesh NoC architecture under bandwidth constraints, minimizing the average communication delay. The NMAP algorithm is presented for both single minimum-path routing and split-traffic routing. The algorithm is applied to a benchmark DSP design and the resulting NoC is built and simulated at cycle accurate level in SystemC using macros from the /spl times/pipes library. Also, experiments with six video processing applications show significant savings in bandwidth and communication cost for NMAP algorithm when compared to existing algorithms.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"241 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114466275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1268883
Anuja Sehgal, K. Chakrabarty
The increasing complexity of system-on-chip (SOC) integrated circuits has spurred the development of versatile automatic test equipment (ATE) that can simultaneously drive different channels at different data rates. Examples of such ATEs include the Agilent 93000 series tester based on port scalability and the test processor-per-pin architecture, and the Tiger system from Teradyne. The number of tester channels with high data rates may be constrained in practice however due to ATE resource limitations, the power rating of the SOC, and scan frequency limits for the embedded cores. Therefore, we formulate the following optimization problem: given two available data rates for the tester channels, an SOC-level test access mechanism (TAM) width W, V (V < W) channels that can transport test data at the higher data rate, determine an SOC TAM architecture that minimizes the testing time. We present an efficient heuristic algorithm for TAM optimization that exploits port scalability of ATEs to reduce SOC testing time and test cost. We present experimental results on dual-speed TAM optimization for the ITC'2002 SOC test benchmarks.
片上系统(SOC)集成电路的复杂性日益增加,促使了多功能自动测试设备(ATE)的发展,这些设备可以同时以不同的数据速率驱动不同的通道。此类ATEs的示例包括基于端口可伸缩性和测试处理器每引脚架构的Agilent 93000系列测试仪,以及Teradyne的Tiger系统。然而,由于ATE资源限制、SOC的额定功率以及嵌入式内核的扫描频率限制,具有高数据速率的测试通道的数量可能在实践中受到限制。因此,我们制定了以下优化问题:给定两种可用的测试通道数据速率,一个SOC级测试访问机制(TAM)宽度为W, V (V < W)的通道可以以更高的数据速率传输测试数据,确定一个SOC TAM架构,使测试时间最小化。我们提出了一种有效的启发式算法,该算法利用ATEs的端口可扩展性来减少SOC测试时间和测试成本。我们在ITC'2002 SOC测试基准上给出了双速度TAM优化的实验结果。
{"title":"Efficient modular testing of SoCs using dual-speed TAM architectures","authors":"Anuja Sehgal, K. Chakrabarty","doi":"10.1109/DATE.2004.1268883","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268883","url":null,"abstract":"The increasing complexity of system-on-chip (SOC) integrated circuits has spurred the development of versatile automatic test equipment (ATE) that can simultaneously drive different channels at different data rates. Examples of such ATEs include the Agilent 93000 series tester based on port scalability and the test processor-per-pin architecture, and the Tiger system from Teradyne. The number of tester channels with high data rates may be constrained in practice however due to ATE resource limitations, the power rating of the SOC, and scan frequency limits for the embedded cores. Therefore, we formulate the following optimization problem: given two available data rates for the tester channels, an SOC-level test access mechanism (TAM) width W, V (V < W) channels that can transport test data at the higher data rate, determine an SOC TAM architecture that minimizes the testing time. We present an efficient heuristic algorithm for TAM optimization that exploits port scalability of ATEs to reduce SOC testing time and test cost. We present experimental results on dual-speed TAM optimization for the ITC'2002 SOC test benchmarks.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"6 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116742643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1268858
Sean Safarpour, A. Veneris, R. Drechsler, Joanne Lee
Advances in Boolean satisfiability solvers have popularized their use in many of today's CAD VLSI challenges. Existing satisfiability solvers operate on a circuit representation that does not capture all of the structural circuit characteristics and properties. This work proposes algorithms that take into account the circuit don't care conditions thus enhancing the performance of these tools. Don't care sets are addressed in this work both statically and dynamically to reduce the search space and guide the decision making process. Experiments demonstrate performance gains.
{"title":"Managing don't cares in Boolean satisfiability","authors":"Sean Safarpour, A. Veneris, R. Drechsler, Joanne Lee","doi":"10.1109/DATE.2004.1268858","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268858","url":null,"abstract":"Advances in Boolean satisfiability solvers have popularized their use in many of today's CAD VLSI challenges. Existing satisfiability solvers operate on a circuit representation that does not capture all of the structural circuit characteristics and properties. This work proposes algorithms that take into account the circuit don't care conditions thus enhancing the performance of these tools. Don't care sets are addressed in this work both statically and dynamically to reduce the search space and guide the decision making process. Experiments demonstrate performance gains.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116177854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1269268
P. Daglio, D. Iezzi, Danilo Rimondi, C. Roma, Salvatore Santapa
Main concerns related to post-layout simulation, today, are about the format of the netlist coming out from the parasitic extractor. In fact, such a netlist is usually flat so that readability, whether compared to the pre-layout hierarchical one, is very poor due to device and net names which often change and to the difficulty to compare pre-layout and post-layout output signals. Furthermore, simulating such large flat netlists is frequently time consuming because it is not possible to exploit algorithms like hierarchical array reduction (HAR) and isomorphic matching (IM), strength points of state-of-the-art full chip simulators. In this paper, we present a new approach that, starting from a flat netlist with parasitic components and a pre-layout hierarchical one, allows to create a fully hierarchical post-layout netlist containing device parameters and parasitic components directly extracted from the layout. In this way, a fast and accurate post-layout simulation is made possible by the use of look-up table simulators, taking advantages from the HAR and IM algorithms as mentioned before. This methodology has been integrated in a complete design flow to guarantee first silicon success, cut down time-to-design, improve time-to-market and streamline design quality.
{"title":"Building the hierarchy from a flat netlist for a fast and accurate post-layout simulation with parasitic components","authors":"P. Daglio, D. Iezzi, Danilo Rimondi, C. Roma, Salvatore Santapa","doi":"10.1109/DATE.2004.1269268","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269268","url":null,"abstract":"Main concerns related to post-layout simulation, today, are about the format of the netlist coming out from the parasitic extractor. In fact, such a netlist is usually flat so that readability, whether compared to the pre-layout hierarchical one, is very poor due to device and net names which often change and to the difficulty to compare pre-layout and post-layout output signals. Furthermore, simulating such large flat netlists is frequently time consuming because it is not possible to exploit algorithms like hierarchical array reduction (HAR) and isomorphic matching (IM), strength points of state-of-the-art full chip simulators. In this paper, we present a new approach that, starting from a flat netlist with parasitic components and a pre-layout hierarchical one, allows to create a fully hierarchical post-layout netlist containing device parameters and parasitic components directly extracted from the layout. In this way, a fast and accurate post-layout simulation is made possible by the use of look-up table simulators, taking advantages from the HAR and IM algorithms as mentioned before. This methodology has been integrated in a complete design flow to guarantee first silicon success, cut down time-to-design, improve time-to-market and streamline design quality.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115133464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1268873
J. Brunel, M. Natale, A. Ferrari, P. Giusto, L. Lavagno
This paper discusses a model-based design flow for requirements in distributed embedded software development. Such requirements are specified using a language similar to linear temporal logic which allows one to reason about time and sequencing. They consist of assertions which must hold for a design, given some assumptions on its environment. They can be checked both during simulation and, at least for a subset, even on the target. The key contribution of the paper is the extension to the embedded software domain of assertion-based verification, and the automated generation of property-checking code in multiple target languages, from simulation, to prototyping, to final production.
{"title":"SoftContract: an assertion-based software development process that enables design-by-contract","authors":"J. Brunel, M. Natale, A. Ferrari, P. Giusto, L. Lavagno","doi":"10.1109/DATE.2004.1268873","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268873","url":null,"abstract":"This paper discusses a model-based design flow for requirements in distributed embedded software development. Such requirements are specified using a language similar to linear temporal logic which allows one to reason about time and sequencing. They consist of assertions which must hold for a design, given some assumptions on its environment. They can be checked both during simulation and, at least for a subset, even on the target. The key contribution of the paper is the extension to the embedded software domain of assertion-based verification, and the automated generation of property-checking code in multiple target languages, from simulation, to prototyping, to final production.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124676588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1268948
F. Burns, D. Shang, A. Koelmans, A. Yakovlev
We present a new CAD tool set for generating asynchronous circuits from high-level Verilog level-sensitive specifications. Initially, high-level Verilog descriptions are compiled and converted into a novel intermediate Petri net format. The intermediate format is subsequently passed to optimization tools and mapping tools where it is directly mapped into asynchronous datapath and control circuits using David cells (DCs). Finally, logic optimization tools are applied to generate speed-independent (SI) circuits. The speed independent circuits generated perform well compared to circuits generated by existing asynchronous tools.
{"title":"An asynchronous synthesis toolset using Verilog","authors":"F. Burns, D. Shang, A. Koelmans, A. Yakovlev","doi":"10.1109/DATE.2004.1268948","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268948","url":null,"abstract":"We present a new CAD tool set for generating asynchronous circuits from high-level Verilog level-sensitive specifications. Initially, high-level Verilog descriptions are compiled and converted into a novel intermediate Petri net format. The intermediate format is subsequently passed to optimization tools and mapping tools where it is directly mapped into asynchronous datapath and control circuits using David cells (DCs). Finally, logic optimization tools are applied to generate speed-independent (SI) circuits. The speed independent circuits generated perform well compared to circuits generated by existing asynchronous tools.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124937861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1269085
D. K. Reed, S. Levitan, J. Boles, J. A. Martínez, D. Chiarulli
We present our system-level co-simulation environment for mixed domain microsystems. The environment provides synchronization and co-simulation between the Chatoyant MOEMS (micro-electro mechanical systems) simulator and ModelTech ModelSim. By using shared memory IPC (inter-process communication) and PDES (parallel discrete event simulation) techniques, we achieve two orders of magnitude speedup over standard pipe/socket communication.
{"title":"An application of parallel discrete event simulation algorithms to mixed domain system simulation","authors":"D. K. Reed, S. Levitan, J. Boles, J. A. Martínez, D. Chiarulli","doi":"10.1109/DATE.2004.1269085","DOIUrl":"https://doi.org/10.1109/DATE.2004.1269085","url":null,"abstract":"We present our system-level co-simulation environment for mixed domain microsystems. The environment provides synchronization and co-simulation between the Chatoyant MOEMS (micro-electro mechanical systems) simulator and ModelTech ModelSim. By using shared memory IPC (inter-process communication) and PDES (parallel discrete event simulation) techniques, we achieve two orders of magnitude speedup over standard pipe/socket communication.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121549218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2004-02-16DOI: 10.1109/DATE.2004.1268998
A. Radulescu, J. Dielissen, K. Goossens, E. Rijpkema, P. Wielage
In this paper we present a network interface for an on-chip network. Our network interface decouples computation from communication by offering a shared-memory abstraction, which is independent of the network implementation. We use a transaction-based protocol to achieve backward compatibility with existing bus protocols such as AXI, OCP and DTL. Our network interface has a modular architecture, which allows flexible instantiation. It provides both guaranteed and best-effort services via connections. These are configured via network interface ports using the network itself, instead of a separate control interconnect. An example instance of this network interface with 4 ports has an area of 0.143 mm/sup 2/ in a 0.13 /spl mu/m technology, and runs at 500 MHz.
{"title":"An efficient on-chip network interface offering guaranteed services, shared-memory abstraction, and flexible network configuration","authors":"A. Radulescu, J. Dielissen, K. Goossens, E. Rijpkema, P. Wielage","doi":"10.1109/DATE.2004.1268998","DOIUrl":"https://doi.org/10.1109/DATE.2004.1268998","url":null,"abstract":"In this paper we present a network interface for an on-chip network. Our network interface decouples computation from communication by offering a shared-memory abstraction, which is independent of the network implementation. We use a transaction-based protocol to achieve backward compatibility with existing bus protocols such as AXI, OCP and DTL. Our network interface has a modular architecture, which allows flexible instantiation. It provides both guaranteed and best-effort services via connections. These are configured via network interface ports using the network itself, instead of a separate control interconnect. An example instance of this network interface with 4 ports has an area of 0.143 mm/sup 2/ in a 0.13 /spl mu/m technology, and runs at 500 MHz.","PeriodicalId":335658,"journal":{"name":"Proceedings Design, Automation and Test in Europe Conference and Exhibition","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122782746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}