Latest literature from the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools


Architectural Vulnerability Factor Estimation with Backwards Analysis

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

Pub Date : 2010-09-01 DOI: 10.1109/DSD.2010.104

Robert Hartl, Andreas-Juergen Rohatschek, W. Stechele, A. Herkersdorf

Single-event upsets in synchronous register-based designs are a severe problem for safety-critical applications. Exact and detailed error rate estimations are needed to determine a system’s level of reliability. Available estimation methods consider only specific effects, rely on special reliability models, or are computationally intensive. We present an innovative method that calculates the architectural vulnerability factor (AVF) of any RT-level circuit description by applying time-reversed stimulus values. This method, which we call Backwards Analysis, considers all major masking effects (logic masking, information lifetime, timing derating, transitive masking) in a single algorithm and delivers results at several levels of detail, from average AVF through sensitivity waveforms. The results show the critical parts and states of a design, which can be used for reliability assessment and for selective hardening of the circuit to reach a target failure rate.


Citations: 5

Medical Diagnosis Improvement Through Image Quality Enhancement Based on Super-Resolution

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

Pub Date : 2010-09-01 DOI: 10.1109/DSD.2010.35

Lara G. Villanueva, G. Callicó, F. Tobajas, S. López, V. Armas, J. López, R. Sarmiento

Nowadays, images are employed in several areas of medicine for early diagnosis. The industry provides accurate models for obtaining, for example, high-resolution X-ray and cardiology images. However, other images, such as those related to pathological anatomy, are in many situations of poor quality, which complicates the diagnostic process. This work focuses on enhancing the quality of this type of image through a system based on super-resolution techniques. The results show that the proposed methodology can help medical specialists in the diagnosis of several pathologies.


Citations: 7

Structurally Synthesized Multiple Input BDDs for Speeding Up Logic-Level Simulation of Digital Circuits

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

Pub Date : 2010-09-01 DOI: 10.1109/DSD.2010.27

Dmitri Mironov, R. Ubar, S. Devadze, J. Raik, A. Jutman

Logic simulation is a critical component of the design tool flow in modern hardware development efforts. In this paper, a new algorithm for parallel logic simulation is proposed, based on a new model of Structurally Synthesized Multiple Input BDDs (SSMIBDDs). SSMIBDDs allow further model size reduction, and therefore faster logic simulation, than their predecessor, the SSBDD model. The paper presents a method for synthesizing SSMIBDDs from a given gate network and the main principles of parallel logic simulation with SSMIBDDs. Experimental data demonstrate an average 2.9-fold improvement in logic simulation speed because of the reduced number of nodes in SSMIBDDs. Like SSBDDs, the new model preserves structural information about the circuit, which is needed for fault processing. The reduced complexity of SSMIBDDs leads to more powerful fault collapsing and, as a result, to more efficient fault simulation and fault injection for evaluating the dependability of fault-tolerant circuits.


Citations: 3

Power Distribution in NoCs Through a Fuzzy Based Selection Strategy for Adaptive Routing

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

Pub Date : 2010-09-01 DOI: 10.1109/DSD.2010.24

Nastaran Salehi, A. Khademzadeh, A. Dana

Network-on-chip (NoC) is being proposed as a scalable and reusable communication platform for future embedded systems. The performance of a NoC largely depends on the underlying routing algorithm, which must be deadlock-free and efficient. When adaptive routing returns a set of acceptable output channels, a selection strategy is used to pick one of them, so the selection strategy affects the efficiency of adaptive routing. In this paper, a novel selection strategy is proposed that avoids congested areas using a fuzzy-based routing decision and can be used with any adaptive routing algorithm. The objective of the proposed selection strategy is to choose a channel that has more free input buffer slots and lower power consumption. The routing path is established by minimizing a cost that is calculated by a fuzzy controller and takes into account the power consumption and free input buffer slots of the cores. Performance is evaluated with a flit-accurate simulator under different traffic scenarios. Experimental results show that the proposed selection strategy, applied to the Odd-Even routing algorithm, can effectively improve average delay and power consumption, meeting the power balance requirement and avoiding hotspots with low hardware overhead.
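
The selection step described in this abstract can be illustrated with a small Python sketch. The abstract does not give the fuzzy controller's membership functions or rules, so a plain weighted sum of buffer occupancy and normalized power stands in for it here; the `Channel` type, the equal weights, and the `buffer_depth` and `max_power` parameters are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    free_slots: int  # free slots in the downstream input buffer
    power: float     # recent power consumption estimate (arbitrary units)


def cost(ch, buffer_depth, max_power):
    """Lower is better: few occupied slots and low power.

    A weighted sum is an assumed stand-in for the paper's fuzzy controller.
    """
    occupancy = 1.0 - ch.free_slots / buffer_depth
    load = ch.power / max_power
    return 0.5 * occupancy + 0.5 * load


def select_channel(candidates, buffer_depth=4, max_power=1.0):
    """Pick one channel from the set returned by the adaptive routing function."""
    return min(candidates, key=lambda ch: cost(ch, buffer_depth, max_power))


if __name__ == "__main__":
    admissible = [Channel("east", 1, 0.8), Channel("north", 3, 0.3)]
    print(select_channel(admissible).name)
```

Because the routing function only supplies the candidate set, a selection function of this shape can sit behind any adaptive routing algorithm, which is the property the abstract emphasizes.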



Citations: 4

The Use of Genetic Algorithm to Derive Correlation Between Test Vector and Scan Register Sequences and Reduce Power Consumption

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

Pub Date : 2010-09-01 DOI: 10.1109/DSD.2010.37

Z. Kotásek, Jaroslav Skarvada, Josef Strnadel

In most existing approaches, reorganizing the test vector sequence and reordering the scan chain registers to reduce power consumption are solved separately and treated as independent procedures. This paper shows that the two processes are correlated and that there are strong reasons to combine them into a single, concurrently run procedure. Based on this idea, it is demonstrated that the search spaces of both procedures can be combined into a single search space to achieve better results during optimization. The optimization over the united search space was tested on ISCAS85, ISCAS89 and ITC99 benchmark circuits implemented with CMOS primitives from AMI technology libraries. The results show that lower power consumption can be achieved if the correlation is taken into account, i.e., if the search space is united rather than divided into separate spaces. Finally, results achieved by genetic-algorithm-based optimization are presented, discussed, and compared with the results of existing methods.
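
To make the idea of a united search space concrete, the following Python sketch scores a (vector order, register order) pair with one cost function. It rests on several assumptions not taken from the paper: power is approximated by the number of scan-cell toggles during shift-in, captured circuit responses are ignored (the chain is assumed to still hold the previous vector), and an exhaustive search over a tiny example stands in for the genetic algorithm.

```python
from itertools import permutations


def shift_toggles(vectors, vector_order, register_order):
    """Count scan-cell value changes while shifting the vectors into one chain."""
    chain = [0] * len(register_order)
    toggles = 0
    for v in vector_order:
        # Bit that each chain position must hold once the vector is loaded.
        bits = [vectors[v][r] for r in register_order]
        # The bit destined for the last cell enters the chain first.
        for bit in reversed(bits):
            new_chain = [bit] + chain[:-1]
            toggles += sum(a != b for a, b in zip(chain, new_chain))
            chain = new_chain
    return toggles


def best_joint(vectors):
    """Exhaustive search over the united space (stand-in for a genetic algorithm)."""
    n_vec, n_reg = len(vectors), len(vectors[0])
    return min(
        (shift_toggles(vectors, vo, ro), vo, ro)
        for vo in permutations(range(n_vec))
        for ro in permutations(range(n_reg))
    )


if __name__ == "__main__":
    tests = [(1, 0, 1), (0, 0, 1), (1, 1, 0)]
    print(best_joint(tests))
```

The toggle count depends on what the chain already holds when the next vector is shifted in, so the best register order can change with the vector order. A cost of this kind cannot be minimized one dimension at a time, which is the reason for searching both orders together.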



Citations: 2

A Low Cost Single-Cycle Router Based on Virtual Output Queuing for On-chip Networks

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

Pub Date : 2010-09-01 DOI: 10.1109/DSD.2010.15

S. Nguyen, S. Oyanagi

The communication latency of a Network-on-Chip (NoC) is one of the factors that significantly affect application performance on Systems-on-Chip. To reduce NoC latency, we propose a low-latency router architecture that uses virtual output queuing (VOQ) to shorten the processing time of a packet transfer. By taking advantage of VOQ in buffering, the pipeline of a packet transfer can be reduced to two stages: switch allocation and switch traversal. By speculatively performing these stages in parallel, the router can complete a packet transfer in only one clock cycle. In addition, a multiple-VOQ architecture, in which each input port maintains more than one queue for each output channel, is proposed to improve router throughput. We have implemented the proposed router on an FPGA and evaluated it in terms of communication latency, throughput, and hardware cost. The experimental results show that in a 4x4 two-dimensional mesh network, the proposed router reduces communication latency by 25% and area cost by 67.3% compared to the look-ahead speculative virtual channel router.
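
As a rough illustration of virtual output queuing, the Python sketch below gives each input port one queue per output, so a packet waiting for a busy output does not block packets behind it that are headed elsewhere. The `InputPort` class, the fixed-priority arbitration, and the one-packet-per-input-per-cycle rule are assumptions for illustration; the abstract does not describe the router's actual allocator.

```python
from collections import deque


class InputPort:
    """Input port that keeps one queue per output port (virtual output queuing)."""

    def __init__(self, num_outputs):
        self.voq = [deque() for _ in range(num_outputs)]

    def enqueue(self, packet, out_port):
        # The output port is known on arrival, so the packet is sorted immediately.
        self.voq[out_port].append(packet)


def allocate_and_traverse(ports, num_outputs):
    """One cycle of switch allocation plus traversal.

    Each output grants the lowest-numbered free input that has a packet
    queued for it (fixed priority is an assumption of this sketch).
    """
    sent = {}
    busy_inputs = set()
    for out in range(num_outputs):
        for i, port in enumerate(ports):
            if i not in busy_inputs and port.voq[out]:
                sent[out] = port.voq[out].popleft()
                busy_inputs.add(i)
                break
    return sent


if __name__ == "__main__":
    ports = [InputPort(2), InputPort(2)]
    ports[0].enqueue("a", 1)
    ports[0].enqueue("b", 0)
    ports[1].enqueue("c", 1)
    print(allocate_and_traverse(ports, 2))
    print(allocate_and_traverse(ports, 2))
```

In the usage example, packet "b" leaves input 0 in the first cycle even though "a" arrived earlier, because the two sit in different queues. With a single FIFO per input, "b" would have had to wait behind "a".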



Citations: 16

CλaSH: Structural Descriptions of Synchronous Hardware Using Haskell

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

Pub Date : 2010-09-01 DOI: 10.1109/DSD.2010.21

Christiaan Baaij, M. Kooijman, J. Kuper, Arjan Boeijink, Marco E. T. Gerards

CλaSH is a functional hardware description language that borrows both its syntax and semantics from the functional programming language Haskell. Polymorphism and higher-order functions provide a level of abstraction and generality that allow a circuit designer to describe circuits in a more natural way than possible with the language elements found in the traditional hardware description languages. Circuit descriptions can be translated to synthesizable VHDL using the prototype CλaSH compiler. As the circuit descriptions, simulation code, and test input are also valid Haskell, complete simulations can be done by a Haskell compiler or interpreter, allowing high-speed simulation and analysis.


Citations: 112

In-channel Flow Control Scheme for Network-on-Chip

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

Pub Date : 2010-09-01 DOI: 10.1109/DSD.2010.73

Vrishali Vijay Nimbalkar, Kuruvilla Varghese

Present-day Systems-on-Chip use a Network-on-Chip for communication between the cores, which requires proper flow control schemes for efficient utilization of network resources. We propose a flow control scheme that combines piggybacking and credit flit transmission with in-channel signaling to provide the free buffer count of a router's input port to the neighboring routers. An alternating bit protocol is used for transmitting and receiving data and credit flits. Our scheme does not use additional flit cycles or extra signaling lines for credit flit transmission. We have used Noxim, a SystemC-based simulator, to evaluate the performance of our scheme. Compared to dedicated signaling flow control, throughput in the proposed scheme remains the same, whereas average delay increases by a maximum of five flit cycles (13 percent) for transpose traffic and a minimum of one flit cycle (0.3 percent) for hotspot traffic. Also, a router designed to implement our scheme requires 12.69 percent fewer signaling lines.
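
The credit mechanism behind this scheme can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the `Sender` class, the `reverse_flit` helper, and the dictionary flit layout are hypothetical, and the alternating bit protocol mentioned in the abstract is left out.

```python
class Sender:
    """Upstream router's view of one downstream input buffer."""

    def __init__(self, buffer_depth):
        self.credits = buffer_depth  # free slots believed available downstream

    def can_send(self):
        return self.credits > 0

    def send(self):
        if not self.can_send():
            raise RuntimeError("no credits: downstream buffer may be full")
        self.credits -= 1  # one downstream slot is now occupied

    def receive(self, flit):
        # Credits arrive in-channel, on data flits or on dedicated credit flits.
        self.credits += flit["credit"]


def reverse_flit(payload, freed_slots):
    """Build a flit for the reverse direction.

    If there is data to send, the credit count is piggybacked on it;
    otherwise a credit-only flit carries it (layout is an assumption).
    """
    kind = "data" if payload is not None else "credit"
    return {"kind": kind, "payload": payload, "credit": freed_slots}


if __name__ == "__main__":
    s = Sender(buffer_depth=2)
    s.send()
    s.send()
    print(s.can_send())
    s.receive(reverse_flit("x", 1))
    print(s.can_send())
```

Carrying the credit count inside flits that already travel on the data channel is what lets a scheme like this avoid dedicated credit wires, at the price of some added delay when no reverse traffic is available to piggyback on.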


Citations: 2

Creation of Partial FPGA Configurations at Run-Time

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

Pub Date : 2010-09-01 DOI: 10.1109/DSD.2010.14

M. Silva, J. Ferreira

This paper describes and evaluates a method for generating partial FPGA configurations at run-time. The proposed technique is aimed at adaptive embedded systems that employ run-time reconfiguration to achieve high flexibility and performance. The approach relies on a library of partial bitstreams for a set of basic components. New partial configurations for circuits defined by netlists of basic components are created by merging a default bitstream of the target area, the relocated configurations of the components, and the configurations of the switch matrices used to build the connections between the components. An implementation targeting the Virtex-II Pro platform FPGA is described. It runs on the embedded 300 MHz PowerPC CPU present in the FPGA. The proof-of-concept implementation was used to create partial configurations at run-time for 20 circuits with up to 21 components and 288 connections. The complete configuration creation process took between 7 s and 97 s.


Citations: 12

A Latency-Efficient Router Architecture for CMP Systems

2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

Pub Date : 2010-09-01 DOI: 10.1109/DSD.2010.42

Antoni Roca, J. Flich, F. Silla, J. Duato

As technology advances, the number of cores in Chip Multi Processor systems (CMPs) and Multi Processor Systems-on-Chip (MPSoCs) keeps increasing. Current test chips and products reach tens of cores, and hundreds of cores are expected in the near future. Such complexity demands an efficient network-on-chip (NoC). The common choice for building such networks is the 2D mesh topology (as it matches the regular tile-based design) and the Dimension-Order Routing (DOR) algorithm (because of its simplicity). The network in such systems must provide sustained throughput and ultra-low latencies. One of the key components in the network is the router, which therefore plays a major role when designing for such performance levels. In this paper we propose a new pipelined router design focused on reducing router latency. As a first step, we identify the router components that take up most of the critical path and thus limit the router frequency. In particular, the arbiter is the component that limits the performance of the router. Based on this fact, we simplify the arbiter logic by using multiple smaller arbiters. The set of requests handled by the original arbiter is distributed over the smaller arbiters, which operate in parallel. With this design procedure, and with a proper internal router organization, different router architectures are derived. All of them enable the use of smaller arbiters in parallel by replicating ports and assuming the use of the DOR algorithm. The net result of these changes is a faster router. Preliminary results demonstrate a router latency reduction ranging from 10% to 21%, with an increase in router area. Network latency is reduced by 11% to 15%.



Citations: 7
