
13th Symposium on High Performance Interconnects (HOTI'05): Latest Publications

Using the open network lab
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.34
J. Turner
The Open Network Laboratory (ONL) is a resource designed to enable experimental evaluation of advanced networking concepts in a realistic operating environment. The laboratory is built around a set of open-source, extensible, high performance routers, which can be accessed by remote users through a remote laboratory interface (RLI). The RLI allows users to configure the testbed network, run applications and monitor those running applications using built-in data gathering mechanisms. Support for data visualization and real-time remote display is provided. The RLI also allows users to extend, modify or replace the software running in the routers' embedded processors and to similarly extend, modify or replace the routers' packet processing hardware, which is implemented largely using field programmable gate arrays. The routers included in the testbed are architecturally similar to high performance commercial routers, enabling researchers to evaluate their ideas in a much more realistic context than can be provided by PC-based routers. The ONL is designed to provide a setting in which systems researchers can evaluate and refine their ideas and then demonstrate them to those interested in moving their technology into new products and services. This tutorial will teach users how to use the ONL. It will include detailed presentations on the system architecture and principles of operation, as well as live demonstrations. We also plan to give participants an opportunity for hands-on experience with setting up and running experiments themselves.
Citations: 2
Breaking the connection: RDMA deconstructed
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.9
Rajeev Sivaram, R. Govindaraju, P. Hochschild, Robert Blackmore, Piyush Chaudhary
The architecture, design and performance of RDMA (remote direct memory access) over the IBM HPS (high performance switch and adapter) are described. Unlike conventional implementations such as InfiniBand, our RDMA transport model is layered on top of an unreliable datagram interface, while leaving the task of enforcing reliability to the ULP (upper layer protocol). We demonstrate that our model allows a single MPI task to deliver bidirectional bandwidth of close to 3.0 GB/s across a single link and 24.0 GB/s when striped across 8 links. In addition, we show that this transport protocol has superior attributes in terms of a) being able to handle RDMA packets coming out of order; b) being able to use multiple routes between a source-destination pair and c) reducing the size of adapter caches.
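The layering described in the abstract lends itself to a compact illustration: because each RDMA packet is self-describing (it carries its own target offset), the receiver can place packets in whatever order they arrive, and the ULP only needs to track which offsets have landed and request retransmission of the gaps. The sketch below is a hypothetical Python model of that idea, not the IBM HPS implementation; the class and method names are invented for illustration.

```python
class UlpReceiver:
    """Toy receiver for RDMA layered over an unreliable datagram service."""

    def __init__(self, length):
        self.buf = bytearray(length)   # registered destination buffer
        self.received = set()          # offsets of packets already placed

    def on_packet(self, offset, payload):
        # Self-describing packet: placement works regardless of arrival order.
        self.buf[offset:offset + len(payload)] = payload
        self.received.add(offset)

    def missing(self, packet_size, total):
        # Gaps the ULP asks the sender to retransmit.
        return [off for off in range(0, total, packet_size)
                if off not in self.received]
```

With placement decoupled from ordering, reliability reduces to bookkeeping over offsets rather than enforcing in-order delivery inside the transport.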
Citations: 12
High-speed and low-power network search engine using adaptive block-selection scheme
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.20
M. Akhbarizadeh, M. Nourani, R. Panigrahy, Samar Sharma
A new approach that uses a block-selection scheme to increase the search throughput of multi-block TCAM-based network search engines is proposed. While existing methods try to counter and forcibly balance the inherent bias of Internet traffic, our method takes advantage of it. Our method improves the flexibility of table management and scales to high rates of change in traffic bias. It offers higher throughput than the current state of the art and very low average power consumption. One embodiment of the proposed model, using four TCAM chips, can deliver over six times the throughput of a conventional configuration of the same chips.
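The bias-exploiting idea can be made concrete with a toy dispatcher: prefixes are partitioned into groups, each lookup activates only the block holding the matching group, and the hottest groups are periodically re-spread across distinct blocks so that concurrent lookups from a biased traffic mix land on different blocks. This is a hypothetical Python sketch of the general principle; the paper's actual block-selection hardware and data structures are not reproduced here.

```python
from collections import Counter

class BlockSelector:
    """Toy dispatcher for a multi-block TCAM: each lookup activates only the
    block holding the key's prefix group, and hot groups are re-spread so
    that biased traffic is served by different blocks in parallel."""

    def __init__(self, n_blocks, n_groups):
        # Initial static placement: group g lives in block g % n_blocks.
        self.placement = {g: g % n_blocks for g in range(n_groups)}
        self.hits = Counter()
        self.n_blocks = n_blocks

    def lookup_block(self, group):
        self.hits[group] += 1
        return self.placement[group]       # only this block is powered up

    def rebalance(self):
        # Move the hottest groups onto distinct blocks, round-robin.
        hottest = [g for g, _ in self.hits.most_common(self.n_blocks)]
        for block, g in enumerate(hottest):
            self.placement[g] = block
```

Activating a single block per lookup is what keeps average power low; re-spreading the hot groups is what turns traffic bias into parallel throughput instead of a hotspot.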
Citations: 6
Design of randomized multichannel packet storage for high performance routers
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.17
S. Sushanth Kumar, P. Crowley, J. Turner
High performance routers require substantial amounts of memory to store packets awaiting transmission, requiring the use of dedicated memory devices with the density and capacity to provide the required storage economically. The memory bandwidth required for packet storage subsystems often exceeds the bandwidth of individual memory devices, making it necessary to implement packet storage using multiple memory channels. This raises the question of how to design multichannel storage systems that make effective use of the available memory and memory bandwidth, while forwarding packets at link rate in the presence of arbitrary packet retrieval patterns. A recent series of papers has demonstrated an architecture that uses on-chip SRAM to buffer packets going to/from a multichannel storage system, while maintaining high performance in the presence of worst-case traffic patterns. Unfortunately, the amount of on-chip storage required grows as the product of the number of channels and the number of separate queues served by the packet storage system. This makes it too expensive to use in systems with large numbers of queues. We show how to design a practical randomized packet storage system that can sustain high performance using an amount of on-chip storage that is independent of the number of queues.
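The randomized design can be caricatured in a few lines: each arriving packet is written to a channel chosen at random, and each queue simply records where its packets went, so retrieval walks the recorded (channel, address) list and per-queue FIFO order is preserved without per-queue, per-channel buffering. The following is a hypothetical Python model; the paper's actual address-allocation and load-spreading logic is more involved.

```python
import random

class RandomizedPacketStore:
    """Toy model of randomized multichannel packet storage: packets are
    spread over channels at random; each queue records where its packets
    went, so retrieval preserves per-queue FIFO order."""

    def __init__(self, n_channels):
        self.mem = [{} for _ in range(n_channels)]   # per-channel addr -> pkt
        self.next_addr = [0] * n_channels
        self.queues = {}                             # qid -> [(chan, addr)]

    def store(self, qid, pkt):
        c = random.randrange(len(self.mem))          # randomized channel pick
        a = self.next_addr[c]
        self.next_addr[c] += 1
        self.mem[c][a] = pkt
        self.queues.setdefault(qid, []).append((c, a))

    def retrieve(self, qid):
        c, a = self.queues[qid].pop(0)
        return self.mem[c].pop(a)                    # free the slot
```

Note that the per-queue bookkeeping here is just a list of pointers, which is exactly why the on-chip cost can stay independent of the number of queues.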
Citations: 19
A scalable, self-routed, terabit capacity, photonic interconnection network
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.6
A. Shacham, Benjamin G. Lee, K. Bergman
We present SPINet (Scalable Photonic Integrated Network), an optical switching architecture particularly designed for photonic integration. The performance of SPINet-based networks is investigated through simulations, and it is shown that SPINet can provide the bandwidth demanded by high performance computing systems while meeting ultra-low latency and scalability requirements. Experiments are conducted on a model SOA-based switching node to verify the feasibility of the SPINet concepts, and demonstrate error-free routing of a 160 Gb/s peak-bandwidth payload.
Citations: 19
Reconfigurable networking hardware: a classroom tool
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.32
M. Casado, G. Watson, N. McKeown
We present an educational platform for teaching the design, debugging and deployment of real networking equipment in the operational Internet. The emphasis of our work is on teaching and, therefore, on providing an environment that is flexible, robust, low cost and easy to use. The platform is built around 'NetFPGAs': custom boards containing eight Ethernet ports and two FPGAs. NetFPGA boards, when used with VNS (Virtual Network System, another tool we have developed), can be integrated into dynamically configurable network topologies reachable from the Internet. VNS enables a user-space process running on any remote computer to function as a system controller for the NetFPGA boards. NetFPGA and VNS are used at Stanford in a graduate level networking course to teach router implementation in hardware and software.
Citations: 13
A scalable switch for service guarantees
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.5
Bill Lin, I. Keslassy
Operators need routers to provide service guarantees such as guaranteed flow rates and fairness among flows, so as to support real-time traffic and traffic engineering. However, current centralized input-queued router architectures cannot scale to fast line rates while providing these service guarantees. On the other hand, while load-balanced switch architectures that rely on two identical stages of fixed configuration switches appear to be an effective way to scale Internet routers to very high capacities, there is currently no practical and scalable solution for providing service guarantees in these architectures. In this paper, we introduce the interleaved matching switch (IMS) architecture, which relies on a novel approach to provide service guarantees using load-balanced switches. The approach is based on emulating a Birkhoff-von Neumann switch with a load-balanced switch architecture and is applicable to any admissible traffic. In cases where fixed frame sizes are applicable, we also present an efficient frame-based decomposition method. More generally, we show that the IMS architecture can be used to emulate any input queued or combined input-output queued switch.
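The frame-based decomposition rests on the Birkhoff-von Neumann theorem: any matrix with equal row and column sums (a scaled doubly stochastic matrix) can be written as a weighted sum of permutation matrices, each of which is a valid crossbar configuration held for its weight in time slots. A greedy extraction is sketched below for small switches in Python; this illustrates the underlying theorem, not the paper's specific decomposition method.

```python
from itertools import permutations

def birkhoff_decompose(rates):
    """Greedy Birkhoff-von Neumann decomposition (small n only): split an
    integer matrix with equal row and column sums into weighted permutation
    matrices, i.e. switch configurations and the slots each one is held."""
    n = len(rates)
    m = [row[:] for row in rates]
    schedule = []
    while any(any(row) for row in m):
        # Find a permutation supported on the remaining positive entries;
        # one always exists while row/column sums stay equal and nonzero.
        for perm in permutations(range(n)):
            if all(m[i][perm[i]] > 0 for i in range(n)):
                w = min(m[i][perm[i]] for i in range(n))
                for i in range(n):
                    m[i][perm[i]] -= w
                schedule.append((w, perm))
                break
    return schedule
```

Each (weight, permutation) pair says which input-output matching the crossbar holds and for how many slots of the frame; the weights sum to the frame length.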
Citations: 10
Performance characterization of a 10-Gigabit Ethernet TOE
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.30
Wu-chun Feng, P. Balaji, C. Baron, L. Bhuyan, D. Panda
Though traditional Ethernet-based network architectures such as Gigabit Ethernet have suffered a large performance gap compared to other high performance networks (e.g., InfiniBand, Quadrics, Myrinet), Ethernet remains the most widely used network architecture today. This trend is mainly attributed to the low cost of the network components and their backward compatibility with the existing Ethernet infrastructure. With the advent of 10-Gigabit Ethernet and TCP offload engines (TOEs), whether this performance gap can be bridged is an open question. In this paper, we present a detailed performance evaluation of the Chelsio T110 10-Gigabit Ethernet adapter with TOE. We have done performance evaluations in three broad categories: (i) detailed micro-benchmark performance evaluation at the sockets layer, (ii) performance evaluation of the message passing interface (MPI) stack atop the sockets interface, and (iii) application-level evaluations using the Apache Web server. Our experimental results demonstrate latency as low as 8.9 μs and throughput of nearly 7.6 Gbps for these adapters.
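Sockets-layer latency numbers of this kind typically come from ping-pong micro-benchmarks. The snippet below is a generic, hypothetical sketch of how such a half-round-trip measurement is commonly structured; it is not the authors' harness, and a local socket pair will of course not reflect 10-Gigabit hardware latencies.

```python
import socket
import threading
import time

def _recv_exact(sock, n):
    """Read exactly n bytes (stream sockets may return partial reads)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def pingpong_half_rtt(msg_size=64, iters=1000):
    """Estimate one-way latency as half the average round-trip time."""
    a, b = socket.socketpair()
    def echo():
        for _ in range(iters):
            b.sendall(_recv_exact(b, msg_size))
    t = threading.Thread(target=echo)
    t.start()
    payload = bytes(msg_size)
    start = time.perf_counter()
    for _ in range(iters):
        a.sendall(payload)
        _recv_exact(a, msg_size)
    elapsed = time.perf_counter() - start
    t.join()
    a.close()
    b.close()
    return elapsed / iters / 2
```

The exact-read helper matters: timing `recv` without it silently measures partial reads, a classic source of wrong micro-benchmark numbers.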
Citations: 94
Hybrid cache architecture for high speed packet processing
Pub Date : 2005-08-17 DOI: 10.1049/iet-cdt:20060085
Z. Liu, K. Zheng, B. Liu
The exposed memory hierarchies employed in many network processors (NPs) are expensive and hard to utilize effectively. Conventional caches, on the other hand, cannot be directly incorporated into NPs either, because of their low efficiency at exploiting locality in network applications. In this paper, a novel memory hierarchy component, called the split control cache, is presented. The proposed scheme employs two independent low-latency memory stores to temporarily hold flow-based and application-relevant information, exploiting the different locality behaviors exhibited by these two types of data. Data movement is handled by specially designed hardware, relieving programmers of the details of memory management. Performance evaluation shows that this component can achieve a hit rate of over 90% with only 16 KB of memory in route lookup at an OC-3c link rate, and provides enough flexibility for the implementation of most network applications.
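The split organization can be modeled as two independent small stores, one for flow-based state and one for application-relevant state, so that the distinct locality behaviors of the two data types never compete for the same cache lines. The following is a hypothetical Python sketch; LRU stands in for whatever replacement policy the hardware actually uses, and the class names are invented.

```python
from collections import OrderedDict

class LruStore:
    """Small LRU store standing in for one low-latency on-chip memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

class SplitControlCache:
    """Two independent stores, reflecting the different locality of
    flow-based state (e.g. per-connection context) and application-relevant
    state (e.g. route-lookup results)."""

    def __init__(self, flow_slots, app_slots):
        self.flow = LruStore(flow_slots)
        self.app = LruStore(app_slots)
```

Keeping the two stores independent is the point: a burst of new flows can thrash the flow store without evicting hot route-lookup entries, and vice versa.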
Citations: 7
Can memory-less network adapters benefit next-generation infiniband systems?
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.10
S. Sur, Abhinav Vishnu, Hyun-Wook Jin, Wei Huang, D. Panda
InfiniBand is emerging as a high-performance interconnect. It is gaining popularity because of its high performance and open standard. Recently, PCI-Express, the third-generation high-performance I/O bus used to interconnect peripheral devices, has been released. The third generation of InfiniBand adapters allows applications to take advantage of PCI-Express. PCI-Express offers very low latency access to host memory by network interface cards (NICs). Earlier-generation InfiniBand adapters had an external DIMM attached as local NIC memory, used to store internal information; this memory increases the overall cost of the NIC. In this paper we design experiments and analyze the performance of various communication patterns and end applications on PCI-Express based systems whose adapters can be chosen to run with or without local NIC memory. Our investigations reveal that on these systems, the memory fetch latency is the same for both local NIC memory and host memory. Under heavy I/O bus usage, the latency of a scatter operation increased by only 10%, and only for message sizes of 1 B to 4 KB. These memory-less adapters allow more efficient use of overall system memory and show practically no performance impact (less than 0.1%) for the NAS parallel benchmarks on 8 processes. These results indicate that memory-less network adapters can benefit next generation InfiniBand systems.
InfiniBand正在成为一种高性能互连技术。由于它的高性能和开放标准,它越来越受欢迎。最近,用于连接外设的第三代高性能I/O总线PCI-Express发布了。第三代InfiniBand适配器允许应用程序利用PCI-Express。PCI-Express通过网络接口卡(nic)提供非常低延迟的主机内存访问。早期InfiniBand适配器使用外接DIMM作为本地网卡内存。这个存储器是用来存储内部信息的。这些内存增加了网卡的总体成本。在本文中,我们设计了实验,分析了基于PCI-Express系统的各种通信模式和终端应用程序的性能,这些系统的适配器可以选择在有或没有本地NIC内存的情况下运行。我们的调查表明,在这些系统上,内存获取延迟对于本地NIC内存和主机内存是相同的。在大量使用I/O总线的情况下,分散操作的延迟仅增加了10%,并且仅适用于消息大小为IB -4 KB的情况。这些无内存适配器允许更有效地使用整个系统内存,并且对于8个进程的NAS并行基准测试几乎没有性能影响(小于0.1%)。这些结果表明,无内存网络适配器可以使下一代InfiniBand系统受益。
Citations: 18
Journal
13th Symposium on High Performance Interconnects (HOTI'05)