
13th Symposium on High Performance Interconnects (HOTI'05): Latest Publications

Long round-trip time support with shared-memory crosspoint buffered packet switch
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.26
Z. Dong, R. Rojas-Cessa
The amount of memory in the buffered crossbar of a combined input-crosspoint buffered switch is proportional to the number of crosspoints, O(N^2), where N is the number of ports, and to the crosspoint buffer size, which is determined by the distance between the line cards and the buffered crossbar, in order to achieve 100% throughput under port-rate data flows. A long distance between these two components can make a buffered crossbar costly to implement. In this paper, we propose and examine two shared-memory crosspoint buffered packet switches that use small crosspoint buffers to support a long round-trip time, which is mainly determined by the transmission delay caused by the distance between the line cards and the buffered crossbar. The proposed switch reduces the required buffer memory of the buffered crossbar by 50% or more. We show that a shared-memory crosspoint buffered switch can provide this improvement without speedup.
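The sizing argument above can be put in back-of-envelope form: each of the N^2 dedicated crosspoint buffers must cover one round-trip time of data at the port rate. The sketch below is illustrative only; the link speed and RTT are example values, and the 50% figure comes from the abstract, not from the paper's actual memory model.

```python
# Back-of-envelope sizing of buffered-crossbar memory (illustrative numbers;
# the 50% reduction is the abstract's claim, not a derived result).

def crossbar_memory_bytes(n_ports: int, rtt_s: float, port_rate_bps: float) -> float:
    """Dedicated crosspoint buffers: each of the N^2 crosspoints must
    hold one round-trip time worth of data at the port rate."""
    per_crosspoint = rtt_s * port_rate_bps / 8  # bytes needed to cover the RTT
    return n_ports ** 2 * per_crosspoint

# Example: 32 ports, 10 Gb/s links, 2 us round-trip time.
dedicated = crossbar_memory_bytes(32, 2e-6, 10e9)
shared = dedicated * 0.5  # shared-memory design: >= 50% less, per the abstract
print(f"dedicated: {dedicated/1e6:.2f} MB, shared: {shared/1e6:.2f} MB")
```

Doubling either the port count (quadratically) or the cable length (linearly, via RTT) inflates the dedicated-buffer total, which is why the distance between line cards and crossbar dominates the cost.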
Citations: 12
Zero copy sockets direct protocol over infiniband-preliminary implementation and performance analysis
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.35
Dror Goldenberg, Michael Kagan, Ran Ravid, Michael S. Tsirkin
Sockets direct protocol (SDP) is a byte-stream transport protocol implementing the TCP SOCK_STREAM semantics utilizing the transport offloading capabilities of the InfiniBand fabric. Under the hood, SDP supports a zero-copy (ZCopy) operation mode, using the InfiniBand RDMA capability to transfer data directly between application buffers. Alternatively, in buffer-copy (BCopy) mode, data is copied to and from transport buffers. In the initial open-source SDP implementation, ZCopy mode was restricted to asynchronous I/O operations. We added prototype ZCopy support for send()/recv() synchronous socket calls. This paper presents the major architectural aspects of the SDP protocol, the ZCopy implementation, and a preliminary performance evaluation. We show substantial benefits of ZCopy when multiple connections run in parallel on the same host. For example, when 8 connections are simultaneously active, enabling ZCopy raises bandwidth from 500 MB/s to 700 MB/s, while CPU utilization decreases by a factor of 8.
Citations: 34
Congestion control in InfiniBand networks
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.14
M. Gusat, D. Craddock, W. Denzel, Antonius P. J. Engbersen, N. Ni, G. Pfister, W. Rooney, J. Duato
Driving computer interconnection networks closer to saturation minimizes cost/performance and power consumption, but requires efficient congestion control to prevent catastrophic performance degradation during traffic peaks or "hot spot" traffic patterns. The InfiniBand Architecture provides such congestion control, but lacks guidance for setting its parameters. At its adoption, it was unproven that any settings existed that would work at all and avoid instability or oscillations. This paper reports on a simulation-driven exploration of that parameter space which verifies that the architected scheme can, in fact, work properly despite the inherent delays in its feedback mechanism.
Citations: 32
Initial performance evaluation of the Cray SeaStar interconnect
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.24
R. Brightwell, K. Pedretti, K. Underwood
The Cray SeaStar is a new network interface and router for the Cray Red Storm and XT3 supercomputers. The SeaStar was designed specifically to meet the performance and reliability needs of a large-scale, distributed-memory scientific computing platform. In this paper, we present an initial performance evaluation of the SeaStar. We first provide a detailed overview of the hardware and software features of the SeaStar, followed by the results of several low-level micro-benchmarks. These initial results indicate that SeaStar is on a path to achieving its performance targets.
Citations: 50
Control path implementation for a low-latency optical HPC switch
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.15
C. Minkenberg, F. Abel, Peter Müller, R. Krishnamurthy, M. Gusat, B. Hemenway
A crucial part of any high-performance computing system is its interconnection network. In the OSMOSIS project, Corning and IBM are jointly developing a demonstrator interconnect based on optical cell switching with electronic control. Starting from the core set of requirements, we present the system design rationale and show how it impacts the practical implementation. Our focus is on solving the technical issues related to the electronic control path, and we show that it is feasible at the targeted design point.
Citations: 34
SIFT: snort intrusion filter for TCP
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.33
Michael Attig, J. Lockwood
Intrusion rule processing in reconfigurable hardware enables intrusion detection and prevention services to run at multi-gigabit/second rates. High-level intrusion rules mapped directly into hardware separate malicious content from benign content in network traffic. Hardware parallelism allows intrusion systems to scale to support fast network links, such as OC-192 and 10 Gbps Ethernet. In this paper, a Snort intrusion filter for TCP (SIFT) is presented that operates as a preprocessor to prevent benign traffic from being inspected by an intrusion monitor running Snort. Snort is a popular open-source rule-processing intrusion system. SIFT selectively forwards IP packets that contain questionable headers or defined signatures to a PC where complete rule processing is performed. SIFT thus spares most network traffic from software inspection. Statistics, such as how many packets match rules, are used to optimize rule-processing systems. SIFT has been implemented and tested in FPGA hardware and used to process live traffic from a campus Internet backbone.
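The prefilter idea reads naturally in code form. The sketch below uses invented header rules and content signatures; the real SIFT runs in FPGA hardware against Snort rule sets, so this is only the decision logic in miniature.

```python
# Sketch of SIFT's prefilter decision: only packets matching a header rule
# or a content signature are forwarded for full Snort processing.
# (Port set and signatures are made-up examples, not actual Snort rules.)

SUSPICIOUS_PORTS = {135, 445, 1433}           # hypothetical "questionable header" rule
SIGNATURES = [b"/bin/sh", b"cmd.exe"]         # hypothetical content signatures

def needs_full_inspection(dst_port: int, payload: bytes) -> bool:
    """Forward to the Snort host only on a header or signature match;
    everything else bypasses software inspection entirely."""
    if dst_port in SUSPICIOUS_PORTS:
        return True
    return any(sig in payload for sig in SIGNATURES)

print(needs_full_inspection(80, b"GET /index.html"))    # benign -> False
print(needs_full_inspection(80, b"GET /a?x=/bin/sh"))   # signature hit -> True
```

Because most traffic matches neither condition, the software monitor sees only a small residue of the line rate, which is what lets a PC keep up behind a multi-gigabit link.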
Citations: 51
Addressing queuing bottlenecks at high speeds
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.7
S. Sushanth Kumar, J. Turner, P. Crowley
Modern routers and switch fabrics can have hundreds of input and output ports running at up to 10 Gb/s; 40 Gb/s systems are starting to appear. At these rates, the performance of the buffering and queuing subsystem becomes a significant bottleneck. In high performance routers with more than a few queues, packet buffering is typically implemented using DRAM for data storage and a combination of off-chip and on-chip SRAM for storing the linked-list nodes and packet lengths, and the queue headers, respectively. This paper focuses on the performance bottlenecks associated with the use of off-chip SRAM. We show how the combination of implicit buffer pointers and multi-buffer list nodes can dramatically reduce the impact of the buffering and queuing subsystem on queuing performance. We also show how combining these techniques with coarse-grained scheduling can improve the performance of fair queuing algorithms, while also reducing the amount of off-chip memory and bandwidth needed. These techniques can reduce the amount of SRAM needed to hold the list nodes by a factor of 10, at the cost of about 10% wastage of the DRAM space, assuming an aggregation degree of 16.
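The arithmetic behind multi-buffer list nodes can be sketched as follows. This is my reading of the abstract, not the paper's exact model: one list node now spans a run of consecutive DRAM buffers, so the raw node count shrinks by the aggregation degree; larger aggregate nodes and partially filled runs then eat into that, consistent with the net factor of ~10 the abstract reports for an aggregation degree of 16.

```python
# Rough node-count arithmetic for multi-buffer list nodes (illustrative model
# only; the net 10x SRAM figure in the abstract also accounts for node size
# growth, which this sketch does not).

def sram_nodes(n_buffers: int, agg: int) -> int:
    """Linked-list nodes needed when each node spans `agg` consecutive buffers."""
    return -(-n_buffers // agg)  # ceiling division

baseline = sram_nodes(1_000_000, 1)    # classic design: one node per buffer
aggregated = sram_nodes(1_000_000, 16) # multi-buffer nodes, aggregation degree 16
print(baseline // aggregated)          # raw 16x reduction in node count
```

Implicit buffer pointers push in the same direction: if a node's buffers occupy consecutive DRAM addresses, their addresses need not be stored at all, only the base.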
Citations: 7
Quality of service in global grid computing
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.31
L. Valcarenghi
This tutorial addresses some of the issues related to the keywords in Foster's grid computing definition. Specifically, it tackles the problem of providing global grid computing applications with a network infrastructure able to guarantee quality of service. After reviewing the basics of grid computing, the tutorial focuses on specific network infrastructure issues. Quality of service (QoS) parameters such as throughput, delay, and resilience are considered. It is shown how the integration of the grid programming environment with an intelligent grid network infrastructure allows the utilized computational and network resources to be dynamically adapted to meet application QoS requirements transparently to the user. Finally, the performance evaluation of a specific implementation of an integrated application- and network-layer resilience scheme is presented.
Citations: 2
Centralized and distributed topology discovery service implementations
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.11
L. Valcarenghi, F. Paolucci, L. Foschini, F. Cugini, P. Castoldi
In global grid computing, i.e., wide area network (WAN) grid computing, grid network services allow grid users or the programming environment to monitor the status of network resources and to reallocate them. Specifically, the network information and monitoring service (NIMS) provides up-to-date information on the grid network status. In this study, two implementations of a specific NIMS component, the topology discovery service (TDS), are presented. The first implementation features a centralized broker that produces information for the consumers/users. In the second, users are simultaneously producers and consumers of the required information. Both implementations are applicable to networks based on commercial routers without requiring any router protocol modification.
Citations: 4
Design and implementation of a content-aware switch using a network processor
Pub Date : 2005-08-17 DOI: 10.1109/CONECT.2005.16
Li Zhao, Yan Luo, L. Bhuyan, R. Iyer
Cluster-based server architectures have been widely used as a solution to overloading in Web servers because of their cost effectiveness, scalability, and reliability. A content-aware switch can examine Web requests and distribute them to the servers based on application-level information. In this paper, we present the analysis, design, and implementation of such a content-aware switch based on an IXP2400 network processor (NP). We first analyze the mechanisms for implementing a content-aware switch and motivate the need for an NP-based solution. We then present various possibilities for workload allocation among the different computation resources in an NP and discuss the design tradeoffs. Measurement results based on an IXP2400 NP demonstrate that our NP-based switch can reduce HTTP processing latency by an average of 83.3% for a 1 KB Web page, compared to a Linux-based switch. The reduction grows with larger file sizes. It is also shown that packet throughput can be improved by up to 5.7x across a range of file sizes by taking advantage of the multithreading and multiprocessing available in the NP.
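The core of content-aware (layer-7) dispatch is routing on the request itself rather than on IP headers. The sketch below shows that decision in miniature; the backend addresses and routing rules are invented for illustration, and the real switch makes this decision in the NP data path, not in Python.

```python
# Minimal sketch of layer-7 dispatch: choose a backend pool from the
# HTTP request line. (Backend IPs and rules are hypothetical examples.)

BACKENDS = {"static": "10.0.0.1", "dynamic": "10.0.0.2"}

def choose_backend(request: bytes) -> str:
    """Route by requested URL: static content and dynamically generated
    content go to different server pools."""
    path = request.split(b" ")[1].lower()
    if path.endswith((b".html", b".jpg", b".css")):
        return BACKENDS["static"]
    return BACKENDS["dynamic"]

print(choose_backend(b"GET /index.html HTTP/1.1"))     # -> 10.0.0.1
print(choose_backend(b"GET /cgi-bin/query HTTP/1.1"))  # -> 10.0.0.2
```

Because the switch must terminate the TCP connection to see the request before picking a server, where this parsing runs (general-purpose core vs. NP microengines) is exactly the workload-allocation tradeoff the paper studies.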
Citations: 13