
Latest Publications in Network-aware Data Management

Design and implementation of control sequence generator for SDN-enhanced MPI
Pub Date: 2015-11-15 DOI: 10.1145/2832099.2832103
Baatarsuren Munkhdorj, Keichi Takahashi, Khureltulga Dashdavaa, Yasuhiro Watashiba, Y. Kido, S. Date, S. Shimojo
MPI (Message Passing Interface) offers a suite of APIs for inter-process communication among parallel processes. We have worked on accelerating MPI collective communication, such as MPI_Bcast and MPI_Allreduce, by taking advantage of the network programmability brought by Software Defined Networking (SDN). The basic idea is to allow an SDN controller to dynamically control the packet flows generated by MPI collective communication, based on the communication pattern and the underlying network conditions. Although our research has succeeded in accelerating MPI collective communication in terms of execution time, switching the network control functionality for MPI collective communication as the MPI program executes has not yet been considered. This paper presents a mechanism that provides a control sequence for the SDN controller to control packet flows based on the communication plan of the entire MPI application. The control sequence contains a chronologically ordered list of the MPI collectives invoked in the MPI application, along with the process-related information for each entry in the list. To verify that the SDN-enhanced MPI collectives can be used in combination with the proposed mechanism, the envisioned environment was prototyped. As a result, the SDN-enhanced MPI collectives could indeed be used in combination.
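
As a rough illustration of what such a control sequence might look like, the sketch below models it as a chronologically ordered list of collectives with per-entry process placement, serialized as JSON for delivery to the controller. The field names and schema are our own illustration; the paper does not publish its exact format.

```python
# A minimal sketch of the control-sequence idea (hypothetical schema).
# Each entry records one MPI collective in program order plus the
# rank-to-host placement the SDN controller needs to pre-compute flow rules.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class CollectiveEntry:
    order: int                    # chronological position in the MPI program
    collective: str               # e.g. "MPI_Bcast", "MPI_Allreduce"
    communicator: str             # communicator the collective runs on
    ranks: list[int]              # participating ranks
    rank_to_host: dict[int, str]  # where each rank is placed

@dataclass
class ControlSequence:
    application: str
    entries: list[CollectiveEntry] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize for delivery to the SDN controller."""
        return json.dumps(asdict(self), indent=2)

placement = {0: "node-a", 1: "node-a", 2: "node-b", 3: "node-b"}
seq = ControlSequence("cg_solver")
seq.entries.append(CollectiveEntry(0, "MPI_Bcast", "MPI_COMM_WORLD",
                                   [0, 1, 2, 3], placement))
seq.entries.append(CollectiveEntry(1, "MPI_Allreduce", "MPI_COMM_WORLD",
                                   [0, 1, 2, 3], placement))
print(seq.to_json())
```
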
Citations: 1
Hysteresis-based optimization of data transfer throughput
Pub Date: 2015-11-15 DOI: 10.1145/2832099.2832104
M. S. Q. Z. Nine, Kemal Guner, T. Kosar
The achievable throughput of a data transfer is determined by a variety of factors, such as network bandwidth, round-trip time, background traffic, dataset size, and end-system configuration. For best-effort optimization of transfer throughput, three application-layer transfer parameters -- pipelining, parallelism, and concurrency -- have been actively used in the literature. However, it is highly challenging to identify the best combination of these parameter settings for a specific data transfer request. In this paper, we analyze historical data consisting of 70 million file transfers; apply data-mining techniques to extract the hidden relations between the parameters and the optimal throughput; and propose a novel approach based on hysteresis to predict the optimal parameter settings.
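
The hysteresis idea can be sketched as follows: a table mined from transfer history proposes (pipelining, parallelism, concurrency) settings, but the current settings are kept unless the predicted improvement clears a threshold, avoiding oscillation between near-equivalent configurations. The table contents and the 15% threshold below are invented for illustration; the paper's actual mined model is far richer.

```python
# Illustrative "mined" table: coarse network/dataset buckets mapped to
# (pipelining, parallelism, concurrency). Values are made up.
HISTORY = {
    (10, "small"):  (8, 2, 4),
    (10, "large"):  (2, 8, 8),
    (100, "small"): (16, 4, 8),
    (100, "large"): (4, 16, 16),
}

def size_bucket(dataset_gb: float) -> str:
    return "large" if dataset_gb >= 10 else "small"

def choose_params(bw_gbps, dataset_gb, current, predicted_gain, threshold=0.15):
    """Return (pipelining, parallelism, concurrency) settings.

    Hysteresis: keep the current settings unless the mined optimum is
    predicted to improve throughput by more than `threshold` (15% here),
    which avoids flapping between near-equivalent configurations.
    """
    candidate = HISTORY[(bw_gbps, size_bucket(dataset_gb))]
    if current is None or predicted_gain > threshold:
        return candidate
    return current

params = choose_params(100, 250.0, current=(4, 8, 8), predicted_gain=0.30)
print(params)  # -> (4, 16, 16): predicted gain exceeds threshold, so switch
```
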
Citations: 14
Managing scientific data with named data networking
Pub Date: 2015-11-15 DOI: 10.1145/2832099.2832100
Chengyu Fan, Susmit Shannigrahi, S. DiBenedetto, C. Olschanowsky, C. Papadopoulos, H. Newman
Many scientific domains, such as climate science and High Energy Physics (HEP), have data management requirements that are not well supported by the IP network architecture. Named Data Networking (NDN) is a new network architecture whose service model is better aligned with the needs of data-oriented applications. NDN provides features such as best-location retrieval, caching, load sharing, and transparent failover that would otherwise have to be painstakingly (re-)implemented by each application using point-to-point semantics in an IP network. We present the first scientific data management application designed and implemented on top of NDN. We use this application to manage climate and HEP data over a dedicated, high-performance testbed. Our application has two main components: a UI for dataset discovery queries and a federation of synchronized name catalogs. We show how NDN primitives can be used to implement common data management operations such as publishing, search, efficient retrieval, and publication access control.
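
A toy sketch of the naming side: datasets get hierarchical NDN-style names, and a catalog answers discovery queries by prefix match, which is the natural query shape in a hierarchical namespace. The name components are illustrative, not the paper's actual climate or HEP naming schemes.

```python
# A minimal sketch of NDN-style hierarchical names for scientific data
# plus a toy name catalog with prefix-based discovery.
class NameCatalog:
    def __init__(self):
        self.names: set[str] = set()

    def publish(self, name: str) -> None:
        """Register a dataset name, e.g. after data is written to a repo."""
        self.names.add(name)

    def discover(self, prefix: str) -> list[str]:
        """Prefix match mirrors discovery queries over a hierarchical namespace."""
        return sorted(n for n in self.names if n.startswith(prefix))

catalog = NameCatalog()
catalog.publish("/climate/CMIP5/GFDL-ESM2M/rcp85/tas/2006.nc")
catalog.publish("/climate/CMIP5/GFDL-ESM2M/rcp85/pr/2006.nc")
catalog.publish("/hep/cms/run2015/AOD/block-0001")

print(catalog.discover("/climate/CMIP5/GFDL-ESM2M/rcp85/"))
```
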
Citations: 39
A multi-domain SDN for dynamic layer-2 path service
Pub Date: 2015-11-15 DOI: 10.1145/2832099.2832101
S. Tepsuporn, Fatma Alali, M. Veeraraghavan, Xiang Ji, Brian Cashman, Andrew J. Ragusa, Luke Fowler, C. Guok, T. Lehman, Xi Yang
This paper describes our experience deploying a multi-domain Software-Defined Network (SDN) that supports dynamic Layer-2 (L2) path service, and offers insights gained from this experience. SDN controllers, capable of handling requests for advance reservation and provisioning of rate-guaranteed L2 paths, were deployed in each domain. The experience demonstrated that this architecture can support global-scale, multi-domain dynamic L2 path service. However, to reach this scale, better tools are required for diagnosing end-to-end L2 connectivity, and better error-reporting functionality is needed from the SDN controllers. As a use case for rate-guaranteed L2 path service, we experimented with high-speed large-dataset transfers. We found that a combination of Circuit TCP (CTCP), in which the sending rate is held fixed, and a token-bucket-filter-based rate shaper at the sending host is best for achieving nearly zero-loss, high-throughput transfers across L2 paths. Detailed studies were conducted to understand the impact of the rate-shaper and CTCP parameters and to find the best settings.
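
The host-side rate shaper can be sketched as a classic token bucket that holds the sender to a fixed rate, in the spirit of CTCP's fixed sending rate. This is an illustrative user-space version, not the paper's implementation; the rate and burst values are placeholders.

```python
# A minimal token-bucket sketch: each send must first acquire credit,
# so the long-run sending rate is capped at `rate_bytes_per_s`.
import time

class TokenBucket:
    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def consume(self, nbytes: int) -> None:
        """Block until `nbytes` of credit is available, then spend it."""
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

# Shape a sender to ~1 Gbps with a 1 MB burst allowance.
bucket = TokenBucket(rate_bytes_per_s=125e6, burst_bytes=1e6)
for _ in range(3):
    bucket.consume(64 * 1024)  # each 64 KB send waits for credit
    # socket.sendall(chunk) would go here
```
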
Citations: 7
Approximate causal consistency for partially replicated geo-replicated cloud storage
Pub Date: 2015-11-15 DOI: 10.1145/2832099.2832102
A. Kshemkalyani, T. Hsu
In geo-replicated systems and the cloud, data replication provides fault tolerance and low latency. Causal consistency in such systems is an interesting consistency model. Most existing works assume the data is fully replicated, because this greatly simplifies the design of algorithms to implement causal consistency. Recently, we proposed causal consistency under partial replication, because it reduces the number of messages used under a wide range of workloads. One drawback of partial replication is that its meta-data tends to be relatively large when the message size is small. In this paper, we propose approximate causal consistency, whereby we can reduce the meta-data at the cost of some violations of causal consistency. The number of violations can be made arbitrarily small by controlling a tunable parameter that we call credits.
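
One way to picture the credits idea is as a hard cap on how much dependency meta-data each update carries: once the cap is hit, the oldest dependencies are pruned, shrinking meta-data at the risk of occasional causal violations. The pruning policy and field names below are our own simplification, not the paper's algorithm.

```python
# A minimal sketch of bounding causal-dependency meta-data by "credits".
from collections import deque

class Replica:
    def __init__(self, credits: int):
        self.credits = credits      # max dependencies carried per update
        self.deps: deque = deque()  # (site, seq) pairs, oldest first

    def record_dependency(self, site: str, seq: int) -> None:
        self.deps.append((site, seq))
        while len(self.deps) > self.credits:
            # Pruning the oldest dependency shrinks meta-data, but a remote
            # replica may now apply this update before that dependency,
            # producing a (bounded) causal violation.
            self.deps.popleft()

    def metadata_for_update(self) -> list:
        return list(self.deps)

r = Replica(credits=2)
for site, seq in [("A", 1), ("B", 4), ("C", 2)]:
    r.record_dependency(site, seq)
print(r.metadata_for_update())  # [('B', 4), ('C', 2)] -- ('A', 1) pruned
```
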
Citations: 5
Network-aware virtual machine consolidation for large data centers
Pub Date: 2013-11-17 DOI: 10.1145/2534695.2534702
Dharmesh Kakadia, N. Kopri, Vasudeva Varma
Resource management in modern data centers has become a challenging task due to their tremendous growth. In large virtual data centers, application performance is highly dependent on the communication bandwidth available among virtual machines. Traditional algorithms either do not consider the network I/O behavior of the applications or are computationally intensive. We address the problem of identifying virtual machine clusters based on network traffic and placing them intelligently, in order to improve application performance and optimize network usage in large data centers. We propose a greedy consolidation algorithm that keeps the number of migrations small and makes placement decisions fast, which makes it practical for large data centers. We evaluated our approach on real-world traces from private and academic data centers using simulation, and compared it against existing algorithms on parameters such as scheduling time, performance improvement, and number of migrations. We observed a ~70% savings in interconnect bandwidth and an overall ~60% improvement in application performance. Moreover, these improvements were produced with a fraction of the scheduling time and number of migrations.
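
A sketch of the greedy core: sort VM pairs by traffic volume and try to co-locate the heaviest communicators, preferring a host that already holds one endpoint. This captures only the co-location heuristic, not the paper's full algorithm, and it assumes total host capacity suffices for all VMs.

```python
# A minimal greedy, traffic-aware consolidation sketch.
def consolidate(traffic, capacity, hosts):
    """traffic: {(vm_a, vm_b): volume}; capacity: VM slots per host."""
    placement, load = {}, {h: 0 for h in hosts}

    def place(vm, preferred):
        if vm in placement:
            return
        # Fall back to the least-loaded host if the preferred one is full.
        host = preferred if load[preferred] < capacity else min(hosts, key=load.get)
        placement[vm] = host
        load[host] += 1

    # Heaviest-communicating pairs first.
    for (a, b), _ in sorted(traffic.items(), key=lambda kv: -kv[1]):
        preferred = placement.get(a) or placement.get(b) or min(hosts, key=load.get)
        place(a, preferred)
        place(b, preferred)
    return placement

traffic = {("vm1", "vm2"): 900, ("vm2", "vm3"): 500, ("vm3", "vm4"): 100}
print(consolidate(traffic, capacity=2, hosts=["h1", "h2"]))
# vm1/vm2 share h1 (heaviest pair); vm3/vm4 end up co-located on h2.
```
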
Citations: 34
End-to-end data movement using MPI-IO over routed terabits infrastructures
Pub Date: 2013-11-17 DOI: 10.1145/2534695.2534705
Geoffroy R. Vallée, S. Atchley, Youngjae Kim, G. Shipman
Scientific discovery is nowadays driven by large-scale simulations running on massively parallel high-performance computing (HPC) systems. These applications each generate a large amount of data, which then needs to be post-processed, for example for data mining or visualization. Unfortunately, the computing platform used for post-processing may differ from the one on which the data was originally generated, introducing the challenge of moving large amounts of data between computing platforms. This is especially challenging when the two platforms are geographically separated, since the data needs to be moved between computing facilities. It is even more critical when scientists tightly couple their domain-specific applications with a post-processing application.
This paper presents a solution for data transfer between MPI applications using a dedicated wide-area network (WAN) terabit infrastructure. The proposed solution is based on parallel access to data files and on the Message Passing Interface (MPI) over the Common Communication Infrastructure (CCI) for data transfer over a routed infrastructure. In the context of this research, the Energy Sciences Network (ESnet) of the U.S. Department of Energy (DOE) is targeted for the transfer of data between DOE national laboratories.
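
The parallel file access underlying such a transfer can be sketched with mpi4py: each rank performs a collective MPI-IO read of its own byte range from a shared file. The file name and even-split decomposition are illustrative (the file must exist, and the remainder bytes are ignored), and the CCI-based WAN leg is only indicated by a comment.

```python
# A minimal sketch of the parallel-access side of the transfer.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

fh = MPI.File.Open(comm, "dataset.bin", MPI.MODE_RDONLY)
total = fh.Get_size()
chunk = total // size                # bytes per rank (remainder ignored)
buf = np.empty(chunk, dtype=np.uint8)

# Collective read: rank i takes byte range [i*chunk, (i+1)*chunk).
fh.Read_at_all(rank * chunk, buf)
fh.Close()

# buf would now be handed to the communication layer (CCI in the paper)
# for the wide-area leg of the transfer.
print(f"rank {rank} read {buf.nbytes} bytes")
```
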
Citations: 3
In-network, push-based network resource monitoring: scalable, responsive network management
Pub Date: 2013-11-17 DOI: 10.1145/2534695.2534704
Taylor L. Groves, D. Arnold, Yihua He
We present preliminary work from our experience with distributed, push-based monitoring of networks at Yahoo!. Network switches have grown beyond mere ASICs into machines that support unmodified Linux kernels and familiar user interfaces. These advances have enabled a paradigm shift in network monitoring. In lieu of traditional approaches where network diagnostics are delivered via SNMP, we utilize Sysdb of Arista's EOS to implement a push-based approach to network monitoring. This leaves the individual switches in charge of determining what monitoring data to send and when to send it. With this approach, on-switch collection, dissemination, and analysis of interfaces and protocols become possible. This push-based approach shortens the feedback loop of network diagnostics and gives network-aware applications, middleware, and resource managers access to the freshest available data.
Our work utilizes the OpenTSDB monitoring framework to provide a scalable back-end for accessing and storing real-time statistics delivered by on-switch collection agents. OpenTSDB is built on top of Hadoop/HBase, which handles the underlying access and storage for the monitoring system. We wrote two collection agents as prototypes to explore the framework and demonstrate the benefits of push-based network monitoring.
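
A minimal sketch of a push-based collection agent: the switch-side process samples counters on its own schedule and posts them to OpenTSDB's standard /api/put HTTP endpoint. The metric names, tags, host, and the counter-reading stub are invented for illustration; a real agent on EOS would subscribe to Sysdb state rather than poll a stub.

```python
# A minimal push-based collection agent sketch targeting OpenTSDB.
import json
import time
import urllib.request

OPENTSDB_URL = "http://tsdb.example.com:4242/api/put"  # placeholder host

def read_interface_counters():
    """Stand-in for reading Sysdb interface state on the switch."""
    return {"Ethernet1": 123456789, "Ethernet2": 987654321}

def push(metric, value, tags):
    datapoint = {"metric": metric, "timestamp": int(time.time()),
                 "value": value, "tags": tags}
    req = urllib.request.Request(
        OPENTSDB_URL, data=json.dumps(datapoint).encode(),
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

while True:
    for iface, octets in read_interface_counters().items():
        # The switch decides what to send and when -- the "push" model.
        push("switch.iface.in_octets", octets,
             {"iface": iface, "switch": "sw1"})
    time.sleep(10)
```
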
Citations: 4
Efficient wide area data transfer protocols for 100 Gbps networks and beyond
Pub Date: 2013-11-17 DOI: 10.1145/2534695.2534699
E. Kissel, D. M. Swany, B. Tierney, Eric Pouyoul
Due to a number of recent technology developments, now is the right time to re-examine the use of TCP for very large data transfers. These developments include the deployment of 100 Gigabit per second (Gbps) network backbones, hosts that can easily manage data transfers at 40 Gbps and higher, the Science DMZ model, the availability of virtual circuit technology, and wide-area Remote Direct Memory Access (RDMA) protocols. In this paper we show that RDMA works well over wide-area virtual circuits and uses much less CPU than TCP or UDP. We also characterize the limitations of RDMA in the presence of other traffic, including competing RDMA flows. We conclude that RDMA for Science-DMZ-to-Science-DMZ transfers of massive data is a viable and desirable option for high-performance data transfer.
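
The comparison hinges on CPU cost per unit of throughput. A rough way to account for that on the TCP side is sketched below with psutil; an RDMA sender would be driven by verbs-based tools instead, with the same bookkeeping applied. The loopback target and sizes are placeholders.

```python
# A minimal sketch: measure throughput and CPU-seconds for a bulk TCP send.
import socket
import time

import psutil

def send_and_measure(host: str, port: int, total_bytes: int, chunk: int = 1 << 20):
    """Drive a bulk TCP send and report throughput and CPU cost."""
    proc = psutil.Process()
    payload = b"\x00" * chunk
    cpu0 = proc.cpu_times()
    t0 = time.monotonic()
    with socket.create_connection((host, port)) as sock:
        sent = 0
        while sent < total_bytes:
            sock.sendall(payload)
            sent += chunk
    elapsed = time.monotonic() - t0
    cpu1 = proc.cpu_times()
    cpu_seconds = (cpu1.user - cpu0.user) + (cpu1.system - cpu0.system)
    print(f"{sent * 8 / elapsed / 1e9:.2f} Gbps, "
          f"{cpu_seconds / elapsed:.2%} of one core consumed")

# Example (requires a listening sink, e.g. `nc -lk 5001 > /dev/null`):
# send_and_measure("127.0.0.1", 5001, 10 * (1 << 30))
```
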
Citations: 19
Characterizing the impact of end-system affinities on the end-to-end performance of high-speed flows
Pub Date: 2013-11-17 DOI: 10.1145/2534695.2534697
Nathan Hanford, V. Ahuja, Mehmet Balman, M. Farrens, D. Ghosal, Eric Pouyoul, B. Tierney
Multi-core end-systems use Receive Side Scaling (RSS) to parallelize protocol processing. RSS applies a hash function to the standard flow descriptors and uses an indirection table to assign incoming packets to receive queues that are pinned to specific cores. This ensures flow affinity, in that the interrupt processing of all packets belonging to a specific flow is handled by the same core. A key limitation of standard RSS is that it does not consider the application process that consumes the incoming data when determining the flow affinity. In this paper, we carry out a detailed experimental analysis of the performance impact of application affinity in a 40 Gbps testbed network with dual hexa-core end-systems. We show, contrary to conventional wisdom, that when the application process and the flow are affinitized to the same core, the performance (measured in terms of end-to-end TCP throughput) is significantly lower than the line rate. Near-line-rate performance is observed when the flow and the application process are affinitized to different cores belonging to the same socket. Furthermore, affinitizing the application and the flow to cores on different sockets results in throughput significantly lower than the line rate. These results arise from a memory bottleneck, which we demonstrate using preliminary correlational data on the cache hit rate in the core that services the application process.
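
Standard RSS queue selection can be sketched as hash-then-indirection: hash the flow 4-tuple, use the result to index an indirection table, and deliver to the queue (and hence core) that the entry names. Real NICs use the Toeplitz hash; a CRC32 stands in here, and the table contents are illustrative. Nothing in this path knows which core runs the consuming application, which is exactly the gap the paper quantifies.

```python
# A minimal sketch of RSS receive-queue selection.
import socket
import struct
import zlib

NUM_QUEUES = 6
# Indirection table: hash value -> receive queue (queue i pinned to core i).
INDIRECTION = [i % NUM_QUEUES for i in range(128)]

def rss_queue(src_ip, dst_ip, src_port, dst_port):
    # Hash the standard flow descriptors (4-tuple). Real NICs use the
    # Toeplitz hash with a secret key; CRC32 is a stand-in.
    key = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) +
           struct.pack("!HH", src_port, dst_port))
    h = zlib.crc32(key)
    return INDIRECTION[h % len(INDIRECTION)]

q = rss_queue("10.0.0.1", "10.0.0.2", 49152, 5001)
print(f"flow lands on queue/core {q}")
# Standard RSS stops here: the chosen core need not be the one running the
# application that consumes the data.
```
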
Citations: 16