2015 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS): latest papers

Packet classification using a Bloom filter in a leaf-pushing area-based quad-trie
Hyesook Lim, Hayoung Byun
Packet classification is one of the most essential functions that Internet routers should perform at wire-speed for every incoming packet. An area-based quad-trie (AQT) for packet classification has an issue in search performance since many rule nodes can be encountered in a search procedure. A leaf-pushing AQT improves the search performance of the AQT by making a single rule node exist in each search path. This paper proposes a new algorithm to improve the search performance of the leaf-pushing AQT further. The proposed algorithm builds a leaf-pushing AQT using a Bloom filter and a hash table stored in on-chip memories. The level of a rule node and a pointer to a rule database are identified by sequentially querying the Bloom filter and by accessing the hash table, respectively.
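The lookup in this abstract (query a Bloom filter level by level to find the level of the single rule node on the search path, then read a hash table for the pointer into the rule database) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the node encoding in `node_key`, the filter sizes, and the class names `BloomFilter` and `LeafPushedIndex` are all hypothetical, and the sketch assumes that only rule nodes are programmed into the filter.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; the sizes here are arbitrary illustration values."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def node_key(src_bits, dst_bits, level):
    # Hypothetical node encoding: level plus the first `level` bits of each field.
    return f"{level}|{src_bits[:level]}|{dst_bits[:level]}"


class LeafPushedIndex:
    """Assumed layout: Bloom filter over rule nodes, hash table to rule pointers."""

    def __init__(self, max_level):
        self.max_level = max_level
        self.bloom = BloomFilter()
        self.table = {}  # node key -> pointer into the rule database

    def add_rule_node(self, src_prefix, dst_prefix, rule_ptr):
        if len(src_prefix) != len(dst_prefix):
            raise ValueError("a quad-trie node consumes one bit per field per level")
        key = node_key(src_prefix, dst_prefix, len(src_prefix))
        self.bloom.add(key)
        self.table[key] = rule_ptr

    def lookup(self, src_bits, dst_bits):
        # Sequentially query the Bloom filter, one level at a time.
        for level in range(self.max_level + 1):
            key = node_key(src_bits, dst_bits, level)
            if self.bloom.might_contain(key):
                rule_ptr = self.table.get(key)
                if rule_ptr is not None:  # guards against Bloom false positives
                    return level, rule_ptr
        return None


index = LeafPushedIndex(max_level=4)
index.add_rule_node("01", "10", "rule-A")
index.add_rule_node("1", "0", "rule-B")
print(index.lookup("0110", "1011"))  # finds the level-2 node holding rule-A
print(index.lookup("1111", "0000"))  # finds the level-1 node holding rule-B
```

Because leaf pushing leaves one rule node per search path, the first confirmed hit ends the search; the hash-table check only filters out Bloom filter false positives.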
DOI: https://doi.org/10.1109/ANCS.2015.7110131
Published: 2015-05-07
Citations: 8
qSDS: A QoS-Aware I/O scheduling framework towards software defined storage
Jianzong Wang, Lianglun Cheng
Inadequate resource allocation, lack of I/O performance prediction, and insufficient isolation degrade storage performance in multi-tenant cloud storage environments. Software-defined Storage (SDS) is an effective approach to guaranteeing Quality of Service (QoS) in data centers. However, the lack of intelligence, robustness, and self-adjustment heavily hinders the application and adoption of SDS. This paper focuses on the QoS-aware I/O resource scheduling problem, with the aim of building data centers with high availability, scalability, and QoS. We study workload characteristics, requirement analysis, the theory of QoS in SDS, and I/O scheduling strategies. We pursue these goals by proposing a mathematical model of workload burstiness, a QoS semantic description with rule execution mechanisms, and dynamic, robust I/O scheduling algorithms for multi-type resource allocation. So far, qSDS, a QoS-aware I/O scheduling framework for SDS, has been proposed for SSD/HDD hybrid storage. Preliminary evaluation on some benchmarks shows that qSDS can achieve better performance than other strategies.
DOI: https://doi.org/10.1109/ANCS.2015.7110137
Published: 2015-05-07
Citations: 6
ORange: multi field OpenFlow based range classifier
Liron Schiff, Y. Afek, A. Bremler-Barr
Configuring range-based packet classification rules in network switches is crucial to all network core functionalities, such as firewalls and routing. However, OpenFlow, the leading management protocol for SDN switches, lacks the interface to configure range rules directly and only provides mask-based rules, named flow entries. In this work we present ORange, the first solution to multi-dimensional range classification in OpenFlow. Our solution is based on paradigms used in state-of-the-art non-OpenFlow classifiers and is designed in a modular fashion, allowing future extensions and improvements. We consider switch space utilization as well as atomic update functionality, and in the network context we provide flow consistency even if flows change their entrance point to the network during policy updates, a property we name cross-entrance consistency. Our scheme achieves remarkable results and is easy to deploy.
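To make the gap between range rules and mask-based flow entries concrete, the sketch below shows the textbook range-to-prefix expansion: one integer range becomes several (value, mask) pairs. This is not ORange's scheme, which the abstract does not spell out; it is only the baseline problem, and the function names `range_to_masks` and `matches` are hypothetical.

```python
def range_to_masks(lo, hi, width):
    """Cover the inclusive range lo..hi with (value, mask) pairs on `width` bits."""
    if not 0 <= lo <= hi < (1 << width):
        raise ValueError("range does not fit in the given width")
    full = (1 << width) - 1
    entries = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at `lo` ...
        size = (lo & -lo) if lo else (1 << width)
        # ... and does not run past `hi`.
        while size > hi - lo + 1:
            size >>= 1
        entries.append((lo, full & ~(size - 1)))
        lo += size
    return entries


def matches(value, entries):
    """Mask-based match: a packet field hits if any (value, mask) entry agrees."""
    return any((value & mask) == val for val, mask in entries)


# The range 1..6 on a 3-bit field needs four mask-based entries.
for val, mask in range_to_masks(1, 6, 3):
    print(f"value={val:03b} mask={mask:03b}")

# The common port range 1024..65535 needs six entries on a 16-bit field.
print(len(range_to_masks(1024, 65535, 16)))
```

A single range can expand into many entries, and a rule with ranges on several fields multiplies those counts, which is why switch space utilization matters for any range scheme built on masks.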
DOI: https://doi.org/10.1109/ANCS.2015.7110121
Published: 2015-05-07
Citations: 9
Scalable many-field packet classification using multidimensional-cutting via selective bit-concatenation
Cheng-Liang Hsieh, N. Weng
OpenFlow Switch in Software-Defined Networking (SDN) has changed packet classification from the standard 5-tuple to arbitrary many-field matching. The growing number of fields in a rule and the increasing number of rules in a ruleset pose great challenges for packet classification in terms of performance, storage, and update cost. In this paper, we design a two-stage packet classification system to address those issues by exploiting ruleset sparsity and rule field independence. A ruleset is examined offline with proposed matrices to find representative bits from different fields in a rule. We leverage those representative bits and concatenate them as sample values to divide a ruleset into several subsets in sample spaces. Each subset is given a unique address for each sample space. A ruleset update only affects the related addresses. The proposed pre-filtering stage outputs only highly related rules, obtained by intersecting candidate rules from different sample spaces, for the full match process. Our system throughput is 356 MPPS for 1K 15-field rules and 213 MPPS for 100K 15-field rules when using a single NVIDIA K20C GPU card.
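The two stages in this abstract can be illustrated with a small sketch: selected bit positions are concatenated into an address, each address holds a subset of rules, and the candidate sets from several sample spaces are intersected before the full match. Everything concrete here is an assumption for illustration: rules are ternary strings over `0`, `1`, `*`, the bit positions are hand-picked rather than chosen by the paper's matrices, and the function names are hypothetical.

```python
from itertools import product


def sample_addresses(rule_bits, positions):
    """All addresses a ternary rule maps to; a '*' bit expands to both 0 and 1."""
    choices = [("0", "1") if rule_bits[p] == "*" else (rule_bits[p],)
               for p in positions]
    return {"".join(bits) for bits in product(*choices)}


def build_sample_space(rules, positions):
    """Map each concatenated-bit address to the subset of rules stored there."""
    space = {}
    for rule_id, rule_bits in rules.items():
        for addr in sample_addresses(rule_bits, positions):
            space.setdefault(addr, set()).add(rule_id)
    return space


def prefilter(packet_bits, spaces):
    """Stage 1: intersect the candidate subsets from every sample space."""
    candidates = None
    for positions, space in spaces:
        addr = "".join(packet_bits[p] for p in positions)
        subset = space.get(addr, set())
        candidates = subset if candidates is None else candidates & subset
    return candidates or set()


def full_match(packet_bits, rule_bits):
    """Stage 2: exact ternary comparison against one candidate rule."""
    return all(r in ("*", p) for r, p in zip(rule_bits, packet_bits))


rules = {"r1": "01**10**", "r2": "0*1*0***", "r3": "1***11**"}
spaces = [(pos, build_sample_space(rules, pos)) for pos in [(0, 4), (1, 5)]]

packet = "01101000"
candidates = prefilter(packet, spaces)
print(candidates)
print([r for r in sorted(candidates) if full_match(packet, rules[r])])
```

In this layout an update to one rule touches only the addresses that rule maps to, which mirrors the abstract's point about update cost.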
DOI: https://doi.org/10.1109/ANCS.2015.7110133
Published: 2015-05-07
Citations: 11
Optimizing many-field packet classification on FPGA, multi-core general purpose processor, and GPU
Yun Qu, Hao Zhang, Shijie Zhou, V. Prasanna
Due to the rapid growth of the Internet, there is an increasing need for efficiently classifying packets with many header fields in large rule sets. For example, in Software Defined Networking (SDN), the OpenFlow table lookup can require 15 packet header fields to be examined. In this paper, we present several decomposition-based packet classification implementations with efficient optimization techniques. In the searching phase, packet headers are split or combined. In the merging phase, the partial searching results from all the fields are merged to generate the final result. We prototype our implementations on state-of-the-art Field Programmable Gate Array (FPGA), multi-core General Purpose Processor (GPP), and Graphics Processing Unit (GPU). On FPGA, we propose two optimization techniques to divide generic ranges; modular processing elements are constructed and concatenated into a systolic array. On multi-core GPP, we parallelize both the searching and merging phases using parallel program threads. On the GPU-accelerated platform, we minimize branch divergence and reduce the data communication overhead. Experimental results show that 500 Million Packets Per Second (MPPS) throughput and 3 μs latency can be achieved for 1.5K rule sets on FPGA. We achieve 14.7 MPPS throughput and 30.5 MPPS throughput for 32K rule sets on multi-core GPP and GPU-accelerated platforms, respectively. As a heterogeneous solution, our GPU-accelerated packet classifier shows 2x speedup compared to the implementation using multi-core GPP only. Compared with prior works, our designs can match long packet headers against very complex rule sets.
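The searching and merging phases of a decomposition-based classifier can be shown in a few lines: each field is searched independently to produce a bit vector of matching rules, and the vectors are ANDed to get the final result. The sketch below is a generic, sequential illustration under assumed data structures (rules as per-field ranges, listed in priority order); it does not reproduce the paper's FPGA, GPP, or GPU optimizations, and the names are hypothetical.

```python
def field_bitvector(rules, field, value):
    """Searching phase for one field: bit i is set if rule i accepts `value`."""
    vec = 0
    for i, rule in enumerate(rules):
        lo, hi = rule[field]
        if lo <= value <= hi:
            vec |= 1 << i
    return vec


def classify(rules, packet):
    """Merging phase: AND the per-field vectors, then pick the best rule."""
    merged = (1 << len(rules)) - 1  # start with every rule as a candidate
    for field, value in packet.items():
        merged &= field_bitvector(rules, field, value)
    if merged == 0:
        return None
    # Rules are listed in priority order, so the lowest set bit wins.
    return (merged & -merged).bit_length() - 1


rules = [
    {"src_port": (0, 1023), "dst_port": (80, 80)},    # rule 0, highest priority
    {"src_port": (0, 65535), "dst_port": (0, 65535)},  # rule 1, catch-all
]
print(classify(rules, {"src_port": 500, "dst_port": 80}))
print(classify(rules, {"src_port": 5000, "dst_port": 80}))
```

The per-field searches are independent of each other, which is what makes the approach attractive for the parallel platforms named in the abstract.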
DOI: https://doi.org/10.1109/ANCS.2015.7110123
Published: 2015-05-07
Citations: 38
Reliably scalable name prefix lookup
Haowei Yuan, P. Crowley
Name prefix lookup is a core building block of information-centric networking (ICN). In ICN hierarchical naming schemes, each packet has a name that consists of multiple variable-length name components, and packets are forwarded based on longest name prefix matching (LNPM). LNPM is challenging because names are longer than IP addresses and the namespace is unbounded. Recently proposed solutions have shown encouraging performance; however, most are optimized for or evaluated with a limited number of URL datasets that may not fully characterize the forwarding information base (FIB). Moreover, the worst-case scenarios of several schemes require O(k) string lookups, where k is the number of components in each prefix. Thus, the sustained performance of existing solutions is not guaranteed. In this paper, we present an LNPM design based on the binary search of hash tables, which was originally proposed for IP lookup. With this design, the worst-case number of string lookups is O(log(k)) for prefixes with up to k components, regardless of the characteristics of the FIB. We implemented the design in software and demonstrated 10 Gbps throughput with one billion synthetic longest name prefix matching rules, each containing up to seven components. We also propose level pulling to optimize the average LNPM performance, based on the observation that some prefixes have large numbers of next-level suffixes in the available URL datasets.
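The O(log(k)) bound comes from searching over prefix lengths rather than scanning them. A rough sketch of that idea applied to name components is given below: one hash table per length, marker entries on the binary-search path of each longer prefix, and a precomputed best match stored with every entry so a misleading marker never forces backtracking. This follows the general binary-search-on-hash-tables technique from IP lookup, not the paper's actual data structures; the class `NameFib` and its methods are hypothetical, and level pulling is not shown.

```python
class NameFib:
    """Longest name prefix matching via binary search over prefix lengths."""

    def __init__(self, max_components):
        self.max_components = max_components
        self.fib = {}  # real prefixes: tuple of components -> next hop
        self.tables = [{} for _ in range(max_components + 1)]

    @staticmethod
    def split(name):
        return tuple(name.strip("/").split("/"))

    def insert(self, prefix, next_hop):
        comps = self.split(prefix)
        if not 1 <= len(comps) <= self.max_components:
            raise ValueError("prefix length out of range")
        self.fib[comps] = next_hop

    def _best_match(self, key):
        # Longest real prefix of `key`, precomputed so lookups never backtrack.
        for n in range(len(key), 0, -1):
            hop = self.fib.get(key[:n])
            if hop is not None:
                return hop
        return None

    def build(self):
        """Call after all inserts: fills one hash table per prefix length."""
        self.tables = [{} for _ in range(self.max_components + 1)]
        for comps in self.fib:
            lo, hi = 1, self.max_components
            while lo <= hi:
                mid = (lo + hi) // 2
                if mid > len(comps):
                    hi = mid - 1
                    continue
                key = comps[:mid]  # a marker, or the prefix itself at full length
                self.tables[mid][key] = self._best_match(key)
                if mid == len(comps):
                    break
                lo = mid + 1

    def lookup(self, name):
        comps = self.split(name)
        lo, hi = 1, self.max_components
        best = None
        while lo <= hi:
            mid = (lo + hi) // 2
            key = comps[:mid]
            if len(key) == mid and key in self.tables[mid]:
                hop = self.tables[mid][key]
                if hop is not None:
                    best = hop
                lo = mid + 1  # an entry of this length exists: try longer
            else:
                hi = mid - 1  # nothing of this length: try shorter
        return best


fib = NameFib(max_components=7)
fib.insert("/com", "A")
fib.insert("/com/example/video/hd", "B")
fib.insert("/edu/a", "F")
fib.insert("/edu/a/b/c/d/e", "E")
fib.build()
print(fib.lookup("/com/example/video/hd/clip1"))
print(fib.lookup("/edu/a/b/c/x"))
```

With up to seven components, each lookup probes at most three hash tables. The second query hits the marker left at length 4 by `/edu/a/b/c/d/e` and still returns the next hop of `/edu/a`, because that answer was stored with the marker at build time.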
DOI: https://doi.org/10.1109/ANCS.2015.7110125
Published: 2015-05-07
Citations: 56
A systematic evaluation of emerging mesh-like CMP NoCs
Antonis Psathakis, Vassilis D. Papaefstathiou, Nikolaos Chrysos, Fabien Chaix, E. Vasilakis, D. Pnevmatikatos, M. Katevenis
This paper studies alternative Network-on-Chip architectures for emerging many-core chip multiprocessors by exploring the following design options on mesh-based networks: multiple physical networks (P), core concentration (C), express channels (X), flit widths (W), and virtual channels (V). We exhaustively evaluate all combinations of the aforementioned parameters (P, C, X, W, V), using the energy-throughput ratio (ETR) as a metric to classify network configurations. Our experimental results show that, on one hand, with an appropriate selection of parameters (V, W), an optimized baseline 2D mesh offers the best possible ETR for NoCs with up to a few tens of cores (64-core NoC). More complicated networks, using concentration and express channels, can reduce the zero-load latency, but do not necessarily help to improve ETR. On the other hand, for larger CMPs, a 2D mesh with multiple physical networks is a better option: once optimized, this architectural choice can reduce the ETR by up to 46% for 256 cores.
DOI: https://doi.org/10.1109/ANCS.2015.7110129
Published: 2015-05-07
Citations: 14
Disaggregation – the new way to build mega (and micro) data centers
Y. Bachar
Summary form only given. The talk will discuss how open hardware, open software, and disaggregation make it possible to build efficient data centers of any size, from small to super-big. The talk will show a real-life Facebook architecture in network, hardware, and software that will enable future data center developers to control their destiny and scale data center capacity and size at their own pace, with the most effective PUI in the industry. We will discuss Wedge, 6-Pack, FBOSS, oBMC, and many other aspects of the Facebook data centers.
DOI: https://doi.org/10.1109/ANCS.2015.7110114
Published: 2015-05-07
Citations: 4
Linux XIA: an interoperable meta network architecture to crowdsource the future internet
Michel Machado, Cody Doucette, J. Byers
With the growing number of proposed clean-slate redesigns of the Internet, the need for a medium that enables all stakeholders to participate in the realization, evaluation, and selection of these designs is increasing. We believe that the missing catalyst is a meta network architecture that welcomes most, if not all, clean-slate designs on a level playing field, lowers deployment barriers, and leaves the final evaluation to the broader community. This paper presents Linux XIA, a native implementation of XIA in the Linux kernel, as a candidate. We first describe Linux XIA in terms of its architectural realizations and algorithmic contributions. We then demonstrate how to port several distinct and unrelated network architectures onto Linux XIA. Finally, we provide a hybrid evaluation of Linux XIA at three levels of abstraction in terms of its ability to: evolve and foster interoperation of new architectures, embed disparate architectures inside the implementation's framework, and maintain forwarding performance comparable to that of the legacy TCP/IP implementation. Given this evaluation, we substantiate a previously unsupported claim of XIA: that it readily supports and enables network evolution, collaboration, and interoperability - traits we view as central to the success of any future Internet architecture.
随着越来越多的提议对Internet进行全新的重新设计,对一种能够使所有涉众参与这些设计的实现、评估和选择的媒介的需求正在增加。我们认为,缺少的催化剂是一个元网络架构,它在一个公平的竞争环境中欢迎大多数(如果不是全部)干净状态的设计,降低部署障碍,并将最终评估留给更广泛的社区。本文介绍了Linux XIA,它是Linux内核中XIA的一个本地实现。我们首先从架构实现和算法贡献的角度描述Linux XIA。然后,我们将演示如何将几种不同且不相关的网络体系结构移植到Linux XIA上。最后,我们对Linux XIA在三个抽象层次上的能力进行了混合评估:发展和促进新体系结构的互操作,在实现的框架内嵌入不同的体系结构,并保持与传统TCP/IP实现相当的转发性能。鉴于这一评估,我们证实了先前未被支持的XIA声明:它随时支持并使网络进化、协作和互操作性成为可能——我们认为这些特征是未来任何互联网架构成功的核心。
{"title":"Linux XIA: an interoperable meta network architecture to crowdsource the future internet","authors":"Michel Machado, Cody Doucette, J. Byers","doi":"10.1109/ANCS.2015.7110128","DOIUrl":"https://doi.org/10.1109/ANCS.2015.7110128","url":null,"abstract":"With the growing number of proposed clean-slate redesigns of the Internet, the need for a medium that enables all stakeholders to participate in the realization, evaluation, and selection of these designs is increasing. We believe that the missing catalyst is a meta network architecture that welcomes most, if not all, clean-state designs on a level playing field, lowers deployment barriers, and leaves the final evaluation to the broader community. This paper presents Linux XIA, a native implementation of XIA in the Linux kernel, as a candidate. We first describe Linux XIA in terms of its architectural realizations and algorithmic contributions. We then demonstrate how to port several distinct and unrelated network architectures onto Linux XIA. Finally, we provide a hybrid evaluation of Linux XIA at three levels of abstraction in terms of its ability to: evolve and foster interoperation of new architectures, embed disparate architectures inside the implementation's framework, and maintain a comparable forwarding performance to that of the legacy TCP/IP implementation. 
Given this evaluation, we substantiate a previously unsupported claim of XIA: that it readily supports and enables network evolution, collaboration, and interoperability - traits we view as central to the success of any future Internet architecture.","PeriodicalId":186232,"journal":{"name":"2015 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128446517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
FlowShadow: a fast path for uninterrupted packet processing in SDN switches
Yi Wang, Dongzhe Tai, Ting Zhang, Linxiao Jin, Huichen Dai, B. Liu, Xin Wu
Updating rules in the flow tables of SDN switches is complex and time-consuming. Therefore, we propose a cache-based scheme, named FlowShadow, that improves packet processing performance and keeps the switch operating continuously while rules in the flow tables are updated. FlowShadow caches microflows in a hash table to build a fast path for packet processing. By leveraging the Action Table, FlowShadow achieves update consistency and good update performance. To examine the reliability, validity, utility, and scalability of FlowShadow, we implement it on Open vSwitch and conduct numerous experiments with different settings to measure its performance. The experimental results demonstrate that FlowShadow achieves a lookup speed of 75 million packets per second on a commodity PC under real backbone traces, and that the system with FlowShadow is 3.4× as fast as the original Open vSwitch.
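The fast-path idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not describe the Action Table's layout, so the sketch assumes that cached microflows store an index into an action table rather than the action itself, which lets an action change take effect without flushing the cache. The names `FastPathSwitch`, `Rule`, `DROP_ID`, and `update_action` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A microflow is identified here by an exact 5-tuple (an assumption).
FlowKey = Tuple[str, str, int, int, int]  # src_ip, dst_ip, src_port, dst_port, proto

DROP_ID = 0  # hypothetical action-table slot used when no rule matches


@dataclass
class Rule:
    """Hypothetical wildcard rule: a match predicate plus an action-table index."""
    matches: Callable[[FlowKey], bool]
    action_id: int


@dataclass
class FastPathSwitch:
    rules: List[Rule] = field(default_factory=list)              # slow path, priority order
    action_table: Dict[int, str] = field(default_factory=dict)   # action_id -> action
    cache: Dict[FlowKey, int] = field(default_factory=dict)      # microflow -> action_id
    slow_lookups: int = 0                                        # counts slow-path searches

    def __post_init__(self) -> None:
        self.action_table.setdefault(DROP_ID, "drop")

    def process(self, key: FlowKey) -> str:
        """Return the action for a packet, using the hash-table cache first."""
        action_id = self.cache.get(key)
        if action_id is None:
            # Cache miss: fall back to the (slow) rule search, then cache the result.
            action_id = self._slow_lookup(key)
            self.cache[key] = action_id
        return self.action_table[action_id]

    def _slow_lookup(self, key: FlowKey) -> int:
        self.slow_lookups += 1
        for rule in self.rules:
            if rule.matches(key):
                return rule.action_id
        return DROP_ID

    def update_action(self, action_id: int, new_action: str) -> None:
        """Change an action in place; cached microflows see it on their next packet."""
        self.action_table[action_id] = new_action


if __name__ == "__main__":
    sw = FastPathSwitch()
    sw.action_table[1] = "output:1"
    sw.rules.append(Rule(matches=lambda k: k[1] == "10.0.0.2", action_id=1))
    pkt = ("10.0.0.1", "10.0.0.2", 1234, 80, 6)
    print(sw.process(pkt))            # first packet takes the slow path
    print(sw.process(pkt))            # second packet hits the cache
    sw.update_action(1, "output:2")   # update without flushing the cache
    print(sw.process(pkt), sw.slow_lookups)
```

The indirection is the design choice worth noting: because the cache holds only an index, changing an action touches one table entry instead of every cached microflow. Updates that change which rule a flow matches would still require invalidating the affected cache entries, which this sketch does not model.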
DOI: https://doi.org/10.1109/ANCS.2015.7110142 · Published: 2015-05-01
Citations: 6
Journal
2015 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)