Manycore Network Interfaces for in-memory rack-scale computing

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA) Pub Date : 2015-06-13 DOI:10.1145/2749469.2750415

Alexandros Daglis, Stanko Novakovic, Edouard Bugnion, B. Falsafi, Boris Grot

{"title":"Manycore Network Interfaces for in-memory rack-scale computing","authors":"Alexandros Daglis, Stanko Novakovic, Edouard Bugnion, B. Falsafi, Boris Grot","doi":"10.1145/2749469.2750415","DOIUrl":null,"url":null,"abstract":"Datacenter operators rely on low-cost, high-density technologies to maximize throughput for data-intensive services with tight tail latencies. In-memory rack-scale computing is emerging as a promising paradigm in scale-out datacenters capitalizing on commodity SoCs, low-latency and high-bandwidth communication fabrics and a remote memory access model to enable aggregation of a rack's memory for critical data-intensive applications such as graph processing or key-value stores. Low latency and high bandwidth not only dictate eliminating communication bottlenecks in the software protocols and off-chip fabrics but also a careful on-chip integration of network interfaces. The latter is a key challenge especially in architectures with RDMA-inspired one-sided operations that aim to achieve low latency and high bandwidth through on-chip Network Interface (NI) support. This paper proposes and evaluates network interface architectures for tiled manycore SoCs for in-memory rack-scale computing. Our results indicate that a careful splitting of NI functionality per chip tile and at the chip's edge along a NOC dimension enables a rack-scale architecture to optimize for both latency and bandwidth. Our best manycore NI architecture achieves latencies within 3% of an idealized hardware NUMA and efficiently uses the full bisection bandwidth of the NOC, without changing the on-chip coherence protocol or the core's microarchitecture.","PeriodicalId":6878,"journal":{"name":"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)","volume":"11 2 1","pages":"567-579"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2749469.2750415","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 34

Abstract

Datacenter operators rely on low-cost, high-density technologies to maximize throughput for data-intensive services with tight tail latencies. In-memory rack-scale computing is emerging as a promising paradigm in scale-out datacenters capitalizing on commodity SoCs, low-latency and high-bandwidth communication fabrics and a remote memory access model to enable aggregation of a rack's memory for critical data-intensive applications such as graph processing or key-value stores. Low latency and high bandwidth not only dictate eliminating communication bottlenecks in the software protocols and off-chip fabrics but also a careful on-chip integration of network interfaces. The latter is a key challenge especially in architectures with RDMA-inspired one-sided operations that aim to achieve low latency and high bandwidth through on-chip Network Interface (NI) support. This paper proposes and evaluates network interface architectures for tiled manycore SoCs for in-memory rack-scale computing. Our results indicate that a careful splitting of NI functionality per chip tile and at the chip's edge along a NOC dimension enables a rack-scale architecture to optimize for both latency and bandwidth. Our best manycore NI architecture achieves latencies within 3% of an idealized hardware NUMA and efficiently uses the full bisection bandwidth of the NOC, without changing the on-chip coherence protocol or the core's microarchitecture.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

用于内存机架级计算的多核网络接口

数据中心运营商依靠低成本、高密度的技术来最大限度地提高数据密集型服务的吞吐量。内存中的机架规模计算正在成为横向扩展数据中心的一个有前途的范例，它利用商品soc、低延迟和高带宽通信结构以及远程内存访问模型，为关键数据密集型应用程序(如图形处理或键值存储)提供机架内存聚合。低延迟和高带宽不仅要求消除软件协议和片外结构中的通信瓶颈，而且还要求对网络接口进行仔细的片上集成。后者是一个关键的挑战，特别是在具有rdma启发的单边操作的体系结构中，这种操作旨在通过片上网络接口(NI)支持实现低延迟和高带宽。本文提出并评估了用于内存机架规模计算的平铺多核soc的网络接口体系结构。我们的研究结果表明，沿着NOC维度仔细拆分每个芯片瓦片和芯片边缘的NI功能，可以使机架级架构同时优化延迟和带宽。我们最好的多核NI架构可以在理想硬件NUMA的3%内实现延迟，并有效地使用NOC的全等分带宽，而无需改变片上一致性协议或核心的微架构。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)

自引率

0.00%

发文量

期刊最新文献

Redundant Memory Mappings for fast access to large memories Multiple Clone Row DRAM: A low latency and area optimized DRAM Manycore Network Interfaces for in-memory rack-scale computing Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures ShiDianNao: Shifting vision processing closer to the sensor