{"title":"HyperData: A Data Transfer Accelerator for Software Data Planes Based on Targeted Prefetching","authors":"Hossein Golestani, T. Wenisch","doi":"10.1109/ICCD53106.2021.00059","DOIUrl":null,"url":null,"abstract":"Datacenter systems rely on fast, efficient I/O soft-ware stacks—Software Data Planes (SDPs)—to coordinate frequent interaction among myriad processes (or VMs) and I/O devices (NICs, SSDs, etc.). Thanks to the impressive and ever-growing speed of today’s I/O devices and μs-scale computation due to hyper-tenancy and microservice-based applications, SDPs play a crucial role in overall system performance and efficiency. In this work, we aim to enhance data transfer among the SDP, I/O devices, and applications/VMs by designing the HyperData accelerator. Data items in SDP systems, such as network packets or storage blocks, are transferred through shared memory queues. Consumer cores typically access the data from DRAM or, thanks to technologies like Intel DDIO, from the (shared) last-level cache. Today, consumers cannot effectively prefetch such data to nearer caches due to the lack of a proper arrival notification mechanism and the complex access pattern of data buffers. HyperData is designed to perform targeted prefetching, wherein the exact data items (or a required subset) are prefetched to the L1 cache of the consumer core. Furthermore, HyperData is applicable to both core–device and core–core data communication, and it supports complex queue formats like Virtio and multi-consumer queues. HyperData is realized with a per-core programmable prefetcher, which issues the prefetch requests, and a system-level monitoring set, which monitors queues for data arrival and triggers prefetch operations. We show that HyperData improves processing latency by 1.20-2.42× in a simulation of a state-of-the-art SDP, with only a few hundred bytes of per-core overhead.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 39th International Conference on Computer Design (ICCD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD53106.2021.00059","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Datacenter systems rely on fast, efficient I/O software stacks—Software Data Planes (SDPs)—to coordinate frequent interaction among myriad processes (or VMs) and I/O devices (NICs, SSDs, etc.). Given the impressive and ever-growing speed of today’s I/O devices, and the μs-scale computations that arise from hyper-tenancy and microservice-based applications, SDPs play a crucial role in overall system performance and efficiency. In this work, we aim to enhance data transfer among the SDP, I/O devices, and applications/VMs by designing the HyperData accelerator. Data items in SDP systems, such as network packets or storage blocks, are transferred through shared-memory queues. Consumer cores typically access the data from DRAM or, thanks to technologies like Intel DDIO, from the (shared) last-level cache. Today, consumers cannot effectively prefetch such data to nearer caches due to the lack of a proper arrival-notification mechanism and the complex access pattern of data buffers. HyperData is designed to perform targeted prefetching, wherein the exact data items (or a required subset of them) are prefetched to the L1 cache of the consumer core. Furthermore, HyperData is applicable to both core–device and core–core data communication, and it supports complex queue formats such as Virtio and multi-consumer queues. HyperData is realized with a per-core programmable prefetcher, which issues the prefetch requests, and a system-level monitoring set, which monitors queues for data arrival and triggers prefetch operations. We show that HyperData improves processing latency by 1.20–2.42× in a simulation of a state-of-the-art SDP, with only a few hundred bytes of per-core overhead.
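To make the idea of "targeted prefetching" concrete, the sketch below shows a software analogue of the scenario the abstract describes: a consumer core polling a shared-memory descriptor queue and pulling exactly the arrived descriptor and its payload toward its L1 cache before processing. This is only an illustration, not the paper's mechanism: HyperData performs the arrival monitoring and prefetch issue in hardware, whereas here the names (rx_ring, pkt_desc, targeted_prefetch) and the busy-wait notification are hypothetical, and the prefetches are ordinary compiler builtins.

```c
/*
 * Illustrative sketch only (assumed ring layout and names). HyperData is a
 * hardware per-core prefetcher plus a monitoring set; this code merely shows
 * what "prefetch the exact data item to the consumer's L1" would look like
 * if emulated in software with __builtin_prefetch.
 */
#include <stdint.h>

#define RING_SIZE  256
#define CACHE_LINE 64

struct pkt_desc {            /* one slot of a single-producer queue      */
    volatile uint32_t ready; /* set by the producer (device or core)     */
    uint32_t len;            /* bytes valid in buf                       */
    uint8_t *buf;            /* payload in shared memory (DRAM or LLC)   */
};

struct rx_ring {
    struct pkt_desc slots[RING_SIZE];
    uint32_t head;           /* next slot the consumer will process      */
};

/* Pull the descriptor and the first cache lines of its payload toward L1.
 * HyperData would trigger this in hardware when its monitoring set observes
 * the producer's enqueue; here it runs just before the data is touched.    */
static inline void targeted_prefetch(const struct pkt_desc *d)
{
    __builtin_prefetch(d, 0, 3);                  /* descriptor itself     */
    for (uint32_t off = 0; off < d->len && off < 4 * CACHE_LINE;
         off += CACHE_LINE)
        __builtin_prefetch(d->buf + off, 0, 3);   /* payload, line by line */
}

/* Consumer loop: wait for arrival, prefetch the item, then process it. */
void consume(struct rx_ring *r, void (*process)(const uint8_t *, uint32_t))
{
    for (;;) {
        struct pkt_desc *d = &r->slots[r->head % RING_SIZE];
        while (!d->ready)        /* arrival notification by polling;        */
            ;                    /* HyperData removes this guesswork        */
        targeted_prefetch(d);
        process(d->buf, d->len); /* data is now (likely) L1-resident        */
        d->ready = 0;
        r->head++;
    }
}
```

In the software version, the prefetch can only be issued after the consumer has already discovered the arrival, which limits how much latency it can hide; the paper's point is that a hardware monitoring set can observe the enqueue and trigger the prefetch ahead of the consumer, including for complex formats such as Virtio and multi-consumer queues.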