HyperData: A Data Transfer Accelerator for Software Data Planes Based on Targeted Prefetching

Hossein Golestani, T. Wenisch
{"title":"HyperData: A Data Transfer Accelerator for Software Data Planes Based on Targeted Prefetching","authors":"Hossein Golestani, T. Wenisch","doi":"10.1109/ICCD53106.2021.00059","DOIUrl":null,"url":null,"abstract":"Datacenter systems rely on fast, efficient I/O soft-ware stacks—Software Data Planes (SDPs)—to coordinate frequent interaction among myriad processes (or VMs) and I/O devices (NICs, SSDs, etc.). Thanks to the impressive and ever-growing speed of today’s I/O devices and μs-scale computation due to hyper-tenancy and microservice-based applications, SDPs play a crucial role in overall system performance and efficiency. In this work, we aim to enhance data transfer among the SDP, I/O devices, and applications/VMs by designing the HyperData accelerator. Data items in SDP systems, such as network packets or storage blocks, are transferred through shared memory queues. Consumer cores typically access the data from DRAM or, thanks to technologies like Intel DDIO, from the (shared) last-level cache. Today, consumers cannot effectively prefetch such data to nearer caches due to the lack of a proper arrival notification mechanism and the complex access pattern of data buffers. HyperData is designed to perform targeted prefetching, wherein the exact data items (or a required subset) are prefetched to the L1 cache of the consumer core. Furthermore, HyperData is applicable to both core–device and core–core data communication, and it supports complex queue formats like Virtio and multi-consumer queues. HyperData is realized with a per-core programmable prefetcher, which issues the prefetch requests, and a system-level monitoring set, which monitors queues for data arrival and triggers prefetch operations. We show that HyperData improves processing latency by 1.20-2.42× in a simulation of a state-of-the-art SDP, with only a few hundred bytes of per-core overhead.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 39th International Conference on Computer Design (ICCD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD53106.2021.00059","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Datacenter systems rely on fast, efficient I/O software stacks—Software Data Planes (SDPs)—to coordinate frequent interaction among myriad processes (or VMs) and I/O devices (NICs, SSDs, etc.). Thanks to the impressive and ever-growing speed of today's I/O devices and μs-scale computation due to hyper-tenancy and microservice-based applications, SDPs play a crucial role in overall system performance and efficiency. In this work, we aim to enhance data transfer among the SDP, I/O devices, and applications/VMs by designing the HyperData accelerator. Data items in SDP systems, such as network packets or storage blocks, are transferred through shared memory queues. Consumer cores typically access the data from DRAM or, thanks to technologies like Intel DDIO, from the (shared) last-level cache. Today, consumers cannot effectively prefetch such data to nearer caches due to the lack of a proper arrival notification mechanism and the complex access pattern of data buffers. HyperData is designed to perform targeted prefetching, wherein the exact data items (or a required subset) are prefetched to the L1 cache of the consumer core. Furthermore, HyperData is applicable to both core–device and core–core data communication, and it supports complex queue formats like Virtio and multi-consumer queues. HyperData is realized with a per-core programmable prefetcher, which issues the prefetch requests, and a system-level monitoring set, which monitors queues for data arrival and triggers prefetch operations. We show that HyperData improves processing latency by 1.20–2.42× in a simulation of a state-of-the-art SDP, with only a few hundred bytes of per-core overhead.
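
The abstract describes shared-memory descriptor queues in which a producer (a device or another core) publishes data items that a consumer core later reads, typically out of DRAM or the last-level cache. The sketch below is a minimal, hypothetical single-producer/single-consumer ring in C that illustrates the software-only baseline the paper argues against: the consumer can issue prefetch hints only after it observes a new descriptor, whereas HyperData's monitoring hardware triggers a targeted prefetch into the consumer's L1 as soon as the item arrives. All identifiers here (struct ring, consume, RING_SIZE, etc.) are illustrative assumptions, not HyperData's actual interface.

#include <stdint.h>
#include <stdatomic.h>

/* Hypothetical single-producer/single-consumer descriptor ring, loosely
 * modeled on the shared-memory queues described in the abstract. */
#define RING_SIZE  256   /* power of two, so index wrap is a cheap modulo */
#define CACHE_LINE 64

struct desc {
    void    *buf;   /* pointer to the payload (packet or storage block) */
    uint32_t len;   /* payload length in bytes */
};

struct ring {
    struct desc      slots[RING_SIZE];
    _Atomic uint32_t head;   /* advanced by the producer (device or core) */
    uint32_t         tail;   /* private to the consumer core */
};

/* Consumer-side poll loop. Without hardware support like HyperData, the
 * best software can do is issue prefetch hints after it has already seen
 * the new descriptor, by which point the first payload access often still
 * misses in the L1/L2 caches. */
static void consume(struct ring *r, void (*process)(void *, uint32_t))
{
    for (;;) {
        uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
        while (r->tail != head) {
            struct desc *d = &r->slots[r->tail % RING_SIZE];

            /* Software prefetch of the payload toward nearby caches;
             * HyperData instead has a monitoring unit issue this the
             * moment the producer publishes the descriptor. */
            for (uint32_t off = 0; off < d->len; off += CACHE_LINE)
                __builtin_prefetch((char *)d->buf + off, 0 /* read */, 3);

            process(d->buf, d->len);
            r->tail++;
        }
    }
}

In a production SDP (e.g., a DPDK poll-mode driver or a Virtio backend), descriptor formats and completion signaling are considerably more involved—indirect buffers, out-of-order completion, multi-consumer queues—which is precisely why the abstract argues that a purely software notification-plus-prefetch scheme is insufficient and motivates the per-core programmable prefetcher plus system-level monitoring set.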