{"title":"HyperData: A Data Transfer Accelerator for Software Data Planes Based on Targeted Prefetching","authors":"Hossein Golestani, T. Wenisch","doi":"10.1109/ICCD53106.2021.00059","DOIUrl":null,"url":null,"abstract":"Datacenter systems rely on fast, efficient I/O soft-ware stacks—Software Data Planes (SDPs)—to coordinate frequent interaction among myriad processes (or VMs) and I/O devices (NICs, SSDs, etc.). Thanks to the impressive and ever-growing speed of today’s I/O devices and μs-scale computation due to hyper-tenancy and microservice-based applications, SDPs play a crucial role in overall system performance and efficiency. In this work, we aim to enhance data transfer among the SDP, I/O devices, and applications/VMs by designing the HyperData accelerator. Data items in SDP systems, such as network packets or storage blocks, are transferred through shared memory queues. Consumer cores typically access the data from DRAM or, thanks to technologies like Intel DDIO, from the (shared) last-level cache. Today, consumers cannot effectively prefetch such data to nearer caches due to the lack of a proper arrival notification mechanism and the complex access pattern of data buffers. HyperData is designed to perform targeted prefetching, wherein the exact data items (or a required subset) are prefetched to the L1 cache of the consumer core. Furthermore, HyperData is applicable to both core–device and core–core data communication, and it supports complex queue formats like Virtio and multi-consumer queues. HyperData is realized with a per-core programmable prefetcher, which issues the prefetch requests, and a system-level monitoring set, which monitors queues for data arrival and triggers prefetch operations. We show that HyperData improves processing latency by 1.20-2.42× in a simulation of a state-of-the-art SDP, with only a few hundred bytes of per-core overhead.","PeriodicalId":154014,"journal":{"name":"2021 IEEE 39th International Conference on Computer Design (ICCD)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 39th International Conference on Computer Design (ICCD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD53106.2021.00059","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Datacenter systems rely on fast, efficient I/O software stacks—Software Data Planes (SDPs)—to coordinate frequent interaction among myriad processes (or VMs) and I/O devices (NICs, SSDs, etc.). Given the impressive and ever-growing speed of today’s I/O devices, and the μs-scale computations that arise from hyper-tenancy and microservice-based applications, SDPs play a crucial role in overall system performance and efficiency. In this work, we aim to enhance data transfer among the SDP, I/O devices, and applications/VMs by designing the HyperData accelerator. Data items in SDP systems, such as network packets or storage blocks, are transferred through shared-memory queues. Consumer cores typically access the data from DRAM or, thanks to technologies like Intel DDIO, from the (shared) last-level cache. Today, consumers cannot effectively prefetch such data to nearer caches due to the lack of a proper arrival-notification mechanism and the complex access pattern of data buffers. HyperData is designed to perform targeted prefetching, wherein the exact data items (or a required subset of them) are prefetched to the L1 cache of the consumer core. Furthermore, HyperData is applicable to both core–device and core–core data communication, and it supports complex queue formats such as Virtio and multi-consumer queues. HyperData is realized with a per-core programmable prefetcher, which issues the prefetch requests, and a system-level monitoring set, which monitors queues for data arrival and triggers prefetch operations. We show that HyperData improves processing latency by 1.20–2.42× in a simulation of a state-of-the-art SDP, with only a few hundred bytes of per-core overhead.
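To make the idea of "targeted prefetching" concrete, the sketch below shows a software analogue of the scenario the abstract describes: a consumer core polling a shared-memory descriptor queue and pulling exactly the arrived descriptor and its payload toward its L1 cache before processing. This is only an illustration, not the paper's mechanism: HyperData performs the arrival monitoring and prefetch issue in hardware, whereas here the names (rx_ring, pkt_desc, targeted_prefetch) and the busy-wait notification are hypothetical, and the prefetches are ordinary compiler builtins.

```c
/*
 * Illustrative sketch only (assumed ring layout and names). HyperData is a
 * hardware per-core prefetcher plus a monitoring set; this code merely shows
 * what "prefetch the exact data item to the consumer's L1" would look like
 * if emulated in software with __builtin_prefetch.
 */
#include <stdint.h>

#define RING_SIZE  256
#define CACHE_LINE 64

struct pkt_desc {            /* one slot of a single-producer queue      */
    volatile uint32_t ready; /* set by the producer (device or core)     */
    uint32_t len;            /* bytes valid in buf                       */
    uint8_t *buf;            /* payload in shared memory (DRAM or LLC)   */
};

struct rx_ring {
    struct pkt_desc slots[RING_SIZE];
    uint32_t head;           /* next slot the consumer will process      */
};

/* Pull the descriptor and the first cache lines of its payload toward L1.
 * HyperData would trigger this in hardware when its monitoring set observes
 * the producer's enqueue; here it runs just before the data is touched.    */
static inline void targeted_prefetch(const struct pkt_desc *d)
{
    __builtin_prefetch(d, 0, 3);                  /* descriptor itself     */
    for (uint32_t off = 0; off < d->len && off < 4 * CACHE_LINE;
         off += CACHE_LINE)
        __builtin_prefetch(d->buf + off, 0, 3);   /* payload, line by line */
}

/* Consumer loop: wait for arrival, prefetch the item, then process it. */
void consume(struct rx_ring *r, void (*process)(const uint8_t *, uint32_t))
{
    for (;;) {
        struct pkt_desc *d = &r->slots[r->head % RING_SIZE];
        while (!d->ready)        /* arrival notification by polling;        */
            ;                    /* HyperData removes this guesswork        */
        targeted_prefetch(d);
        process(d->buf, d->len); /* data is now (likely) L1-resident        */
        d->ready = 0;
        r->head++;
    }
}
```

In the software version, the prefetch can only be issued after the consumer has already discovered the arrival, which limits how much latency it can hide; the paper's point is that a hardware monitoring set can observe the enqueue and trigger the prefetch ahead of the consumer, including for complex formats such as Virtio and multi-consumer queues.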