COREC: Concurrent non-blocking single-queue receive driver for low latency networking

Computer Networks (IF 4.6; Region 2, Computer Science; JCR Q1, Computer Science, Hardware & Architecture) Pub Date: 2025-02-01 Epub Date: 2024-12-26 DOI: 10.1016/j.comnet.2024.110982
Marco Faltelli , Giacomo Belocchi , Francesco Quaglia , Giuseppe Bianchi
Computer Networks, volume 258, Article 110982. Full text: https://www.sciencedirect.com/science/article/pii/S1389128624008144

Abstract

Existing network stacks address performance and scalability by relying on multiple receive queues. At the software level, however, each queue is processed by a single thread, which prevents simultaneous work on the same queue and limits tail-latency performance. To overcome this limitation, we introduce COREC, the first software implementation of a concurrent non-blocking single-queue receive driver. Sharing a single queue among multiple threads improves workload distribution and yields a work-conserving policy for network stacks. On the technical side, instead of relying on traditional critical sections, which would serialize the threads' operations, COREC coordinates the threads that concurrently access the same receive queue in a non-blocking manner via atomic machine instructions of the Read-Modify-Write (RMW) class. These instructions let a thread access and update a memory location atomically, subject to a condition such as the location still holding a target value selected by the thread. They also make any update globally visible in the memory hierarchy, bypassing the interference with memory consistency caused by CPU store buffers. Extensive evaluation shows that the additional reordering our approach may occasionally cause is non-critical and has minimal impact on performance: even in the worst-case scenario of a single large TCP flow, the performance impairment amounts to at most 2-3 percent. Conversely, substantial latency gains are achieved when handling UDP traffic, real-world traffic mixes, and multiple shorter TCP flows.
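The coordination idea described in the abstract can be illustrated with a compare-and-swap (CAS) loop, one member of the RMW class. The following is only a minimal sketch under stated assumptions, not COREC's actual algorithm: the `rx_queue` structure, its `head`/`tail` indices, and the functions `rx_init`, `rx_publish`, and `rx_claim_batch` are hypothetical names introduced for illustration. It assumes the producer (standing in for the NIC) only ever advances `tail`, and that each consumer thread claims a contiguous batch of ready descriptors by advancing `head` with a CAS.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared receive-queue state (illustrative, not COREC's layout).
 * Indices grow monotonically; a real ring would use index % ring_size. */
struct rx_queue {
    _Atomic uint64_t head; /* next descriptor not yet claimed by any thread */
    _Atomic uint64_t tail; /* one past the last descriptor made ready */
};

static void rx_init(struct rx_queue *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side (stand-in for the NIC): mark n more descriptors as ready.
 * fetch-and-add is itself an RMW instruction. */
static void rx_publish(struct rx_queue *q, uint64_t n)
{
    atomic_fetch_add(&q->tail, n);
}

/* Consumer side: try to claim up to max_batch ready descriptors.
 * On success, stores the first claimed index in *start and returns the
 * number of descriptors claimed; returns 0 if nothing is ready.
 * No lock is taken: a failed CAS means another thread advanced head,
 * so the loop retries with the refreshed value. */
static size_t rx_claim_batch(struct rx_queue *q, size_t max_batch,
                             uint64_t *start)
{
    assert(max_batch > 0);
    uint64_t head = atomic_load(&q->head);
    for (;;) {
        uint64_t tail = atomic_load(&q->tail);
        if (head >= tail)
            return 0; /* queue empty from this thread's point of view */
        uint64_t ready = tail - head;
        uint64_t n = ready < max_batch ? ready : max_batch;
        /* Advance head only if it still equals the value observed above.
         * On failure, head is overwritten with the current value. */
        if (atomic_compare_exchange_weak(&q->head, &head, head + n)) {
            *start = head;
            return (size_t)n;
        }
    }
}
```

A thread that gets a non-zero return value owns descriptors `*start` through `*start + n - 1` exclusively and can process them without further synchronization on the queue indices. Because different threads may finish their batches in a different order than they claimed them, a scheme of this kind can deliver packets out of order, which is the sort of additional reordering the abstract evaluates. The sketch uses the default sequentially consistent orderings for simplicity; a production driver would choose its memory orderings more carefully.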
Source journal
Computer Networks (Engineering & Technology: Telecommunications)
CiteScore: 10.80
Self-citation rate: 3.60%
Articles published: 434
Review time: 8.6 months
Journal description: Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.