A lightweight RDMA connection protocol based on post-hoc confirmation

Journal of Parallel and Distributed Computing (IF 3.4; Zone 3, Computer Science; JCR Q1, COMPUTER SCIENCE, THEORY & METHODS) Pub Date: 2024-10-01 DOI: 10.1016/j.jpdc.2024.104991
Ke Wu, Dezun Dong, Weixia Xu
Citations: 0

Abstract

With the increasing scale and complexity of high-performance computing (HPC) systems, the rising failure rate poses significant challenges for RDMA networks that aim for high bandwidth and low latency. RDMA networks require hardware-level, end-to-end reliable data transmission to avoid the high cost of software failure recovery. The Tianhe HPC interconnection network adopts a NIC-based RDMA reliable connection protocol, RCP. RCP establishes a connection for each message that enters the NIC and releases it once transmission completes. However, this adds one round-trip time (RTT) of connection overhead to every message, which severely degrades the performance of HPC networks dominated by short messages. We find that utilization of receiver-side connection resources is consistently low, because maintaining message-grained connections on the NIC causes connections to be released quickly. We therefore propose PCP, a lightweight RDMA connection protocol based on post-hoc confirmation. PCP assumes by default that the receiver has connection resources and eliminates the need for receiver confirmation before a message is sent, reducing the connection overhead of almost all messages by one RTT. PCP also includes mechanisms for the special case in which the receiver lacks connection resources. Evaluation results show that PCP significantly improves the performance of short messages and of applications dominated by short messages. Moreover, PCP further reduces the usage of receiver-side connection resources and does not suffer performance degradation even under large-scale heavy loads and severe endpoint congestion.
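The one-RTT saving described in the abstract can be illustrated with a toy latency model. This is only a sketch under stated assumptions, not the protocol as implemented on the Tianhe NIC: the function names, the `transfer` parameter (time from the first data packet to completion), and in particular the cost of the fallback path are hypothetical. The abstract does not say how PCP handles a receiver without connection resources, so the model simply assumes a rejected attempt wastes one RTT and is then retried with an RCP-style confirmed connection.

```python
def rcp_latency(rtt: float, transfer: float) -> float:
    """RCP-style message: one RTT to establish the connection, then the data."""
    return rtt + transfer


def pcp_latency(rtt: float, transfer: float, receiver_has_resources: bool = True) -> float:
    """PCP-style message: data is sent without waiting for receiver confirmation.

    ASSUMPTION (not from the paper): if the receiver lacks connection
    resources, the optimistic attempt costs one wasted RTT and the message
    is then retried with an RCP-style confirmed connection.
    """
    if receiver_has_resources:
        return transfer
    return rtt + rcp_latency(rtt, transfer)


def expected_pcp_latency(rtt: float, transfer: float, miss_rate: float) -> float:
    """Average PCP latency when a fraction `miss_rate` of messages hit the fallback."""
    hit = pcp_latency(rtt, transfer, receiver_has_resources=True)
    miss = pcp_latency(rtt, transfer, receiver_has_resources=False)
    return (1.0 - miss_rate) * hit + miss_rate * miss


if __name__ == "__main__":
    rtt, transfer = 2.0, 1.0  # hypothetical values, e.g. in microseconds
    print("RCP:", rcp_latency(rtt, transfer))
    print("PCP (resources available):", pcp_latency(rtt, transfer))
    print("PCP (25% fallback, assumed cost):", expected_pcp_latency(rtt, transfer, 0.25))
```

Under these assumptions the common case saves exactly one RTT per message, and the average stays below the RCP latency as long as fewer than half of the messages hit the fallback path. The real break-even point depends on the actual fallback mechanism, which the abstract does not describe.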