sRDMA: RDMA通用的低开销调度程序

Xizheng Wang, Shuai Wang, Dan Li
{"title":"sRDMA: RDMA通用的低开销调度程序","authors":"Xizheng Wang, Shuai Wang, Dan Li","doi":"10.1145/3600061.3600082","DOIUrl":null,"url":null,"abstract":"Remote Direct Memory Access (RDMA) has been widely deployed in data centers to improve application performance. However, the characteristic of RDMA to deliver messages in order cannot meet the emerging requirements of applications for scheduling messages within an RDMA connection, making RDMA unable to be fully utilized. Some works try to schedule the data to be transferred in specific applications before delivering to RDMA, or distribute messages to different connections. However, these approaches tightly couple scheduling logic with application logic and may result in high scheduling overhead. In this paper, we propose sRDMA, a general and low-overhead scheduler working in user-space RDMA driver. sRDMA allows the application to express the expected transfer order to RDMA hardware via work requests (WRs). With priority information in WRs, sRDMA slices and schedules WRs to achieve desired order of message transfer and reduce blocking impact of large messages in the same RDMA connection. Our experiments show that sRDMA can improve the performance of applications, e.g., TensorFlow, by up to , and sRDMA has negligible overhead in terms of CPU and flow throughput.","PeriodicalId":228934,"journal":{"name":"Proceedings of the 7th Asia-Pacific Workshop on Networking","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"sRDMA: A General and Low-Overhead Scheduler for RDMA\",\"authors\":\"Xizheng Wang, Shuai Wang, Dan Li\",\"doi\":\"10.1145/3600061.3600082\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Remote Direct Memory Access (RDMA) has been widely deployed in data centers to improve application performance. However, the characteristic of RDMA to deliver messages in order cannot meet the emerging requirements of applications for scheduling messages within an RDMA connection, making RDMA unable to be fully utilized. Some works try to schedule the data to be transferred in specific applications before delivering to RDMA, or distribute messages to different connections. However, these approaches tightly couple scheduling logic with application logic and may result in high scheduling overhead. In this paper, we propose sRDMA, a general and low-overhead scheduler working in user-space RDMA driver. sRDMA allows the application to express the expected transfer order to RDMA hardware via work requests (WRs). With priority information in WRs, sRDMA slices and schedules WRs to achieve desired order of message transfer and reduce blocking impact of large messages in the same RDMA connection. Our experiments show that sRDMA can improve the performance of applications, e.g., TensorFlow, by up to , and sRDMA has negligible overhead in terms of CPU and flow throughput.\",\"PeriodicalId\":228934,\"journal\":{\"name\":\"Proceedings of the 7th Asia-Pacific Workshop on Networking\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 7th Asia-Pacific Workshop on Networking\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3600061.3600082\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 7th Asia-Pacific Workshop on Networking","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3600061.3600082","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

RDMA (Remote Direct Memory Access)技术被广泛应用于数据中心,以提高应用程序的性能。但是,RDMA按顺序传递消息的特性不能满足应用程序对RDMA连接内消息调度的新需求,无法充分利用RDMA。一些工作尝试在交付到RDMA之前安排在特定应用程序中传输的数据,或者将消息分发到不同的连接。然而,这些方法将调度逻辑与应用程序逻辑紧密耦合,可能导致较高的调度开销。在本文中,我们提出了sRDMA,一个在用户空间RDMA驱动程序中工作的通用的低开销调度程序。sRDMA允许应用程序通过工作请求(wr)向RDMA硬件表达预期的传输顺序。利用wr中的优先级信息,sRDMA对wr进行切片和调度,以实现所需的消息传输顺序,并减少同一RDMA连接中大消息的阻塞影响。我们的实验表明,sRDMA可以将应用程序(例如TensorFlow)的性能提高多达,并且sRDMA在CPU和流量吞吐量方面的开销可以忽略不计。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
sRDMA: A General and Low-Overhead Scheduler for RDMA
Remote Direct Memory Access (RDMA) has been widely deployed in data centers to improve application performance. However, the characteristic of RDMA to deliver messages in order cannot meet the emerging requirements of applications for scheduling messages within an RDMA connection, making RDMA unable to be fully utilized. Some works try to schedule the data to be transferred in specific applications before delivering to RDMA, or distribute messages to different connections. However, these approaches tightly couple scheduling logic with application logic and may result in high scheduling overhead. In this paper, we propose sRDMA, a general and low-overhead scheduler working in user-space RDMA driver. sRDMA allows the application to express the expected transfer order to RDMA hardware via work requests (WRs). With priority information in WRs, sRDMA slices and schedules WRs to achieve desired order of message transfer and reduce blocking impact of large messages in the same RDMA connection. Our experiments show that sRDMA can improve the performance of applications, e.g., TensorFlow, by up to , and sRDMA has negligible overhead in terms of CPU and flow throughput.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Deadline Enables In-Order Flowlet Switching for Load Balancing Online Detection of 1D and 2D Hierarchical Super-Spreaders in High-Speed Networks ABC: Adaptive Bitrate Algorithm Commander for Multi-Client Video Streaming Bamboo: Boosting Training Efficiency for Real-Time Video Streaming via Online Grouped Federated Transfer Learning Improving Cloud Storage Network Bandwidth Utilization of Scientific Applications
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1