Towards Accelerating Data Intensive Application's Shuffle Process Using SmartNICs

Proceedings of the ACM on Measurement and Analysis of Computing Systems
Published: 2023-05-19
DOI: 10.1145/3589980

Jia-Jen Lin, T. Ji, Xiangpeng Hao, Hokeun Cha, Yanfang Le, Xiangyao Yu, Aditya Akella



Abstract

The wide adoption of emerging SmartNIC technology creates new opportunities to offload application-level computation into the networking layer, relieving host CPUs of that work and improving performance. Shuffle, the all-to-all data exchange process, is a critical building block for network communication in distributed data-intensive applications and can potentially benefit from SmartNICs. In this paper, we develop SmartShuffle, which accelerates the shuffle process of data-intensive applications by offloading various computation tasks to SmartNIC devices. SmartShuffle supports offloading both low-level network functions, including data partitioning and network transport, and high-level computation tasks, including filtering, aggregation, and sorting. SmartShuffle adopts a coordinated offload architecture in which sender-side and receiver-side SmartNICs jointly contribute to the benefits of shuffle computation offload. SmartShuffle carefully manages the tight and time-varying computation and memory constraints on the device. We propose a liquid offloading approach, which dynamically migrates operators between the host CPU and the SmartNIC at runtime so that resources on both devices are fully utilized. We prototype SmartShuffle on Stingray SoC SmartNICs and plug it into Spark. Our evaluation shows that SmartShuffle improves host CPU efficiency and I/O efficiency while lowering job completion time. SmartShuffle outperforms both Spark and Spark RDMA by up to 40% on TPC-H.
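The abstract lists the shuffle operators SmartShuffle can offload (partitioning, filtering, aggregation, sorting) and says sender-side and receiver-side SmartNICs cooperate. The Python sketch below is not SmartShuffle's implementation and runs entirely on the host. It is a minimal model, assuming (key, value) records and a sum aggregation, of how those operators can divide across a shuffle: each sender filters, pre-aggregates, and hash-partitions its output, and each receiver merges the partial aggregates it receives and sorts them. All function names and the crc32-based partitioner are hypothetical choices for illustration.

```python
# Hypothetical sketch: sender_stage, shuffle, receiver_stage and the crc32
# partitioner are illustrative names, not SmartShuffle's API.
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple
import zlib

Record = Tuple[str, int]  # (key, value)


def partition_of(key: str, num_partitions: int) -> int:
    # Deterministic hash partitioning; crc32 is stable across runs,
    # unlike Python's salted built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def sender_stage(records: Iterable[Record],
                 predicate: Callable[[Record], bool],
                 num_partitions: int) -> Dict[int, Dict[str, int]]:
    """Filter, pre-aggregate (sum per key), and partition one sender's records."""
    buckets: Dict[int, Dict[str, int]] = defaultdict(dict)
    for key, value in records:
        if not predicate((key, value)):
            continue  # filtered rows never cross the network
        bucket = buckets[partition_of(key, num_partitions)]
        bucket[key] = bucket.get(key, 0) + value  # partial aggregation
    return buckets


def shuffle(senders_output: List[Dict[int, Dict[str, int]]],
            num_partitions: int) -> List[List[Dict[str, int]]]:
    """All-to-all exchange: receiver p collects bucket p from every sender."""
    inboxes: List[List[Dict[str, int]]] = [[] for _ in range(num_partitions)]
    for buckets in senders_output:
        for p, bucket in buckets.items():
            inboxes[p].append(bucket)
    return inboxes


def receiver_stage(inbox: List[Dict[str, int]]) -> List[Record]:
    """Merge partial aggregates from all senders, then sort by key."""
    merged: Dict[str, int] = defaultdict(int)
    for bucket in inbox:
        for key, value in bucket.items():
            merged[key] += value
    return sorted(merged.items())


def keep_positive(record: Record) -> bool:
    # Example filter predicate: drop records with non-positive values.
    return record[1] > 0


# Usage: two senders, two receivers.
NUM_PARTITIONS = 2
senders = [
    [("a", 1), ("b", 5), ("a", 2), ("c", -1)],
    [("b", 3), ("c", 4), ("a", 10)],
]
sent = [sender_stage(s, keep_positive, NUM_PARTITIONS) for s in senders]
inboxes = shuffle(sent, NUM_PARTITIONS)
result = sorted(kv for inbox in inboxes for kv in receiver_stage(inbox))
print(result)  # [('a', 13), ('b', 8), ('c', 4)]
```

Filtering and pre-aggregating before the exchange shrinks the data that crosses the network. In this model each stage produces the same output no matter where it runs, a property that any scheme for moving operators between the host CPU and a SmartNIC at runtime, such as the liquid offloading the abstract describes, would need to preserve.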


