Distributed Join Algorithms on Multi-CPU Clusters with GPUDirect RDMA

Chengxin Guo, Hong Chen, Feng Zhang, Cuiping Li
DOI: 10.1145/3337821.3337862 (https://doi.org/10.1145/3337821.3337862)
Published in: Proceedings of the 48th International Conference on Parallel Processing, 2019-08-05
Citations: 7

Abstract

In data management systems, query processing on GPUs or on distributed clusters has proven to be an effective way to achieve high efficiency. However, the high PCIe data transfer overhead between CPUs and GPUs, together with the communication cost between nodes in distributed systems, is usually the bottleneck that limits system performance. GPUDirect RDMA, a recently developed technology, has received considerable attention. It combines the features of RDMA and GPUDirect, which provides new opportunities for optimizing query processing. In this paper, we revisit the join algorithm, one of the most important operators in query processing, in the context of GPUDirect RDMA. Specifically, we explore the performance of the hash join and the sort-merge join with GPUDirect RDMA. We present a new design that uses GPUDirect RDMA to improve data communication in distributed join algorithms on multi-GPU clusters. We propose a series of techniques, including multi-layer data partitioning and adaptive selection of the data communication path across the various transmission channels. Experiments show that the proposed distributed join algorithms using GPUDirect RDMA achieve a speedup of up to 1.83x compared to the state-of-the-art distributed join algorithms. To the best of our knowledge, this is the first work on distributed GPU join algorithms. We believe that the insights and implications of this study will shed light on future research using GPUDirect RDMA.
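The abstract names multi-layer data partitioning and the hash join without detailing either. The Python sketch below illustrates the general idea under stated assumptions: a two-layer hash partitioning that first assigns each tuple to a node and then to a GPU within that node, followed by an ordinary build/probe hash join on each partition. It runs on a single CPU. The network transfer (GPUDirect RDMA in the paper) and the GPU kernels are not modeled. The names `two_level_partition`, `local_hash_join`, `distributed_hash_join` and the multiplicative hash are hypothetical choices for illustration, not the authors' design.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]        # (join key, payload)
Slot = Tuple[int, int]       # (node id, GPU id within the node)


def _hash(key: int) -> int:
    # Multiplicative hash (illustrative assumption, not taken from the paper).
    return (key * 2654435761) % (1 << 32)


def two_level_partition(rel: List[Row], num_nodes: int,
                        gpus_per_node: int) -> Dict[Slot, List[Row]]:
    """Hypothetical two-layer partitioning: layer 1 picks a node,
    layer 2 picks a GPU inside that node, both from the same hash value."""
    parts: Dict[Slot, List[Row]] = defaultdict(list)
    for key, payload in rel:
        h = _hash(key)
        node = h % num_nodes                       # layer 1: across nodes
        gpu = (h // num_nodes) % gpus_per_node     # layer 2: across local GPUs
        parts[(node, gpu)].append((key, payload))
    return parts


def local_hash_join(build: List[Row], probe: List[Row]) -> List[Tuple[int, str, str]]:
    """Classic build/probe hash join on one partition."""
    table: Dict[int, List[str]] = defaultdict(list)
    for key, payload in build:                     # build phase
        table[key].append(payload)
    out = []
    for key, payload in probe:                     # probe phase
        for b in table.get(key, ()):               # .get does not insert empty lists
            out.append((key, b, payload))
    return out


def distributed_hash_join(r: List[Row], s: List[Row], num_nodes: int = 2,
                          gpus_per_node: int = 2) -> List[Tuple[int, str, str]]:
    """Partition both relations identically, then join each (node, GPU) slot."""
    r_parts = two_level_partition(r, num_nodes, gpus_per_node)
    s_parts = two_level_partition(s, num_nodes, gpus_per_node)
    result = []
    for slot, r_part in r_parts.items():
        result.extend(local_hash_join(r_part, s_parts.get(slot, [])))
    return result


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (3, "c")]
    s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
    print(sorted(distributed_hash_join(r, s)))
    # [(2, 'b', 'x'), (3, 'c', 'y'), (3, 'c', 'z')]
```

Because both relations are partitioned with the same hash function, every pair of matching keys lands in the same (node, GPU) slot, so each slot can be joined independently. In a real deployment, the layer-1 step is where tuples would cross the network between nodes.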