Distributed Join Algorithms on Multi-CPU Clusters with GPUDirect RDMA

Proceedings of the 48th International Conference on Parallel Processing Pub Date : 2019-08-05 DOI:10.1145/3337821.3337862

Chengxin Guo, Hong Chen, Feng Zhang, Cuiping Li

{"title":"Distributed Join Algorithms on Multi-CPU Clusters with GPUDirect RDMA","authors":"Chengxin Guo, Hong Chen, Feng Zhang, Cuiping Li","doi":"10.1145/3337821.3337862","DOIUrl":null,"url":null,"abstract":"In data management systems, query processing on GPUs or distributed clusters have proven to be an effective method for high efficiency. However, the high PCIe data transfer overhead between CPUs and GPUs, and the communication cost between nodes in distributed systems are usually bottleneck for improving system performance. Recently, GPUDirect RDMA has been developed and has received a lot of attention. It contains the features of the RDMA and GPUDirect technologies, which provides new opportunities for optimizing query processing. In this paper, we revisit the join algorithm, one of the most important operators in query processing, with GPUDirect RDMA. Specifically, we explore the performance of the hash join and sort merge join with GPUDirect RDMA. We present a new design using GPUDirect RDMA to improve the data communication in distributed join algorithms on multi-GPU clusters. We propose a series of techniques, including multi-layer data partitioning, and adaptive data communication path selection for various transmission channels. Experiments show that the proposed distributed join algorithms using GPUDirect RDMA achieve up to 1.83x performance speedup compared to the state-of-the-art distributed join algorithms. To the best of our knowledge, this is the first work for distributed GPU join algorithms. We believe that the insights and implications in this study shall shed lights on future researches using GPUDirect RDMA.","PeriodicalId":405273,"journal":{"name":"Proceedings of the 48th International Conference on Parallel Processing","volume":"165 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 48th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3337821.3337862","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

Abstract

In data management systems, query processing on GPUs or distributed clusters have proven to be an effective method for high efficiency. However, the high PCIe data transfer overhead between CPUs and GPUs, and the communication cost between nodes in distributed systems are usually bottleneck for improving system performance. Recently, GPUDirect RDMA has been developed and has received a lot of attention. It contains the features of the RDMA and GPUDirect technologies, which provides new opportunities for optimizing query processing. In this paper, we revisit the join algorithm, one of the most important operators in query processing, with GPUDirect RDMA. Specifically, we explore the performance of the hash join and sort merge join with GPUDirect RDMA. We present a new design using GPUDirect RDMA to improve the data communication in distributed join algorithms on multi-GPU clusters. We propose a series of techniques, including multi-layer data partitioning, and adaptive data communication path selection for various transmission channels. Experiments show that the proposed distributed join algorithms using GPUDirect RDMA achieve up to 1.83x performance speedup compared to the state-of-the-art distributed join algorithms. To the best of our knowledge, this is the first work for distributed GPU join algorithms. We believe that the insights and implications in this study shall shed lights on future researches using GPUDirect RDMA.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于GPUDirect RDMA的多cpu集群分布式连接算法

在数据管理系统中，在gpu或分布式集群上进行查询处理已被证明是一种高效的有效方法。然而，在分布式系统中，cpu和gpu之间的PCIe数据传输开销和节点之间的通信开销往往是制约系统性能提升的瓶颈。最近，GPUDirect RDMA得到了发展，并受到了广泛的关注。它包含了RDMA和GPUDirect技术的特性，为优化查询处理提供了新的机会。在本文中，我们用GPUDirect RDMA重新讨论了查询处理中最重要的运算符之一的连接算法。具体来说，我们探讨了使用GPUDirect RDMA的散列连接和排序合并连接的性能。提出了一种利用GPUDirect RDMA改进多gpu集群分布式连接算法中的数据通信的新设计。我们提出了一系列的技术，包括多层数据划分和自适应数据通信路径选择的各种传输信道。实验表明，采用GPUDirect RDMA的分布式连接算法与现有的分布式连接算法相比，性能提升高达1.83倍。据我们所知，这是分布式GPU连接算法的第一个工作。我们相信本研究的见解和意义将为未来使用GPUDirect RDMA的研究提供启示。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊