Design and Experimental Evaluation of Distributed Heterogeneous Graph-Processing Systems

2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Pub Date: 2016-05-16, DOI: 10.1109/CCGrid.2016.53

Yong Guo, A. Varbanescu, D. Epema, A. Iosup

Citations: 5

Abstract

Graph processing is increasingly used in a variety of domains, from engineering to logistics and from scientific computing to online gaming. To process graphs efficiently, GPU-enabled graph-processing systems such as TOTEM and Medusa exploit the GPU or the combined CPU+GPU capabilities of a single machine. Unlike scalable distributed CPU-based systems such as Pregel and GraphX, existing GPU-enabled systems are restricted to the resources of a single machine, including the limited amount of GPU memory, and thus cannot analyze the increasingly large-scale graphs we see in practice. To address this problem, we design and implement three families of distributed heterogeneous graph-processing systems that can use both the CPUs and GPUs of multiple machines. We further focus on graph partitioning, for which we compare existing graph-partitioning policies and a new policy specifically targeted at heterogeneity. We implement all our distributed heterogeneous systems based on the programming model of the single-machine TOTEM, to which we add (1) a new communication layer for CPUs and GPUs across multiple machines to support distributed graphs, and (2) a workload partitioning method that uses offline profiling to distribute the work on the CPUs and the GPUs. We conduct a comprehensive real-world performance evaluation for all three families. To ensure representative results, we select 3 typical algorithms and 5 datasets with different characteristics. Our results include algorithm run time, performance breakdown, scalability, graph partitioning time, and comparison with other graph-processing systems. They demonstrate the feasibility of distributed heterogeneous graph processing and show evidence of the high performance that can be achieved by combining CPUs and GPUs in a distributed environment.
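
The abstract mentions a workload partitioning method that uses offline profiling to distribute work across CPUs and GPUs, but gives no details. As an illustration of the general idea only, and not the authors' method, the sketch below splits a graph's edges between one CPU and one GPU in proportion to processing rates assumed to have been measured offline, and caps the GPU share at what fits in GPU memory. The function name `split_edges` and the parameters `cpu_rate`, `gpu_rate`, and `gpu_capacity` are hypothetical.

```python
from typing import Tuple


def split_edges(total_edges: int, cpu_rate: float, gpu_rate: float,
                gpu_capacity: int) -> Tuple[int, int]:
    """Split edges between a CPU and a GPU partition.

    cpu_rate and gpu_rate are processing rates (e.g. edges per second)
    assumed to come from an offline profiling run; gpu_capacity is the
    assumed maximum number of edges that fit in GPU memory. All names
    are hypothetical and do not come from the paper.
    """
    if total_edges < 0 or gpu_capacity < 0:
        raise ValueError("edge counts must be non-negative")
    if cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("profiled rates must be positive")

    # Ideal GPU share is proportional to its profiled rate.
    gpu_fraction = gpu_rate / (cpu_rate + gpu_rate)
    # Never assign more edges to the GPU than its memory can hold.
    gpu_edges = min(gpu_capacity, int(total_edges * gpu_fraction))
    # The CPU takes everything that is left, so no edge is dropped.
    cpu_edges = total_edges - gpu_edges
    return cpu_edges, gpu_edges


if __name__ == "__main__":
    # GPU profiled as 3x faster than the CPU, with ample memory.
    print(split_edges(1000, cpu_rate=1.0, gpu_rate=3.0, gpu_capacity=10_000))
    # Same rates, but the GPU can hold only 500 edges.
    print(split_edges(1000, cpu_rate=1.0, gpu_rate=3.0, gpu_capacity=500))
```

In this sketch the memory cap takes priority over the proportional split, which reflects the limited GPU memory the abstract names as a constraint; a real system would also have to account for communication cost and for multiple machines.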
