TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC

Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2013-08-11. DOI: 10.1145/2487575.2487581

Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, Jeong-Hoon Lee, Min-Soo Kim, Jinha Kim, Hwanjo Yu


Citations: 244

Abstract

Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph-structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, the user needs to be skilled at managing and tuning a distributed system in a cluster, which is a nontrivial job for the ordinary user. Furthermore, these distributed systems need many machines in a cluster in order to provide reasonable performance. To address this problem, a disk-based parallel graph engine called GraphChi has recently been proposed. Although GraphChi significantly outperforms all representative (disk-based) distributed graph engines, we observe that GraphChi still has serious performance problems for many important types of graph queries due to 1) limited parallelism and 2) separate steps for I/O processing and CPU processing. In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently by using modern hardware on a single PC. TurboGraph is the first truly parallel graph engine that exploits 1) full parallelism, including multi-core parallelism and FlashSSD I/O parallelism, and 2) full overlap of CPU processing and I/O processing as much as possible. Specifically, we propose a novel parallel execution model called pin-and-slide. TurboGraph also provides engine-level operators such as BFS, which are implemented under the pin-and-slide model. Extensive experimental results with large real datasets show that TurboGraph consistently and significantly outperforms GraphChi by up to four orders of magnitude! Our implementation of TurboGraph is available at http://wshan.net/turbograph as executable files.
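The idea of overlapping I/O with CPU work, as mentioned in the abstract, can be illustrated with a small sketch. This is not the pin-and-slide model and not TurboGraph's code: the abstract does not describe either in detail. It is a toy level-synchronous BFS, assuming a hypothetical page layout (`PAGES`, `VERTEX_TO_PAGE`) and a stand-in `read_page` function, in which page reads are issued concurrently and each page is processed as soon as its read completes rather than after all reads finish.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical on-disk layout (an assumption for illustration):
# each "page" stores the adjacency lists of a few vertices.
PAGES = {
    0: {0: [1, 2], 1: [3]},
    1: {2: [3, 4], 3: [5]},
    2: {4: [5], 5: []},
}
VERTEX_TO_PAGE = {v: p for p, adj in PAGES.items() for v in adj}


def read_page(page_id):
    """Stand-in for an asynchronous SSD page read (hypothetical helper)."""
    return PAGES[page_id]


def bfs_levels(source, io_threads=4):
    """Level-synchronous BFS that processes each page as its read completes."""
    level = {source: 0}
    frontier = {source}
    depth = 0
    with ThreadPoolExecutor(max_workers=io_threads) as pool:
        while frontier:
            depth += 1
            # Issue all page reads for the current frontier at once.
            needed = {VERTEX_TO_PAGE[v] for v in frontier}
            futures = [pool.submit(read_page, p) for p in needed]
            next_frontier = set()
            # Process pages in completion order, so CPU work on early
            # pages overlaps with reads that are still in flight.
            for fut in as_completed(futures):
                for v, neighbors in fut.result().items():
                    if v not in frontier:
                        continue
                    for n in neighbors:
                        if n not in level:
                            level[n] = depth
                            next_frontier.add(n)
            frontier = next_frontier
    return level


if __name__ == "__main__":
    print(bfs_levels(0))
```

In this sketch the main thread does the CPU work while pool threads perform the reads; a real engine would use multiple CPU workers and true asynchronous SSD I/O, which this toy example does not attempt to model.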


