TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC

Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, Jeong-Hoon Lee, Min-Soo Kim, Jinha Kim, Hwanjo Yu
{"title":"TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC","authors":"Wook-Shin Han, Sangyeon Lee, Kyungyeol Park, Jeong-Hoon Lee, Min-Soo Kim, Jinha Kim, Hwanjo Yu","doi":"10.1145/2487575.2487581","DOIUrl":null,"url":null,"abstract":"Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, the user needs to be skilled at managing and tuning a distributed system in a cluster, which is a nontrivial job for the ordinary user. Furthermore, these distributed systems need many machines in a cluster in order to provide reasonable performance. In order to address this problem, a disk-based parallel graph engine called Graph-Chi, has been recently proposed. Although Graph-Chi significantly outperforms all representative (disk-based) distributed graph engines, we observe that Graph-Chi still has serious performance problems for many important types of graph queries due to 1) limited parallelism and 2) separate steps for I/O processing and CPU processing. In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently by using modern hardware on a single PC. TurboGraph is the first truly parallel graph engine that exploits 1) full parallelism including multi-core parallelism and FlashSSD IO parallelism and 2) full overlap of CPU processing and I/O processing as much as possible. Specifically, we propose a novel parallel execution model, called pin-and-slide. TurboGraph also provides engine-level operators such as BFS which are implemented under the pin-and-slide model. Extensive experimental results with large real datasets show that TurboGraph consistently and significantly outperforms Graph-Chi by up to four orders of magnitude! Our implementation of TurboGraph is available at ``http://wshan.net/turbograph}\" as executable files.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"20 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"244","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2487575.2487581","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 244

Abstract

Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, the user needs to be skilled at managing and tuning a distributed system in a cluster, which is a nontrivial job for the ordinary user. Furthermore, these distributed systems need many machines in a cluster in order to provide reasonable performance. In order to address this problem, a disk-based parallel graph engine called Graph-Chi, has been recently proposed. Although Graph-Chi significantly outperforms all representative (disk-based) distributed graph engines, we observe that Graph-Chi still has serious performance problems for many important types of graph queries due to 1) limited parallelism and 2) separate steps for I/O processing and CPU processing. In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently by using modern hardware on a single PC. TurboGraph is the first truly parallel graph engine that exploits 1) full parallelism including multi-core parallelism and FlashSSD IO parallelism and 2) full overlap of CPU processing and I/O processing as much as possible. Specifically, we propose a novel parallel execution model, called pin-and-slide. TurboGraph also provides engine-level operators such as BFS which are implemented under the pin-and-slide model. Extensive experimental results with large real datasets show that TurboGraph consistently and significantly outperforms Graph-Chi by up to four orders of magnitude! Our implementation of TurboGraph is available at ``http://wshan.net/turbograph}" as executable files.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
TurboGraph:一个快速并行图形引擎,在单个PC上处理十亿规模的图形
图用于模拟许多真实对象,如社交网络和网络图。各个领域的许多实际应用都需要对大规模图结构数据进行高效的管理。尽管像GBase和Pregel这样的分布式图形引擎可以处理十亿规模的图形,但是用户需要熟练地管理和调优集群中的分布式系统,这对于普通用户来说是一项非常重要的工作。此外,为了提供合理的性能,这些分布式系统需要集群中的许多机器。为了解决这个问题,最近提出了一种基于磁盘的并行图引擎,称为graph - chi。尽管graph - chi的性能明显优于所有代表性的(基于磁盘的)分布式图引擎,但我们观察到,由于1)有限的并行性和2)I/O处理和CPU处理的单独步骤,对于许多重要类型的图查询,graph - chi仍然存在严重的性能问题。在本文中,我们提出了一个通用的,基于磁盘的图形引擎,称为TurboGraph,通过在单个PC上使用现代硬件来非常有效地处理十亿规模的图形。TurboGraph是第一个真正的并行图形引擎,它利用了1)全并行性,包括多核并行性和FlashSSD IO并行性;2)CPU处理和I/O处理尽可能地完全重叠。具体来说,我们提出了一种新的并行执行模型,称为钉-滑动。TurboGraph还提供引擎级操作器,如BFS,这些操作器是在钉-滑动模型下实现的。使用大型真实数据集的广泛实验结果表明,TurboGraph始终显著优于Graph-Chi多达四个数量级!我们的TurboGraph实现可以在“http://wshan.net/turbograph}”以可执行文件的形式获得。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A general bootstrap performance diagnostic Flexible and robust co-regularized multi-domain graph clustering Beyond myopic inference in big data pipelines Constrained stochastic gradient descent for large-scale least squares problem Inferring distant-time location in low-sampling-rate trajectories
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1