Estimating the Impact of Communication Schemes for Distributed Graph Processing

Tian Ye, S. Kuppannagari, C. Rose, Sasindu Wijeratne, R. Kannan, V. Prasanna
2022 21st International Symposium on Parallel and Distributed Computing (ISPDC), published 2022-07-01
DOI: 10.1109/ISPDC55340.2022.00016 (https://doi.org/10.1109/ISPDC55340.2022.00016)
Citations: 0

Abstract

Extreme-scale graph analytics is essential for many real-world Big Data applications whose underlying graphs contain millions or billions of vertices and edges. Because such graphs cannot fit into the memory of a single computer, they must be processed in a distributed fashion. Several frameworks have been developed for graph processing on distributed systems. These frameworks focus primarily on choosing the right computation model and partitioning scheme, under the assumption that such design choices will automatically reduce communication overheads. However, for any computation model and partitioning scheme, the communication scheme (the data to be communicated and the virtual interconnection network among the nodes) has a significant impact on performance. To analyze this impact, we identify widely used communication schemes and estimate their performance. Analyzing the trade-offs between the number of compute nodes and the communication costs of various schemes on a distributed platform by brute-force experimentation can be prohibitively expensive. Our performance estimation models therefore provide an economical way to perform these analyses, taking the partitions and the communication scheme as input. We validate our model on a local HPC cluster as well as on the cloud-hosted NSF Chameleon cluster. Using our estimates together with actual measurements, we compare the communication schemes and provide conditions under which one scheme should be preferred over the others.
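
The idea of estimating communication cost from a partition and a communication scheme can be illustrated with a small sketch. This is not the authors' model, which the abstract does not specify. It assumes a simple latency-bandwidth cost per transfer (alpha + beta * bytes), one fixed-size message per cut edge, and two hypothetical schemes: direct point-to-point exchange and a star topology that relays all traffic through a hub. All function names and parameters are illustrative.

```python
from collections import defaultdict


def cut_volume(edges, part):
    """Count messages per ordered node pair: one message per cut edge u -> v.

    `part` maps each vertex to the compute node that owns it (assumed input).
    """
    vol = defaultdict(int)
    for u, v in edges:
        src, dst = part[u], part[v]
        if src != dst:  # edges inside a partition need no communication
            vol[(src, dst)] += 1
    return dict(vol)


def estimate_direct(vol, alpha, beta, bytes_per_msg):
    """Direct scheme (assumption): each node sends to its peers one after
    another, nodes work in parallel, so the busiest sender bounds the step."""
    per_sender = defaultdict(float)
    for (src, _dst), n in vol.items():
        per_sender[src] += alpha + beta * n * bytes_per_msg
    return max(per_sender.values(), default=0.0)


def estimate_star(vol, alpha, beta, bytes_per_msg, hub=0):
    """Star scheme (assumption): all traffic is relayed through `hub`,
    which handles its uploads and downloads strictly one at a time."""
    inbound = defaultdict(int)   # messages each non-hub node uploads to the hub
    outbound = defaultdict(int)  # messages the hub forwards to each non-hub node
    for (src, dst), n in vol.items():
        if src != hub:
            inbound[src] += n
        if dst != hub:
            outbound[dst] += n
    transfers = list(inbound.values()) + list(outbound.values())
    return sum(alpha + beta * n * bytes_per_msg for n in transfers)


# Toy graph: 4 vertices on 3 compute nodes (made-up data for illustration).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
part = {0: 0, 1: 0, 2: 1, 3: 2}
vol = cut_volume(edges, part)
direct = estimate_direct(vol, alpha=1.0, beta=0.5, bytes_per_msg=8)
star = estimate_star(vol, alpha=1.0, beta=0.5, bytes_per_msg=8, hub=0)
```

Under these assumptions the estimate needs only the partition and the scheme, so different node counts or schemes can be compared without running the workload on a cluster. The parameters alpha and beta would have to be measured on the target platform.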