Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines

S. Lumetta, A. Krishnamurthy, D. Culler
{"title":"Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines","authors":"S. Lumetta, A. Krishnamurthy, D. Culler","doi":"10.1145/224170.224275","DOIUrl":null,"url":null,"abstract":"We present and analyze a portable, high-performance algorithm for finding connected components on modern distributed memory multiprocessors. The algorithm is a hybrid of the classic DFS on the subgraph local to each processor and a variant of the Shiloach-Vishkin PRAM algorithm on the global collection of subgraphs. We implement the algorithm in Split-C and measure performance on the the Cray T3D, the Meiko CS-2, and the Thinking Machines CM-5 using a class of graphs derived from cluster dynamics methods in computational physics. On a 256 processor Cray T3D, the implementation outperforms all previous solutions by an order of magnitude. A characterization of graph parameters allows us to select graphs that highlight key performance features. We study the effects of these parameters and machine characteristics on the balance of time between the local and global phases of the algorithm and find that edge density, surface-to-volume ratio, and relative communication cost dominate performance. By understanding the effect of machine characteristics on performance, the study sheds light on the impact of improvements in computational and/or communication performance on this challenging problem.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"85 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the IEEE/ACM SC95 Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/224170.224275","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

We present and analyze a portable, high-performance algorithm for finding connected components on modern distributed memory multiprocessors. The algorithm is a hybrid of the classic DFS on the subgraph local to each processor and a variant of the Shiloach-Vishkin PRAM algorithm on the global collection of subgraphs. We implement the algorithm in Split-C and measure performance on the the Cray T3D, the Meiko CS-2, and the Thinking Machines CM-5 using a class of graphs derived from cluster dynamics methods in computational physics. On a 256 processor Cray T3D, the implementation outperforms all previous solutions by an order of magnitude. A characterization of graph parameters allows us to select graphs that highlight key performance features. We study the effects of these parameters and machine characteristics on the balance of time between the local and global phases of the algorithm and find that edge density, surface-to-volume ratio, and relative communication cost dominate performance. By understanding the effect of machine characteristics on performance, the study sheds light on the impact of improvements in computational and/or communication performance on this challenging problem.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
并行机器上快速连接组件算法的性能建模研究
我们提出并分析了一种可移植的高性能算法,用于在现代分布式内存多处理器上查找连接组件。该算法是每个处理器本地子图上的经典DFS算法和子图全局集合上的Shiloach-Vishkin PRAM算法的一种变体的混合。我们在Split-C中实现了该算法,并在Cray T3D、Meiko CS-2和Thinking Machines CM-5上使用从计算物理中的聚类动力学方法派生的一类图来测量性能。在256处理器的Cray T3D上,该实现比以前的所有解决方案都要好一个数量级。图形参数的特性使我们能够选择突出显示关键性能特征的图形。我们研究了这些参数和机器特性对算法局部和全局阶段之间时间平衡的影响,发现边缘密度、面体积比和相对通信成本主导了性能。通过了解机器特性对性能的影响,该研究揭示了计算和/或通信性能改进对这个具有挑战性的问题的影响。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A Web Interface to Parallel Program Source Code Archetypes Parallel Implementations of the Power System Transient Stability Problem on Clusters of Workstations The Synergetic Effect of Compiler, Architecture, and Manual Optimizations on the Performance of CFD on Multiprocessors SCIRun: A Scientific Programming Environment for Computational Steering Surface Fitting Using GCV Smoothing Splines on Supercomputers
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1