Network Topologies and Inevitable Contention

Grey Ballard, J. Demmel, A. Gearhart, Benjamin Lipshitz, Yishai Oltchik, O. Schwartz, Sivan Toledo
Published in: 2016 First International Workshop on Communication Optimizations in HPC (COMHPC)
Publication date: 2016-11-13
DOI: 10.1109/COM-HPC.2016.10 (https://doi.org/10.1109/COM-HPC.2016.10)
Citations: 4

Abstract

Network topologies can have a significant effect on the execution costs of parallel algorithms due to inter-processor communication. For particular combinations of computations and network topologies, costly network contention may inevitably become a bottleneck, even if algorithms are optimally designed so that each processor communicates as little as possible. We obtain novel contention lower bounds that are functions of the network and the computation graph parameters. For several combinations of fundamental computations and common network topologies, our new analysis improves upon previous per-processor lower bounds, which specify only the number of words communicated by the busiest individual processor. We consider torus and mesh topologies, universal fat-trees, and hypercubes; algorithms covered include classical matrix multiplication and direct numerical linear algebra, fast matrix multiplication algorithms, programs that reference arrays, N-body computations, and the FFT. For example, we show that fast matrix multiplication algorithms (e.g., Strassen's) running on a 3D torus will suffer from contention bottlenecks. On the other hand, this network is likely sufficient for a classical matrix multiplication algorithm. Our new lower bounds are matched by existing algorithms only in very few cases, leaving many open problems for network and algorithmic design.
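The contrast between Strassen's algorithm and classical matrix multiplication on a 3D torus can be illustrated with a rough back-of-the-envelope sketch. This is not the paper's theorem or proof technique. It assumes the well-known memory-independent per-processor bound of order n^2 / P^(2/omega) words (omega = 3 for the classical algorithm, log2(7) for Strassen's), and it assumes that splitting the machine into two halves lets each half be treated as one of two "super-processors", so that the traffic across the cut is shared among the 2k^2 links that bisect a k x k x k torus. All function names are hypothetical and all constant factors are dropped.

```python
import math

# Exponents of the two algorithm families (arithmetic cost ~ n^omega).
OMEGA_CLASSICAL = 3.0
OMEGA_STRASSEN = math.log2(7)  # about 2.807


def per_processor_words(n, p, omega):
    """Assumed per-processor bound with constants dropped: n^2 / p^(2/omega)."""
    return n ** 2 / p ** (2.0 / omega)


def torus3d_bisection_links(p):
    """Links cut when a k x k x k torus (k even, k >= 4) is split in half: 2*k^2."""
    k = round(p ** (1.0 / 3.0))
    if k ** 3 != p or k % 2 != 0 or k < 4:
        raise ValueError("p must be the cube of an even integer >= 4")
    return 2 * k * k


def bisection_contention_words(n, p, omega):
    """Heuristic estimate of words per bisection link (an assumption, see lead-in).

    Each half of the torus is viewed as one of two super-processors, so the
    traffic across the cut is taken to be per_processor_words(n, 2, omega).
    """
    words_across_cut = per_processor_words(n, 2, omega)
    return words_across_cut / torus3d_bisection_links(p)


def contention_ratio(n, p, omega):
    """Ratio of the contention estimate to the per-processor bound."""
    return bisection_contention_words(n, p, omega) / per_processor_words(n, p, omega)


if __name__ == "__main__":
    n = 4096
    for p in (8 ** 3, 16 ** 3, 64 ** 3):
        classical = contention_ratio(n, p, OMEGA_CLASSICAL)
        strassen = contention_ratio(n, p, OMEGA_STRASSEN)
        print(f"P={p:7d}  classical ratio={classical:.3f}  Strassen ratio={strassen:.3f}")
```

Under these assumptions the ratio stays constant in P for the classical algorithm, while for Strassen's it grows like P^(2/omega - 2/3), which is consistent with the abstract's claim that contention on a 3D torus becomes the bottleneck for fast matrix multiplication but not necessarily for the classical algorithm.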