Network Topologies and Inevitable Contention

2016 First International Workshop on Communication Optimizations in HPC (COMHPC). Published: 2016-11-13. DOI: 10.1109/COM-HPC.2016.10

Grey Ballard, J. Demmel, A. Gearhart, Benjamin Lipshitz, Yishai Oltchik, O. Schwartz, Sivan Toledo


Citations: 4

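Abstract

Network topologies can have a significant effect on the execution costs of parallel algorithms due to inter-processor communication. For particular combinations of computations and network topologies, costly network contention may inevitably become a bottleneck, even if algorithms are optimally designed so that each processor communicates as little as possible. We obtain novel contention lower bounds that are functions of the network and the computation graph parameters. For several combinations of fundamental computations and common network topologies, our new analysis improves upon previous per-processor lower bounds, which only specify the number of words communicated by the busiest individual processor. We consider torus and mesh topologies, universal fat-trees, and hypercubes; algorithms covered include classical matrix multiplication and direct numerical linear algebra, fast matrix multiplication algorithms, programs that reference arrays, N-body computations, and the FFT. For example, we show that fast matrix multiplication algorithms (e.g., Strassen's) running on a 3D torus will suffer from contention bottlenecks. On the other hand, this network is likely sufficient for a classical matrix multiplication algorithm. Our new lower bounds are matched by existing algorithms only in very few cases, leaving many open problems for network and algorithmic design.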
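A rough scaling check helps show why the 3D-torus result differs between Strassen's algorithm and classical matrix multiplication. The Python sketch below is an illustration under stated assumptions, not the paper's proof: it treats a cube-shaped block of t = s^3 processors in a k×k×k torus (P = k^3) as a single super-processor, uses a known memory-independent per-processor bound of order n^2/P^(2/ω0) (ω0 = 3 for classical, ω0 = log2 7 for Strassen), and drops all constants. Such a block must move roughly n^2/(P/t)^(2/ω0) words across about t^(2/3) boundary links, so the per-link load exceeds the per-processor bound by a factor of about t^(2/ω0 - 2/3). The function names are hypothetical, and the brute-force link count assumes k >= 3 and s < k.

```python
import itertools
import math


def torus_block_boundary_links(k: int, d: int, s: int) -> int:
    """Count the links that leave an s^d cube of nodes in a k^d torus.

    Brute force over every node in the block and each of its 2*d neighbours.
    Assumes k >= 3 and 1 <= s < k, where the count equals 2 * d * s**(d - 1).
    """
    block = set(itertools.product(range(s), repeat=d))
    leaving = 0
    for node in block:
        for axis in range(d):
            for step in (-1, 1):
                nbr = list(node)
                nbr[axis] = (nbr[axis] + step) % k  # wraparound link of the torus
                if tuple(nbr) not in block:
                    leaving += 1
    return leaving


def contention_to_per_proc_ratio(P: float, t: float, omega0: float, d: int = 3) -> float:
    """Scaling-only ratio of a per-link contention bound to the per-processor bound.

    Assumptions (constants and the common n^2 factor dropped):
      * per-processor memory-independent bound ~ n^2 / P^(2/omega0);
      * a block of t processors does a t/P share of the work, so it must move
        ~ n^2 / (P/t)^(2/omega0) words across its boundary;
      * a cube-shaped block in a d-dimensional torus has ~ t^((d-1)/d) boundary links.
    """
    per_proc = P ** (-2.0 / omega0)
    per_link = (P / t) ** (-2.0 / omega0) / t ** ((d - 1) / d)
    return per_link / per_proc


if __name__ == "__main__":
    k, d, s = 32, 3, 16  # hypothetical 32x32x32 torus with a 16x16x16 sub-block
    P, t = k ** d, s ** d
    print("boundary links:", torus_block_boundary_links(k, d, s))  # 2*3*16^2 = 1536
    print("classical (omega0 = 3):", contention_to_per_proc_ratio(P, t, 3.0))
    print("Strassen (omega0 = log2 7):", contention_to_per_proc_ratio(P, t, math.log2(7)))
```

With ω0 = 3 the ratio is 1 up to floating-point rounding, so this argument adds nothing beyond the per-processor bound. With ω0 = log2 7 ≈ 2.807 the exponent 2/ω0 - 2/3 ≈ 0.046 is positive, so the per-link load grows slowly with the block size, which is the kind of contention gap the abstract describes for fast matrix multiplication.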

Source journal

2016 First International Workshop on Communication Optimizations in HPC (COMHPC)
