Optimizing the performance of parallel applications on a 5D torus via task mapping
A. Bhatele, Nikhil Jain, Katherine E. Isaacs, Ronak Buch, T. Gamblin, S. Langer, L. Kalé
2014 21st International Conference on High Performance Computing (HiPC). DOI: 10.1109/HiPC.2014.7116706
Citations: 18
Abstract
Six of the ten fastest supercomputers in the world in 2014 use a torus interconnection network for message passing between compute nodes. Torus networks provide high-bandwidth links to nearest neighbors and low latencies over multiple hops on the network. However, the large diameter of such networks necessitates careful placement of parallel tasks on the compute nodes to minimize network congestion. This paper presents a methodological study of optimizing application performance on a five-dimensional torus network via the technique of topology-aware task mapping. Task mapping refers to the placement of processes on compute nodes while carefully considering the network topology between the nodes and the communication behavior of the application. We focus on the IBM Blue Gene/Q machine and two production applications: pF3D, a laser-plasma interaction code, and MILC, a lattice QCD application. Optimizations presented in the paper improve the communication performance of pF3D by 90% and that of MILC by up to 47%.
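To make "placement of processes on compute nodes while considering the network topology" concrete, here is a minimal sketch that scores two placements of the same communication pattern on a 5D torus. It assumes a hop-bytes metric (bytes times hops, summed over messages), which is a common way to compare mappings; the abstract does not say which metrics the paper uses. The torus shape (4, 4, 4, 4, 2), the 32 x 16 periodic halo-exchange pattern, and the helpers `stencil_edges`, `default_mapping`, `snake`, and `folded_mapping` are all hypothetical illustrations, not the authors' method.

```python
from itertools import product

# Assumed 5D torus shape (A, B, C, D, E); the size-2 last dimension mimics Blue Gene/Q's E.
# 4*4*4*4*2 = 512 nodes; the sizes are illustrative only.
DIMS = (4, 4, 4, 4, 2)


def torus_hops(a, b, dims):
    """Minimal hop count between node coordinates a and b (wraparound in every dimension)."""
    hops = 0
    for x, y, n in zip(a, b, dims):
        d = abs(x - y)
        hops += min(d, n - d)  # take the shorter way around each ring
    return hops


def hop_bytes(edges, mapping, dims):
    """Sum of bytes * hops over all messages, for a task -> node-coordinate mapping."""
    return sum(nbytes * torus_hops(mapping[src], mapping[dst], dims)
               for src, dst, nbytes in edges)


def stencil_edges(px, py, nbytes):
    """Hypothetical 2D periodic halo exchange on a px x py task grid (task id = i * py + j).
    Each task sends nbytes to its +i and +j neighbor."""
    edges = []
    for i in range(px):
        for j in range(py):
            t = i * py + j
            edges.append((t, ((i + 1) % px) * py + j, nbytes))
            edges.append((t, i * py + (j + 1) % py, nbytes))
    return edges


def default_mapping(ntasks, dims):
    """Rank r goes to the r-th node in row-major coordinate order (last dimension fastest)."""
    coords = list(product(*(range(n) for n in dims)))
    return {r: coords[r] for r in range(ntasks)}


def snake(dims):
    """Boustrophedon order: consecutive coordinates differ by 1 in exactly one dimension.
    If dims[0] is even, the last coordinate is one wraparound hop from the first,
    so the order traces a ring through the sub-torus."""
    if len(dims) == 1:
        return [(x,) for x in range(dims[0])]
    inner = snake(dims[1:])
    order = []
    for x in range(dims[0]):
        seq = inner if x % 2 == 0 else inner[::-1]
        order.extend((x,) + rest for rest in seq)
    return order


def folded_mapping(px, py):
    """Hypothetical topology-aware mapping for DIMS = (4, 4, 4, 4, 2) and a 32 x 16 grid.
    Grid rows are folded onto the A x B x E sub-torus and grid columns onto C x D,
    so every stencil neighbor lands exactly one hop away."""
    rows = snake((4, 4, 2))  # 32 (a, b, e) triples
    cols = snake((4, 4))     # 16 (c, d) pairs
    mapping = {}
    for i in range(px):
        a, b, e = rows[i]
        for j in range(py):
            c, d = cols[j]
            mapping[i * py + j] = (a, b, c, d, e)
    return mapping


if __name__ == "__main__":
    PX, PY = 32, 16
    edges = stencil_edges(PX, PY, nbytes=1)
    print("default:", hop_bytes(edges, default_mapping(PX * PY, DIMS), DIMS))
    print("folded:", hop_bytes(edges, folded_mapping(PX, PY), DIMS))
```

Under these assumptions the row-major default placement gives 2176 hop-bytes for the 1024 messages (about 2.1 hops each), while the folded placement gives 1024 (one hop each). Fewer hops per message means less traffic sharing links, which is the intuition behind reducing congestion through mapping; real codes such as pF3D and MILC have richer communication patterns than this toy stencil.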