Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects

Daniele De Sensi, Lorenzo Pichetti, Flavio Vella, Tiziano De Matteis, Zebin Ren, Luigi Fusco, Matteo Turisini, Daniele Cesarini, Kurt Lust, Animesh Trivedi, Duncan Roweth, Filippo Spiga, Salvatore Di Girolamo, Torsten Hoefler

arXiv - CS - Performance, 2024-08-26. arXiv:2408.14090
Abstract
Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale supercomputers. On these systems, GPUs on the same node are connected through dedicated networks with bandwidths of up to a few terabits per second. However, gauging performance expectations and maximizing system efficiency are challenging due to the variety of technologies, design options, and software layers involved. This paper comprehensively characterizes three supercomputers (Alps, Leonardo, and LUMI), each with a unique architecture and design. We focus on the performance evaluation of intra-node and inter-node interconnects on up to 4096 GPUs, using a mix of intra-node and inter-node benchmarks. By analyzing their limitations and opportunities, we aim to offer practical guidance to researchers, system architects, and software developers dealing with multi-GPU supercomputing. Our results show that there is untapped bandwidth and that many opportunities for optimization remain, ranging from the network layer to the software stack.
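To make the kind of measurement the abstract describes concrete, below is a minimal sketch of a GPU-to-GPU ping-pong bandwidth test. It is illustrative only, not the paper's benchmark suite, and it assumes a CUDA-aware MPI library (so that MPI_Send/MPI_Recv accept device pointers directly). Run with exactly two ranks: placing them on the same node exercises the intra-node link, while placing them on different nodes exercises the inter-node network.

/* Minimal GPU-to-GPU ping-pong bandwidth sketch (illustrative, not the
 * paper's benchmark suite). Assumes a CUDA-aware MPI implementation so
 * that MPI can send and receive directly from device memory.
 * Build (example): mpicc -o pingpong pingpong.c -lcudart
 */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

#define MSG_BYTES (64UL << 20)  /* 64 MiB message */
#define ITERS     100
#define WARMUP    10

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* On multi-GPU nodes, each rank selects its own GPU. */
    int ndev = 1;
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);

    /* Buffer lives in GPU memory: no staging through host RAM. */
    void *buf;
    cudaMalloc(&buf, MSG_BYTES);

    int peer = 1 - rank;
    double t0 = 0.0;
    for (int i = 0; i < WARMUP + ITERS; i++) {
        if (i == WARMUP) {           /* start timing after warm-up */
            MPI_Barrier(MPI_COMM_WORLD);
            t0 = MPI_Wtime();
        }
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_BYTE, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(buf, MSG_BYTES, MPI_BYTE, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, peer, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        /* One ping-pong iteration = two one-way transfers of MSG_BYTES. */
        double one_way_s = (t1 - t0) / (2.0 * ITERS);
        printf("uni-directional bandwidth: %.2f GB/s\n",
               MSG_BYTES / one_way_s / 1e9);
    }

    cudaFree(buf);
    MPI_Finalize();
    return 0;
}

Sweeping MSG_BYTES from a few bytes up to hundreds of MiB, and comparing same-node against different-node placements, reproduces in miniature the intra-node versus inter-node comparison that the paper carries out at scale.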