The Landscape of GPU-Centric Communication
Didem Unat, Ilyas Turimbetov, Mohammed Kefah Taha Issa, Doğan Sağbili, Flavio Vella, Daniele De Sensi, Ismayil Ismayilov
arXiv:2409.09874 (arXiv - CS - Performance), September 15, 2024
Abstract
In recent years, GPUs have become the preferred accelerators for HPC and ML applications due to their parallelism and fast memory bandwidth. While GPUs boost computation, inter-GPU communication can create scalability bottlenecks, especially as the number of GPUs per node and per cluster grows. Traditionally, the CPU managed multi-GPU communication, but advancements in GPU-centric communication now challenge this CPU dominance by reducing its involvement, granting GPUs more autonomy in communication tasks, and addressing mismatches between multi-GPU communication and computation.

This paper provides a landscape of GPU-centric communication, focusing on vendor mechanisms and user-level library support. It aims to clarify the complexities and diverse options in this field, define the terminology, and categorize existing approaches within and across nodes. The paper discusses vendor-provided mechanisms for communication and memory management in multi-GPU execution and reviews major communication libraries along with their benefits, challenges, and performance insights. It then explores key research paradigms, future outlooks, and open research questions. By extensively describing GPU-centric communication techniques across the software and hardware stacks, we provide researchers, programmers, engineers, and library designers with insights on how best to exploit multi-GPU systems.
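The abstract's contrast between CPU-managed and GPU-centric communication can be made concrete with a minimal CUDA sketch. Everything below (device IDs, buffer size, the kernel name peer_write) is an illustrative assumption rather than an example from the paper: it places a host-orchestrated cudaMemcpyPeerAsync, where the CPU initiates every transfer, next to a kernel that, once peer access is enabled, stores directly into a remote GPU's memory without per-transfer CPU involvement.

```cuda
// Hedged sketch of CPU-driven vs. GPU-centric inter-GPU data movement,
// assuming two P2P-capable GPUs; error handling is omitted for brevity.
#include <cuda_runtime.h>
#include <cstdio>

// GPU-centric path: a kernel running on GPU 0 writes directly into GPU 1's
// memory through a peer-mapped pointer; loads/stores traverse NVLink or PCIe.
__global__ void peer_write(float *remote, const float *local, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) remote[i] = local[i];
}

int main() {
    const int n = 1 << 20;
    float *buf0, *buf1;

    cudaSetDevice(0);
    cudaMalloc(&buf0, n * sizeof(float));
    cudaSetDevice(1);
    cudaMalloc(&buf1, n * sizeof(float));

    // CPU-driven path: the host orchestrates each inter-GPU copy.
    cudaSetDevice(0);
    cudaMemcpyPeerAsync(buf1, 1, buf0, 0, n * sizeof(float));
    cudaDeviceSynchronize();

    // GPU-centric path: enable peer access once, then the GPU communicates
    // autonomously from inside a kernel.
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, 0, 1);
    if (can_access) {
        cudaDeviceEnablePeerAccess(1, 0);  // GPU 0 may now dereference buf1
        peer_write<<<(n + 255) / 256, 256>>>(buf1, buf0, n);
        cudaDeviceSynchronize();
    }

    printf("done (P2P %savailable)\n", can_access ? "" : "un");
    cudaFree(buf0);
    cudaSetDevice(1);
    cudaFree(buf1);
    return 0;
}
```

The design trade-off the paper surveys is visible even here: the memcpy path keeps the CPU in the critical path of every transfer, while the kernel path moves initiation onto the GPU, which is the basic idea that GPU-centric libraries (e.g., NVSHMEM-style one-sided communication) generalize across nodes.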