The Landscape of GPU-Centric Communication
Didem Unat, Ilyas Turimbetov, Mohammed Kefah Taha Issa, Doğan Sağbili, Flavio Vella, Daniele De Sensi, Ismayil Ismayilov
arXiv:2409.09874 (arXiv - CS - Performance), September 15, 2024
Abstract
In recent years, GPUs have become the preferred accelerators for HPC and ML applications due to their parallelism and fast memory bandwidth. While GPUs boost computation, inter-GPU communication can create scalability bottlenecks, especially as the number of GPUs per node and per cluster grows. Traditionally, the CPU managed multi-GPU communication, but advancements in GPU-centric communication now challenge this CPU dominance by reducing its involvement, granting GPUs more autonomy in communication tasks, and addressing mismatches between multi-GPU communication and computation.

This paper provides a landscape of GPU-centric communication, focusing on vendor mechanisms and user-level library support. It aims to clarify the complexities and diverse options in this field, define the terminology, and categorize existing approaches within and across nodes. The paper discusses vendor-provided mechanisms for communication and memory management in multi-GPU execution and reviews major communication libraries along with their benefits, challenges, and performance insights. It then explores key research paradigms, future outlooks, and open research questions. By extensively describing GPU-centric communication techniques across the software and hardware stacks, we provide researchers, programmers, engineers, and library designers with insights on how best to exploit multi-GPU systems.
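The abstract's contrast between CPU-managed and GPU-centric communication can be made concrete with a minimal CUDA sketch. Everything below (device IDs, buffer size, the kernel name peer_write) is an illustrative assumption rather than an example from the paper: it places a host-orchestrated cudaMemcpyPeerAsync, where the CPU initiates every transfer, next to a kernel that, once peer access is enabled, stores directly into a remote GPU's memory without per-transfer CPU involvement.

```cuda
// Hedged sketch of CPU-driven vs. GPU-centric inter-GPU data movement,
// assuming two P2P-capable GPUs; error handling is omitted for brevity.
#include <cuda_runtime.h>
#include <cstdio>

// GPU-centric path: a kernel running on GPU 0 writes directly into GPU 1's
// memory through a peer-mapped pointer; loads/stores traverse NVLink or PCIe.
__global__ void peer_write(float *remote, const float *local, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) remote[i] = local[i];
}

int main() {
    const int n = 1 << 20;
    float *buf0, *buf1;

    cudaSetDevice(0);
    cudaMalloc(&buf0, n * sizeof(float));
    cudaSetDevice(1);
    cudaMalloc(&buf1, n * sizeof(float));

    // CPU-driven path: the host orchestrates each inter-GPU copy.
    cudaSetDevice(0);
    cudaMemcpyPeerAsync(buf1, 1, buf0, 0, n * sizeof(float));
    cudaDeviceSynchronize();

    // GPU-centric path: enable peer access once, then the GPU communicates
    // autonomously from inside a kernel.
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, 0, 1);
    if (can_access) {
        cudaDeviceEnablePeerAccess(1, 0);  // GPU 0 may now dereference buf1
        peer_write<<<(n + 255) / 256, 256>>>(buf1, buf0, n);
        cudaDeviceSynchronize();
    }

    printf("done (P2P %savailable)\n", can_access ? "" : "un");
    cudaFree(buf0);
    cudaSetDevice(1);
    cudaFree(buf1);
    return 0;
}
```

The design trade-off the paper surveys is visible even here: the memcpy path keeps the CPU in the critical path of every transfer, while the kernel path moves initiation onto the GPU, which is the basic idea that GPU-centric libraries (e.g., NVSHMEM-style one-sided communication) generalize across nodes.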