The Landscape of GPU-Centric Communication

Didem Unat, Ilyas Turimbetov, Mohammed Kefah Taha Issa, Doğan Sağbili, Flavio Vella, Daniele De Sensi, Ismayil Ismayilov
arXiv - CS - Performance · arXiv:2409.09874 · Published: 2024-09-15

Abstract

In recent years, GPUs have become the preferred accelerators for HPC and ML applications due to their parallelism and fast memory bandwidth. While GPUs boost computation, inter-GPU communication can create scalability bottlenecks, especially as the number of GPUs per node and per cluster grows. Traditionally, the CPU managed multi-GPU communication, but advancements in GPU-centric communication now challenge this CPU dominance by reducing its involvement, granting GPUs more autonomy in communication tasks, and addressing mismatches between multi-GPU communication and computation. This paper provides a landscape of GPU-centric communication, focusing on vendor mechanisms and user-level library support. It aims to clarify the complexities and diverse options in this field, define the terminology, and categorize existing approaches within and across nodes. The paper discusses vendor-provided mechanisms for communication and memory management in multi-GPU execution and reviews major communication libraries, their benefits, challenges, and performance insights. It then explores key research paradigms, future outlooks, and open research questions. By extensively describing GPU-centric communication techniques across the software and hardware stacks, we provide researchers, programmers, engineers, and library designers with insights into how to best exploit multi-GPU systems.