BGS: Accelerate GNN training on multiple GPUs

Journal of Systems Architecture, Volume 153, Article 103162. Published 2024-05-04. DOI: 10.1016/j.sysarc.2024.103162. Impact Factor: 3.7; JCR Q1 (Computer Science, Hardware & Architecture); CAS Tier 2 (Computer Science).
Yujuan Tan, Zhuoxin Bai, Duo Liu, Zhaoyang Zeng, Yan Gan, Ao Ren, Xianzhang Chen, Kan Zhong
Cited by: 0

Abstract

Emerging Graph Neural Networks (GNNs) have made significant progress in processing graph-structured data, yet existing GNN frameworks face scalability issues when training large-scale graph data on multiple GPUs. Frequent feature data transfers between CPUs and GPUs are a major bottleneck, and current caching schemes have not fully considered the characteristics of multi-GPU environments, leading to inefficient feature extraction. To address these challenges, we propose BGS, an auxiliary framework designed to accelerate GNN training from a data perspective in multi-GPU environments. First, we introduce a novel training-set partition algorithm that assigns an independent training subset to each GPU to enhance the spatial locality of node accesses, thus improving the efficiency of the feature caching strategy. Second, since GPUs can communicate at high speed via NVLink connections, we design a feature cache placement strategy suited to multi-GPU environments; it improves the overall hit rate by placing a reasonable amount of redundant cache on each GPU. Evaluations on two representative GNN models, GCN and GraphSAGE, show that BGS significantly improves the hit rate of feature caching in multi-GPU environments and substantially reduces data-loading overhead, achieving a 1.5× to 6.2× performance improvement over the baseline.
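The abstract gives only a high-level description of BGS's two components; the paper's actual algorithms are not reproduced here. As a rough, hypothetical Python sketch of the general ideas (a locality-aware split of the training set, plus per-GPU caches that reserve a slice of their slots for redundant copies of globally hot features so that NVLink peers can serve each other), consider the following. All names and heuristics (partition_train_nodes, place_caches, node degree as a hotness proxy, the redundancy fraction) are illustrative assumptions, not the authors' implementation.

import numpy as np

# Illustrative sketch only -- not the BGS implementation. Assumes:
#   adjacency: dict mapping node id -> list of neighbor ids
#   degrees:   1-D numpy array, degrees[u] = degree of node u
#   redundancy in [0, 1]: fraction of cache slots spent on duplicated hot nodes

def partition_train_nodes(train_nodes, adjacency, num_gpus):
    """Greedily assign each training node to the GPU whose subset already
    shares the most neighbors with it, to improve access locality."""
    parts = [set() for _ in range(num_gpus)]
    for v in train_nodes:
        neigh = set(adjacency[v])
        # score: neighbor overlap first, then prefer the smaller partition
        _, _, best = max((len(parts[g] & neigh), -len(parts[g]), g)
                         for g in range(num_gpus))
        parts[best].add(v)
    return [sorted(p) for p in parts]

def place_caches(parts, adjacency, degrees, cache_slots, redundancy):
    """Per-GPU cache placement: a redundant slice holds globally hot nodes
    (duplicated on every GPU); the rest holds this GPU's locally hot nodes."""
    global_hot = np.argsort(-degrees)          # hottest nodes first
    n_red = int(cache_slots * redundancy)      # redundant slots per GPU
    caches = []
    for part in parts:
        freq = {}                              # access frequency within subset
        for v in part:
            for u in adjacency[v]:
                freq[u] = freq.get(u, 0) + 1
        cache = set(int(u) for u in global_hot[:n_red])
        for u in sorted(freq, key=freq.get, reverse=True):
            if len(cache) >= cache_slots:
                break
            cache.add(u)
        caches.append(cache)
    return caches

At feature-gathering time, a miss in the local cache would first be looked up in peer GPU caches over NVLink before falling back to a CPU fetch over PCIe; the redundancy fraction then trades per-GPU hit rate against the aggregate number of distinct features cached across all GPUs.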

Source journal

Journal of Systems Architecture (Engineering & Technology: Computer Hardware)
CiteScore: 8.70
Self-citation rate: 15.60%
Articles published: 226
Review time: 46 days
Journal description: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures, as well as additional subjects in the computer and system architecture area, fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems, including methodologies, techniques, and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components, with an emphasis on software, are also relevant here.
Latest articles in this journal

SAMFL: Secure Aggregation Mechanism for Federated Learning with Byzantine-robustness by functional encryption
ZNS-Cleaner: Enhancing lifespan by reducing empty erase in ZNS SSDs
Using MAST for modeling and response-time analysis of real-time applications with GPUs
Shift-and-Safe: Addressing permanent faults in aggressively undervolted CNN accelerators
Function Placement Approaches in Serverless Computing: A Survey