BGS: Accelerate GNN training on multiple GPUs
Yujuan Tan, Zhuoxin Bai, Duo Liu, Zhaoyang Zeng, Yan Gan, Ao Ren, Xianzhang Chen, Kan Zhong
{"title":"BGS:在多个 GPU 上加速 GNN 训练","authors":"Yujuan Tan , Zhuoxin Bai , Duo Liu , Zhaoyang Zeng , Yan Gan , Ao Ren , Xianzhang Chen , Kan Zhong","doi":"10.1016/j.sysarc.2024.103162","DOIUrl":null,"url":null,"abstract":"<div><p>Emerging Graph Neural Networks (GNNs) have made significant progress in processing graph-structured data, yet existing GNN frameworks face scalability issues when training large-scale graph data using multiple GPUs. Frequent feature data transfers between CPUs and GPUs are a major bottleneck, and current caching schemes have not fully considered the characteristics of multi-GPU environments, leading to inefficient feature extraction. To address these challenges, we propose BGS, an auxiliary framework designed to accelerate GNN training from a data perspective in multi-GPU environments. Firstly, we introduce a novel training set partition algorithm, assigning independent training subsets to each GPU to enhance the spatial locality of node access, thus optimizing the efficiency of the feature caching strategy. Secondly, considering that GPUs can communicate at high speeds via NVLink connections, we designed a feature caching placement strategy suitable for multi-GPU environments. This strategy aims to improve the overall hit rate by setting reasonable redundant caches on each GPU. Evaluations on two representative GNN models, GCN and GraphSAGE, show that BGS significantly improves the hit rate of feature caching strategies in multi-GPU environments and substantially reduces the time overhead of data loading, achieving a performance improvement of 1.5 to 6.2 times compared to the baseline.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"153 ","pages":"Article 103162"},"PeriodicalIF":3.7000,"publicationDate":"2024-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"BGS: Accelerate GNN training on multiple GPUs\",\"authors\":\"Yujuan Tan , Zhuoxin Bai , Duo Liu , Zhaoyang Zeng , Yan Gan , Ao Ren , Xianzhang Chen , Kan Zhong\",\"doi\":\"10.1016/j.sysarc.2024.103162\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Emerging Graph Neural Networks (GNNs) have made significant progress in processing graph-structured data, yet existing GNN frameworks face scalability issues when training large-scale graph data using multiple GPUs. Frequent feature data transfers between CPUs and GPUs are a major bottleneck, and current caching schemes have not fully considered the characteristics of multi-GPU environments, leading to inefficient feature extraction. To address these challenges, we propose BGS, an auxiliary framework designed to accelerate GNN training from a data perspective in multi-GPU environments. Firstly, we introduce a novel training set partition algorithm, assigning independent training subsets to each GPU to enhance the spatial locality of node access, thus optimizing the efficiency of the feature caching strategy. Secondly, considering that GPUs can communicate at high speeds via NVLink connections, we designed a feature caching placement strategy suitable for multi-GPU environments. This strategy aims to improve the overall hit rate by setting reasonable redundant caches on each GPU. 
Evaluations on two representative GNN models, GCN and GraphSAGE, show that BGS significantly improves the hit rate of feature caching strategies in multi-GPU environments and substantially reduces the time overhead of data loading, achieving a performance improvement of 1.5 to 6.2 times compared to the baseline.</p></div>\",\"PeriodicalId\":50027,\"journal\":{\"name\":\"Journal of Systems Architecture\",\"volume\":\"153 \",\"pages\":\"Article 103162\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2024-05-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Systems Architecture\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1383762124000997\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems Architecture","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1383762124000997","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Emerging Graph Neural Networks (GNNs) have made significant progress in processing graph-structured data, yet existing GNN frameworks face scalability issues when training large-scale graph data using multiple GPUs. Frequent feature data transfers between CPUs and GPUs are a major bottleneck, and current caching schemes have not fully considered the characteristics of multi-GPU environments, leading to inefficient feature extraction. To address these challenges, we propose BGS, an auxiliary framework designed to accelerate GNN training from a data perspective in multi-GPU environments. Firstly, we introduce a novel training set partition algorithm, assigning independent training subsets to each GPU to enhance the spatial locality of node access, thus optimizing the efficiency of the feature caching strategy. Secondly, considering that GPUs can communicate at high speeds via NVLink connections, we designed a feature caching placement strategy suitable for multi-GPU environments. This strategy aims to improve the overall hit rate by setting reasonable redundant caches on each GPU. Evaluations on two representative GNN models, GCN and GraphSAGE, show that BGS significantly improves the hit rate of feature caching strategies in multi-GPU environments and substantially reduces the time overhead of data loading, achieving a performance improvement of 1.5 to 6.2 times compared to the baseline.
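The abstract describes two mechanisms at a high level: per-GPU partitioning of the training set and a redundant per-GPU feature cache backed by NVLink peer access. The sketch below is only an illustration of these ideas, not the authors' BGS implementation; the partition scheme, cache sizing, and lookup order are assumptions made for clarity.

```python
# Illustrative sketch only: NOT the BGS code. It mocks up (1) assigning each
# GPU an independent training subset and (2) a per-GPU feature cache with
# redundant copies, where a miss first tries a peer GPU (fast NVLink path)
# before falling back to CPU memory (slow PCIe path).

import numpy as np

def partition_training_set(train_nodes, num_gpus, seed=0):
    """Assign each training node to exactly one GPU.

    Hypothetical round-robin split of a shuffled node list; the paper's
    locality-aware partition algorithm is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(train_nodes)
    return [shuffled[i::num_gpus] for i in range(num_gpus)]

class RedundantFeatureCache:
    """Per-GPU cache of hot node features plus a peer-lookup path.

    `capacity` and the peer list are illustrative parameters, not BGS settings.
    `features` is an (N, F) array of node features held in CPU memory.
    """
    def __init__(self, gpu_id, hot_nodes, features, capacity, peers=None):
        self.gpu_id = gpu_id
        # Cache the hottest nodes of this GPU's training subset.
        self.store = {int(nid): features[nid] for nid in hot_nodes[:capacity]}
        self.peers = peers or []  # other caches reachable via NVLink

    def lookup(self, node_id, cpu_features):
        # 1) Local GPU cache hit (fastest).
        if node_id in self.store:
            return self.store[node_id]
        # 2) Redundant copy on a peer GPU, fetched over NVLink (still fast).
        for peer in self.peers:
            if node_id in peer.store:
                return peer.store[node_id]
        # 3) Miss: fetch from CPU memory over PCIe, the costly transfer
        #    that the caching strategy aims to avoid.
        return cpu_features[node_id]
```

Under these assumptions, the redundancy knob trades GPU memory for a higher aggregate hit rate: features needed by several GPUs can be duplicated locally, while the rest are served by a single peer copy over NVLink instead of the CPU.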
Journal overview:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components, with an emphasis on software, are also relevant here.