BGS: Accelerate GNN training on multiple GPUs
Yujuan Tan, Zhuoxin Bai, Duo Liu, Zhaoyang Zeng, Yan Gan, Ao Ren, Xianzhang Chen, Kan Zhong
{"title":"BGS:在多个 GPU 上加速 GNN 训练","authors":"Yujuan Tan , Zhuoxin Bai , Duo Liu , Zhaoyang Zeng , Yan Gan , Ao Ren , Xianzhang Chen , Kan Zhong","doi":"10.1016/j.sysarc.2024.103162","DOIUrl":null,"url":null,"abstract":"<div><p>Emerging Graph Neural Networks (GNNs) have made significant progress in processing graph-structured data, yet existing GNN frameworks face scalability issues when training large-scale graph data using multiple GPUs. Frequent feature data transfers between CPUs and GPUs are a major bottleneck, and current caching schemes have not fully considered the characteristics of multi-GPU environments, leading to inefficient feature extraction. To address these challenges, we propose BGS, an auxiliary framework designed to accelerate GNN training from a data perspective in multi-GPU environments. Firstly, we introduce a novel training set partition algorithm, assigning independent training subsets to each GPU to enhance the spatial locality of node access, thus optimizing the efficiency of the feature caching strategy. Secondly, considering that GPUs can communicate at high speeds via NVLink connections, we designed a feature caching placement strategy suitable for multi-GPU environments. This strategy aims to improve the overall hit rate by setting reasonable redundant caches on each GPU. Evaluations on two representative GNN models, GCN and GraphSAGE, show that BGS significantly improves the hit rate of feature caching strategies in multi-GPU environments and substantially reduces the time overhead of data loading, achieving a performance improvement of 1.5 to 6.2 times compared to the baseline.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"153 ","pages":"Article 103162"},"PeriodicalIF":3.7000,"publicationDate":"2024-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"BGS: Accelerate GNN training on multiple GPUs\",\"authors\":\"Yujuan Tan , Zhuoxin Bai , Duo Liu , Zhaoyang Zeng , Yan Gan , Ao Ren , Xianzhang Chen , Kan Zhong\",\"doi\":\"10.1016/j.sysarc.2024.103162\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Emerging Graph Neural Networks (GNNs) have made significant progress in processing graph-structured data, yet existing GNN frameworks face scalability issues when training large-scale graph data using multiple GPUs. Frequent feature data transfers between CPUs and GPUs are a major bottleneck, and current caching schemes have not fully considered the characteristics of multi-GPU environments, leading to inefficient feature extraction. To address these challenges, we propose BGS, an auxiliary framework designed to accelerate GNN training from a data perspective in multi-GPU environments. Firstly, we introduce a novel training set partition algorithm, assigning independent training subsets to each GPU to enhance the spatial locality of node access, thus optimizing the efficiency of the feature caching strategy. Secondly, considering that GPUs can communicate at high speeds via NVLink connections, we designed a feature caching placement strategy suitable for multi-GPU environments. This strategy aims to improve the overall hit rate by setting reasonable redundant caches on each GPU. 
Evaluations on two representative GNN models, GCN and GraphSAGE, show that BGS significantly improves the hit rate of feature caching strategies in multi-GPU environments and substantially reduces the time overhead of data loading, achieving a performance improvement of 1.5 to 6.2 times compared to the baseline.</p></div>\",\"PeriodicalId\":50027,\"journal\":{\"name\":\"Journal of Systems Architecture\",\"volume\":\"153 \",\"pages\":\"Article 103162\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2024-05-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Systems Architecture\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1383762124000997\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems Architecture","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1383762124000997","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Emerging Graph Neural Networks (GNNs) have made significant progress in processing graph-structured data, yet existing GNN frameworks face scalability issues when training large-scale graph data using multiple GPUs. Frequent feature data transfers between CPUs and GPUs are a major bottleneck, and current caching schemes have not fully considered the characteristics of multi-GPU environments, leading to inefficient feature extraction. To address these challenges, we propose BGS, an auxiliary framework designed to accelerate GNN training from a data perspective in multi-GPU environments. Firstly, we introduce a novel training set partition algorithm, assigning independent training subsets to each GPU to enhance the spatial locality of node access, thus optimizing the efficiency of the feature caching strategy. Secondly, considering that GPUs can communicate at high speeds via NVLink connections, we designed a feature caching placement strategy suitable for multi-GPU environments. This strategy aims to improve the overall hit rate by setting reasonable redundant caches on each GPU. Evaluations on two representative GNN models, GCN and GraphSAGE, show that BGS significantly improves the hit rate of feature caching strategies in multi-GPU environments and substantially reduces the time overhead of data loading, achieving a performance improvement of 1.5 to 6.2 times compared to the baseline.
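The abstract describes two mechanisms at a high level: per-GPU partitioning of the training set and a redundant per-GPU feature cache backed by NVLink peer access. The sketch below is only an illustration of these ideas, not the authors' BGS implementation; the partition scheme, cache sizing, and lookup order are assumptions made for clarity.

```python
# Illustrative sketch only: NOT the BGS code. It mocks up (1) assigning each
# GPU an independent training subset and (2) a per-GPU feature cache with
# redundant copies, where a miss first tries a peer GPU (fast NVLink path)
# before falling back to CPU memory (slow PCIe path).

import numpy as np

def partition_training_set(train_nodes, num_gpus, seed=0):
    """Assign each training node to exactly one GPU.

    Hypothetical round-robin split of a shuffled node list; the paper's
    locality-aware partition algorithm is not reproduced here.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(train_nodes)
    return [shuffled[i::num_gpus] for i in range(num_gpus)]

class RedundantFeatureCache:
    """Per-GPU cache of hot node features plus a peer-lookup path.

    `capacity` and the peer list are illustrative parameters, not BGS settings.
    `features` is an (N, F) array of node features held in CPU memory.
    """
    def __init__(self, gpu_id, hot_nodes, features, capacity, peers=None):
        self.gpu_id = gpu_id
        # Cache the hottest nodes of this GPU's training subset.
        self.store = {int(nid): features[nid] for nid in hot_nodes[:capacity]}
        self.peers = peers or []  # other caches reachable via NVLink

    def lookup(self, node_id, cpu_features):
        # 1) Local GPU cache hit (fastest).
        if node_id in self.store:
            return self.store[node_id]
        # 2) Redundant copy on a peer GPU, fetched over NVLink (still fast).
        for peer in self.peers:
            if node_id in peer.store:
                return peer.store[node_id]
        # 3) Miss: fetch from CPU memory over PCIe, the costly transfer
        #    that the caching strategy aims to avoid.
        return cpu_features[node_id]
```

Under these assumptions, the redundancy knob trades GPU memory for a higher aggregate hit rate: features needed by several GPUs can be duplicated locally, while the rest are served by a single peer copy over NVLink instead of the CPU.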
Journal overview:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components, with an emphasis on software, are also relevant here.