{"title":"Rethinking graph data placement for graph neural network training on multiple GPUs","authors":"Shihui Song, Peng Jiang","doi":"10.1145/3503221.3508435","DOIUrl":null,"url":null,"abstract":"The existing Graph Neural Network (GNN) systems adopt graph partitioning to divide the graph data for multi-GPU training. Although they support large graphs, we find that the existing techniques lead to large data loading overhead. In this work, we for the first time model the data movement overhead among CPU and GPUs in GNN training. Based on the performance model, we provide an efficient algorithm to divide and distribute the graph data onto multiple GPUs so that the data loading time is minimized. The experiments show that our technique achieves smaller data loading time compared with the existing graph partitioning methods.","PeriodicalId":398609,"journal":{"name":"Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","volume":"95 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3503221.3508435","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
Existing Graph Neural Network (GNN) systems adopt graph partitioning to divide the graph data for multi-GPU training. Although these systems support large graphs, we find that the existing partitioning techniques incur large data loading overhead. In this work, we model, for the first time, the data movement overhead between the CPU and GPUs in GNN training. Based on this performance model, we provide an efficient algorithm that divides and distributes the graph data onto multiple GPUs so that the data loading time is minimized. Experiments show that our technique achieves lower data loading time than the existing graph partitioning methods.
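The abstract names a performance model and a placement algorithm but details neither. As a rough illustration of the kind of placement problem described, and not the authors' actual method, the sketch below greedily caches the most frequently accessed node-feature rows on the GPUs and charges CPU-resident rows a higher per-access transfer cost. Every function name, constant, and the frequency-based heuristic itself are assumptions made for illustration only.

```python
# Hypothetical sketch (not the paper's algorithm): greedy placement of node
# feature rows across GPU memories to reduce host-to-device loading, assuming
# per-node access frequencies are known (e.g., estimated from sampled
# mini-batches). All names and cost constants are illustrative assumptions.

def greedy_placement(access_freq, gpu_capacity, num_gpus):
    """Assign node IDs to GPUs, hottest nodes first, until capacity is filled.

    access_freq  -- list of (node_id, expected accesses per epoch)
    gpu_capacity -- max number of feature rows each GPU can cache
    num_gpus     -- number of GPUs

    Returns (placement, cpu_nodes): placement maps gpu_id -> list of node IDs;
    cpu_nodes holds everything left in host (CPU) memory.
    """
    hottest_first = sorted(access_freq, key=lambda kv: kv[1], reverse=True)
    placement = {g: [] for g in range(num_gpus)}
    cpu_nodes = []
    g = 0
    for node_id, _freq in hottest_first:
        placed = False
        # Round-robin over GPUs, taking the first one with spare capacity.
        for _ in range(num_gpus):
            if len(placement[g]) < gpu_capacity:
                placement[g].append(node_id)
                placed = True
                g = (g + 1) % num_gpus
                break
            g = (g + 1) % num_gpus
        if not placed:
            cpu_nodes.append(node_id)
    return placement, cpu_nodes


def expected_load_cost(access_freq, cpu_nodes, pcie_cost=1.0, nvlink_cost=0.1):
    """Toy per-epoch loading-cost model: CPU-resident rows pay the slow PCIe
    cost on every access; GPU-resident rows pay a cheaper peer-transfer cost.
    The constants are placeholders, not measured numbers."""
    cpu_set = set(cpu_nodes)
    cost = 0.0
    for node_id, freq in access_freq:
        cost += freq * (pcie_cost if node_id in cpu_set else nvlink_cost)
    return cost


if __name__ == "__main__":
    # 10 nodes with skewed access frequencies, 2 GPUs caching 3 rows each.
    freqs = [(i, 100.0 / (i + 1)) for i in range(10)]
    placement, cpu_nodes = greedy_placement(freqs, gpu_capacity=3, num_gpus=2)
    print("GPU placement:", placement)
    print("CPU-resident:", cpu_nodes)
    print("expected cost:", expected_load_cost(freqs, cpu_nodes))
```

Under this toy model, minimizing data loading time amounts to keeping high-frequency rows out of the CPU-resident set; the paper's contribution is a principled version of this trade-off derived from its measured CPU-GPU data movement model.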