HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture

Yi-Chien Lin, V. Prasanna
{"title":"HyScale-GNN:基于单节点异构架构的可扩展混合GNN训练系统","authors":"Yi-Chien Lin, V. Prasanna","doi":"10.1109/IPDPS54959.2023.00062","DOIUrl":null,"url":null,"abstract":"Graph Neural Networks (GNNs) have shown success in many real-world applications that involve graph-structured data. Most of the existing single-node GNN training systems are capable of training medium-scale graphs with tens of millions of edges; however, scaling them to large-scale graphs with billions of edges remains challenging. In addition, it is challenging to map GNN training algorithms onto a computation node as state-of-the-art machines feature heterogeneous architecture consisting of multiple processors and a variety of accelerators.We propose HyScale-GNN, a novel system to train GNN models on a single-node heterogeneous architecture. HyScale-GNN performs hybrid training which utilizes both the processors and the accelerators to train a model collaboratively. Our system design overcomes the memory size limitation of existing works and is optimized for training GNNs on large-scale graphs. We propose a two-stage data pre-fetching scheme to reduce the communication overhead during GNN training. To improve task mapping efficiency, we propose a dynamic resource management mechanism, which adjusts the workload assignment and resource allocation during runtime. We evaluate HyScale-GNN on a CPU-GPU and a CPU-FPGA heterogeneous architecture. Using several large-scale datasets and two widely-used GNN models, we compare the performance of our design with a multi-GPU baseline implemented in PyTorch-Geometric. The CPU-GPU design and the CPU-FPGA design achieve up to 2.08× speedup and 12.6× speedup, respectively. Compared with the state-of-the-art large-scale multi-node GNN training systems such as P3 and DistDGL, our CPU-FPGA design achieves up to 5.27× speedup using a single node.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"2017 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"HyScale-GNN: A Scalable Hybrid GNN Training System on Single-Node Heterogeneous Architecture\",\"authors\":\"Yi-Chien Lin, V. Prasanna\",\"doi\":\"10.1109/IPDPS54959.2023.00062\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graph Neural Networks (GNNs) have shown success in many real-world applications that involve graph-structured data. Most of the existing single-node GNN training systems are capable of training medium-scale graphs with tens of millions of edges; however, scaling them to large-scale graphs with billions of edges remains challenging. In addition, it is challenging to map GNN training algorithms onto a computation node as state-of-the-art machines feature heterogeneous architecture consisting of multiple processors and a variety of accelerators.We propose HyScale-GNN, a novel system to train GNN models on a single-node heterogeneous architecture. HyScale-GNN performs hybrid training which utilizes both the processors and the accelerators to train a model collaboratively. Our system design overcomes the memory size limitation of existing works and is optimized for training GNNs on large-scale graphs. We propose a two-stage data pre-fetching scheme to reduce the communication overhead during GNN training. 
To improve task mapping efficiency, we propose a dynamic resource management mechanism, which adjusts the workload assignment and resource allocation during runtime. We evaluate HyScale-GNN on a CPU-GPU and a CPU-FPGA heterogeneous architecture. Using several large-scale datasets and two widely-used GNN models, we compare the performance of our design with a multi-GPU baseline implemented in PyTorch-Geometric. The CPU-GPU design and the CPU-FPGA design achieve up to 2.08× speedup and 12.6× speedup, respectively. Compared with the state-of-the-art large-scale multi-node GNN training systems such as P3 and DistDGL, our CPU-FPGA design achieves up to 5.27× speedup using a single node.\",\"PeriodicalId\":343684,\"journal\":{\"name\":\"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"2017 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS54959.2023.00062\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS54959.2023.00062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Graph Neural Networks (GNNs) have shown success in many real-world applications that involve graph-structured data. Most existing single-node GNN training systems can train medium-scale graphs with tens of millions of edges; however, scaling them to large-scale graphs with billions of edges remains challenging. In addition, it is challenging to map GNN training algorithms onto a computation node, as state-of-the-art machines feature heterogeneous architectures consisting of multiple processors and a variety of accelerators. We propose HyScale-GNN, a novel system for training GNN models on a single-node heterogeneous architecture. HyScale-GNN performs hybrid training, which utilizes both the processors and the accelerators to train a model collaboratively. Our system design overcomes the memory size limitation of existing works and is optimized for training GNNs on large-scale graphs. We propose a two-stage data pre-fetching scheme to reduce the communication overhead during GNN training. To improve task-mapping efficiency, we propose a dynamic resource management mechanism, which adjusts the workload assignment and resource allocation at runtime. We evaluate HyScale-GNN on a CPU-GPU and a CPU-FPGA heterogeneous architecture. Using several large-scale datasets and two widely used GNN models, we compare the performance of our design with a multi-GPU baseline implemented in PyTorch-Geometric. The CPU-GPU design and the CPU-FPGA design achieve up to 2.08× and 12.6× speedup, respectively. Compared with state-of-the-art large-scale multi-node GNN training systems such as P3 and DistDGL, our CPU-FPGA design achieves up to 5.27× speedup using a single node.
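
The abstract describes hybrid training backed by a two-stage data pre-fetch. As a rough illustration of how such a pipeline can overlap data movement with computation, the sketch below gathers the next mini-batch's features from host memory into a pinned buffer (stage 1) and copies them to the device on a side CUDA stream (stage 2) while the current batch trains. This is a minimal PyTorch sketch under our own assumptions, not the HyScale-GNN implementation; `sample_minibatch` and the linear stand-in model are placeholders.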
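
```python
# Minimal sketch (requires a CUDA device): hybrid CPU-GPU training with a
# two-stage pre-fetch. Stage 1 gathers features from host RAM into pinned
# memory on the CPU; stage 2 issues an asynchronous host-to-device copy on
# a side stream. Placeholder names, not the HyScale-GNN API.
import torch

device = torch.device("cuda")
copy_stream = torch.cuda.Stream()          # side stream for prefetch copies

def sample_minibatch(features, batch_size=1024):
    """Placeholder neighbor sampler: returns node ids of one mini-batch."""
    return torch.randint(0, features.size(0), (batch_size,))

def prefetch(features, node_ids):
    """Stage 1: CPU gather into pinned memory. Stage 2: async DMA copy."""
    host_buf = features[node_ids].pin_memory()            # stage 1 (CPU)
    with torch.cuda.stream(copy_stream):
        dev_buf = host_buf.to(device, non_blocking=True)  # stage 2 (copy)
    return dev_buf

# The full feature matrix stays in host RAM, so it never has to fit in
# accelerator memory -- the property the abstract emphasizes.
features = torch.randn(1_000_000, 128)
model = torch.nn.Linear(128, 64).to(device)  # stand-in for a real GNN
opt = torch.optim.Adam(model.parameters())

next_batch = prefetch(features, sample_minibatch(features))
for step in range(100):
    torch.cuda.current_stream().wait_stream(copy_stream)  # copy finished?
    batch = next_batch
    # launch the pre-fetch for step+1 so it overlaps with this step's compute
    next_batch = prefetch(features, sample_minibatch(features))
    loss = model(batch).pow(2).mean()       # dummy loss for illustration
    opt.zero_grad()
    loss.backward()
    opt.step()
```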
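
The dynamic resource management mechanism is described only at a high level in the abstract. One common way to realize such a feedback loop, shown below purely as an assumption-laden illustration, is to measure each worker's throughput on the current step and shift the next step's workload split toward the faster device. `run_on_cpu` and `run_on_accel` are hypothetical stand-ins for the two compute paths.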
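
```python
# Minimal sketch of runtime workload rebalancing between a CPU worker and
# an accelerator worker. A real hybrid system runs both workers
# concurrently; this sketch serializes them only to keep the timing simple.
import time

def run_on_cpu(n_items):      # placeholder: pretend the CPU processes n items
    time.sleep(n_items * 2e-4)

def run_on_accel(n_items):    # placeholder: accelerator is ~4x faster here
    time.sleep(n_items * 5e-5)

batch = 4096
frac_accel = 0.5              # start with an even split
for step in range(20):
    n_accel = int(batch * frac_accel)
    n_cpu = batch - n_accel

    t0 = time.perf_counter()
    run_on_accel(n_accel)
    t_accel = time.perf_counter() - t0

    t0 = time.perf_counter()
    run_on_cpu(n_cpu)
    t_cpu = time.perf_counter() - t0

    # Throughput of each worker on this step (items per second).
    r_accel = n_accel / max(t_accel, 1e-9)
    r_cpu = n_cpu / max(t_cpu, 1e-9)
    # Assign the next batch proportionally to measured throughput,
    # smoothed to avoid oscillation.
    target = r_accel / (r_accel + r_cpu)
    frac_accel = 0.8 * frac_accel + 0.2 * target

print(f"converged accelerator share: {frac_accel:.2f}")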
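```

In both sketches the key property matches what the abstract claims at a high level: the full graph and feature matrix never need to fit in accelerator memory, and the CPU/accelerator split is a quantity tuned at runtime rather than fixed ahead of time.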