D3-GNN: Dynamic Distributed Dataflow for Streaming Graph Neural Networks
Rustam Guliyev, Aparajita Haldar, Hakan Ferhatosmanoglu
arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-10
arXiv:2409.09079
Abstract
Graph Neural Network (GNN) models on streaming graphs entail algorithmic
challenges in continuously capturing the graph's dynamic state, as well as systems
challenges in optimizing latency, memory, and throughput during both inference
and training. We present D3-GNN, the first distributed, hybrid-parallel,
streaming GNN system designed to handle real-time graph updates under an online
query setting. Our system addresses data management, algorithmic, and systems
challenges, enabling it to continuously capture the dynamic state of the graph and
update node representations with fault tolerance and optimal latency,
load balance, and throughput. D3-GNN utilizes streaming GNN aggregators and an
unrolled, distributed computation graph architecture to handle cascading graph
updates. To counteract data skew and neighborhood explosion issues, we
introduce inter-layer and intra-layer windowed forward pass solutions.
Experiments on large-scale graph streams demonstrate that D3-GNN achieves high
efficiency and scalability. Compared to DGL, D3-GNN achieves a significant
throughput improvement of about 76x for streaming workloads. The windowed
enhancement further reduces running times by around 10x and message volumes by
up to 15x at higher parallelism.
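
The abstract names two mechanisms without showing them: streaming aggregators and windowed forward passes. The sketch below illustrates the general idea only and is not D3-GNN's implementation. It assumes a mean aggregator and a single window of buffered events; the names `StreamingMeanAggregator` and `coalesce_window` are hypothetical and do not come from the paper. The aggregator folds each neighbor message into a running sum instead of recomputing over the whole neighborhood, and the window groups buffered events by destination vertex so that each vertex forwards one updated value per window rather than one per event.

```python
from collections import defaultdict


class StreamingMeanAggregator:
    """Hypothetical running mean over neighbor messages (not the paper's code)."""

    def __init__(self, dim):
        self.total = [0.0] * dim  # element-wise sum of current neighbor messages
        self.count = 0            # number of neighbors currently contributing

    def add(self, msg):
        # A new edge arrived: fold its message into the running sum.
        self.total = [t + m for t, m in zip(self.total, msg)]
        self.count += 1

    def replace(self, old_msg, new_msg):
        # A neighbor's representation changed: swap its contribution in place.
        self.total = [t - o + n for t, o, n in zip(self.total, old_msg, new_msg)]

    def remove(self, msg):
        # An edge was deleted: subtract its contribution.
        self.total = [t - m for t, m in zip(self.total, msg)]
        self.count -= 1

    def value(self):
        # Mean of the current messages; zeros when there are no neighbors.
        if self.count == 0:
            return [0.0] * len(self.total)
        return [t / self.count for t in self.total]


def coalesce_window(events):
    """Group one window's (destination, message) events by destination vertex."""
    by_dst = defaultdict(list)
    for dst, msg in events:
        by_dst[dst].append(msg)
    return by_dst


# Assumed toy input: three buffered events that touch two destination vertices.
aggregators = defaultdict(lambda: StreamingMeanAggregator(dim=2))
window = [("v1", [1.0, 2.0]), ("v1", [3.0, 4.0]), ("v2", [5.0, 6.0])]

grouped = coalesce_window(window)
for dst, msgs in grouped.items():
    for msg in msgs:
        aggregators[dst].add(msg)

# One forwarded value per touched vertex, instead of one per event.
forwarded = {dst: aggregators[dst].value() for dst in grouped}
print(len(window), "events ->", len(forwarded), "forwarded messages")
```

In this toy run, three events produce two forwarded messages. This is the kind of message-volume saving that windowing targets, although the numbers reported in the abstract come from the authors' experiments and not from this sketch.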