D3-GNN: Dynamic Distributed Dataflow for Streaming Graph Neural Networks

Rustam Guliyev, Aparajita Haldar, Hakan Ferhatosmanoglu
arXiv - CS - Distributed, Parallel, and Cluster Computing, published 2024-09-10. arXiv:2409.09079

Abstract

Graph Neural Network (GNN) models on streaming graphs pose algorithmic challenges in continuously capturing the graph's dynamic state, as well as systems challenges in optimizing latency, memory, and throughput during both inference and training. We present D3-GNN, the first distributed, hybrid-parallel, streaming GNN system designed to handle real-time graph updates under an online query setting. Our system addresses data management, algorithmic, and systems challenges: it continuously captures the dynamic state of the graph and updates node representations with fault tolerance and optimal latency, load balance, and throughput. D3-GNN uses streaming GNN aggregators and an unrolled, distributed computation graph architecture to handle cascading graph updates. To counteract data skew and neighborhood explosion, we introduce inter-layer and intra-layer windowed forward pass solutions. Experiments on large-scale graph streams demonstrate that D3-GNN achieves high efficiency and scalability. Compared to DGL, D3-GNN improves throughput by about 76x for streaming workloads. The windowed enhancement further reduces running times by around 10x and message volumes by up to 15x at higher parallelism.
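
The abstract names streaming GNN aggregators without describing them. As a rough illustration of the general idea only, the sketch below keeps a running mean of neighbor messages and updates it per graph event instead of recomputing over the whole neighborhood. The class name `StreamingMeanAggregator` and its methods are hypothetical and are not taken from D3-GNN; the choice of a mean aggregator is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamingMeanAggregator:
    """Hypothetical sketch: running mean of neighbor messages for one node.

    Not the D3-GNN implementation; it only illustrates incremental updates.
    """
    dim: int
    count: int = 0
    total: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start from a zero vector when no running sum was supplied.
        if not self.total:
            self.total = [0.0] * self.dim

    def add(self, msg: List[float]) -> None:
        # A new in-edge arrived: fold its message into the running sum.
        self.total = [t + m for t, m in zip(self.total, msg)]
        self.count += 1

    def replace(self, old: List[float], new: List[float]) -> None:
        # A neighbor's feature changed: swap its old contribution for the new one.
        self.total = [t - o + n for t, o, n in zip(self.total, old, new)]

    def remove(self, msg: List[float]) -> None:
        # An in-edge was deleted: subtract its contribution.
        if self.count == 0:
            raise ValueError("no messages to remove")
        self.total = [t - m for t, m in zip(self.total, msg)]
        self.count -= 1

    def value(self) -> List[float]:
        # Current aggregate; a zero vector when the node has no neighbors.
        if self.count == 0:
            return [0.0] * self.dim
        return [t / self.count for t in self.total]


agg = StreamingMeanAggregator(dim=2)
agg.add([1.0, 2.0])
agg.add([3.0, 4.0])
mean_after_adds = agg.value()
agg.replace([1.0, 2.0], [5.0, 2.0])
mean_after_replace = agg.value()
agg.remove([3.0, 4.0])
mean_after_remove = agg.value()
```

Each event costs time proportional to the feature dimension rather than to the node's degree, which is the property that makes this style of aggregator suitable for graphs that change edge by edge.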