Scalable and Effective Graph Neural Networks via Trainable Random Walk Sampling

Impact factor 8.9 · Zone 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence) · IEEE Transactions on Knowledge and Data Engineering · Pub Date: 2024-12-09 · DOI: 10.1109/TKDE.2024.3513533

Haipeng Ding;Zhewei Wei;Yuhang Ye

IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 2, pp. 896-909. Journal article, not open access. https://ieeexplore.ieee.org/document/10786281/

Citation count: 0

Abstract

Graph Neural Networks (GNNs) have aroused increasing research attention for their effectiveness on graph mining tasks. However, full-batch training methods based on stochastic gradient descent (SGD) require substantial resources since all gradient-required computational processes are stored in the acceleration device. The bottleneck of storage challenges the training of classic GNNs on large-scale datasets within one acceleration device. Meanwhile, message-passing based (spatial) GNN designs usually necessitate the homophily hypothesis of the graph, which easily fails on heterophilous graphs. In this paper, we propose the random walk extension for those message-passing based GNNs, enriching them with spectral powers. We prove that our random walk sampling with appropriate correction coefficients generates an unbiased approximation of the $K$-order polynomial filter matrix, thus promoting the neighborhood aggregation of the central nodes. Node-wise sampling strategy and historical embedding allow the classic models to be trained with mini-batches, which extends the scalability of the basic models. To show the effectiveness of our method, we conduct a thorough experimental analysis on some frequently-used benchmarks with diverse homophily and scale. The empirical results show that our model achieves significant performance improvements in comparison with the corresponding base GNNs and some state-of-the-art baselines in node classification tasks.
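
The unbiasedness claim in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not specify their correction coefficients. The sketch assumes a symmetric-normalized adjacency $\hat{A} = D^{-1/2} A D^{-1/2}$, fixed filter weights $w_k$, uniform neighbor sampling, and no isolated nodes. Under those assumptions the identity $\hat{A}^k = D^{1/2} P^k D^{-1/2}$, with $P = D^{-1} A$, gives the correction coefficient $\sqrt{d_v / d_u}$ for a walk that starts at $v$ and sits at $u$ after $k$ steps. The function names `exact_filter` and `sampled_filter` are hypothetical.

```python
import numpy as np


def exact_filter(adj, weights):
    """Dense reference: sum_k weights[k] * A_hat^k, A_hat = D^{-1/2} A D^{-1/2}."""
    n = len(adj)
    A = np.zeros((n, n))
    for v, nbrs in enumerate(adj):
        A[v, nbrs] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))  # assumes every degree >= 1
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    out = np.zeros((n, n))
    power = np.eye(n)  # A_hat^0
    for w in weights:
        out += w * power
        power = power @ A_hat
    return out


def sampled_filter(adj, weights, X, num_walks, rng):
    """Monte Carlo estimate of (sum_k weights[k] * A_hat^k) @ X via random walks."""
    n = len(adj)
    deg = np.array([len(nbrs) for nbrs in adj], dtype=float)
    out = np.zeros((n, X.shape[1]))
    for v in range(n):
        for _ in range(num_walks):
            u = v  # walk position after k steps
            for k, w in enumerate(weights):
                # Assumed correction coefficient: sqrt(d_v / d_u) converts the
                # uniform-walk expectation (P^k X)[v] into (A_hat^k X)[v].
                out[v] += w * np.sqrt(deg[v] / deg[u]) * X[u]
                if k + 1 < len(weights):
                    u = adj[u][rng.integers(len(adj[u]))]  # uniform neighbor
        out[v] /= num_walks
    return out


if __name__ == "__main__":
    # Toy graph (an assumption for illustration): a triangle 0-1-2 plus a pendant node 3.
    adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
    weights = [0.5, 0.3, 0.2]  # K = 2
    rng = np.random.default_rng(0)
    estimate = sampled_filter(adj, weights, np.eye(4), 2000, rng)
    print(np.abs(estimate - exact_filter(adj, weights)).max())
```

Passing the identity matrix as `X` makes the estimate approximate the filter matrix itself. Each walk touches only the nodes it visits, so the cost per central node depends on the number and length of walks rather than on the size of its $K$-hop neighborhood, which is what makes node-wise mini-batching plausible.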




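Source journal

IEEE Transactions on Knowledge and Data Engineering (Engineering and Technology: Engineering, Electrical and Electronic)

CiteScore: 11.70

Self-citation rate: 3.40%

Articles published: 515

Review time: 6 months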

About the journal: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.

Latest articles in this journal

2024 Reviewers List

Web-FTP: A Feature Transferring-Based Pre-Trained Model for Web Attack Detection

Network-to-Network: Self-Supervised Network Representation Learning via Position Prediction

AEGK: Aligned Entropic Graph Kernels Through Continuous-Time Quantum Walks

Contextual Inference From Sparse Shopping Transactions Based on Motif Patterns