Virtual Node Generation for Node Classification in Sparsely-Labeled Graphs

arXiv - CS - Social and Information Networks. Pub Date: 2024-09-12. DOI: arxiv-2409.07712

Hang Cui, Tarek Abdelzaher


Citations: 0

Abstract

In the broader machine learning literature, data-generation methods show promising results by augmenting sparse labels with additional informative training examples. Such methods are less studied in graphs because of the intricate dependencies among nodes in complex topologies. This paper presents a novel node generation method that infuses a small set of high-quality synthesized nodes into the graph as additional labeled nodes to optimally expand the propagation of labeled information. Because it only adds nodes, the framework is orthogonal to the graph learning and downstream classification techniques, and is therefore compatible with most popular graph pre-training (self-supervised learning), semi-supervised learning, and meta-learning methods. The contribution lies in designing the generated node set by solving a novel optimization problem. The optimization places the generated nodes so as to (1) minimize the classification loss to guarantee training accuracy and (2) maximize label propagation to low-confidence nodes in the downstream task to ensure high-quality propagation. Theoretically, we show that this dual optimization maximizes the global confidence of node classification. Our experiments demonstrate statistically significant performance improvements over 14 baselines on 10 publicly available datasets.
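The idea of attaching a labeled virtual node to low-confidence nodes can be illustrated with a toy sketch. This is not the optimization described in the abstract: it assumes a fixed two-step clamped label propagation as a stand-in for the downstream task, picks the attachment target with a simple argmin over confidence, and takes the virtual node's class as given. The names `propagate`, `add_virtual_node`, and `virtual_label` are hypothetical and chosen only for illustration.

```python
import numpy as np


def propagate(adj, labels, num_classes, steps=2):
    """Clamped label propagation for a fixed number of steps.

    labels[i] == -1 marks an unlabeled node. Returns per-node class scores.
    """
    deg = adj.sum(axis=1, keepdims=True)
    trans = adj / np.maximum(deg, 1.0)        # row-normalized adjacency
    idx = np.flatnonzero(labels >= 0)         # indices of labeled nodes
    seed = np.zeros((adj.shape[0], num_classes))
    seed[idx, labels[idx]] = 1.0              # one-hot rows for labeled nodes
    scores = seed.copy()
    for _ in range(steps):
        scores = trans @ scores               # average the neighbors' scores
        scores[idx] = seed[idx]               # keep known labels fixed
    return scores


def add_virtual_node(adj, labels, targets, virtual_label):
    """Append one labeled node connected to the nodes listed in `targets`."""
    n = adj.shape[0]
    new_adj = np.zeros((n + 1, n + 1))
    new_adj[:n, :n] = adj
    new_adj[n, targets] = 1.0
    new_adj[targets, n] = 1.0
    return new_adj, np.append(labels, virtual_label)


# Toy path graph 0-1-2-3-4-5-6 with one labeled node at each end.
n = 7
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
labels = np.full(n, -1)
labels[0], labels[-1] = 0, 1

# Confidence is taken here as the largest class score of each node.
conf_before = propagate(adj, labels, num_classes=2).max(axis=1)
target = int(np.argmin(conf_before))          # node the labels fail to reach

# Assumption: the virtual node's class (0) is simply given in this toy.
adj2, labels2 = add_virtual_node(adj, labels, [target], virtual_label=0)
conf_after = propagate(adj2, labels2, num_classes=2).max(axis=1)

print("lowest-confidence node:", target)
print("confidence before:", conf_before[target])
print("confidence after: ", conf_after[target])
```

In this toy graph the middle node lies more than two hops from either labeled end, so two propagation steps leave it with no label signal. Attaching the virtual node there gives it a nonzero score. Choosing where to place generated nodes and how to label them is the optimization problem the paper addresses, which this sketch does not attempt.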


