Virtual Node Generation for Node Classification in Sparsely-Labeled Graphs

Hang Cui, Tarek Abdelzaher
arXiv - CS - Social and Information Networks, published 2024-09-12. DOI: arxiv-2409.07712 (https://doi.org/arxiv-2409.07712)

Abstract

In the broader machine learning literature, data-generation methods have shown promising results by augmenting sparse labels with additional informative training examples. Such methods are less studied on graphs because of the intricate dependencies among nodes in complex topologies. This paper presents a novel node generation method that infuses a small set of high-quality synthesized nodes into the graph as additional labeled nodes to optimally expand the propagation of labeled information. Because it only adds nodes, the framework is orthogonal to the graph learning and downstream classification techniques, and is therefore compatible with most popular graph pre-training (self-supervised learning), semi-supervised learning, and meta-learning methods. The contribution lies in designing the generated node set by solving a novel optimization problem. The optimization places the generated nodes so as to (1) minimize the classification loss, to guarantee training accuracy, and (2) maximize label propagation to low-confidence nodes in the downstream task, to ensure high-quality propagation. Theoretically, we show that this dual optimization maximizes the global confidence of node classification. Our experiments demonstrate statistically significant performance improvements over 14 baselines on 10 publicly available datasets.
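The idea of adding a labeled node next to a low-confidence region can be illustrated with a small toy sketch. This is not the paper's optimization: it swaps in classic closed-form label spreading as a stand-in classifier, uses the top class score as a hypothetical confidence measure, and simply attaches one virtual node to the least-confident node with that node's currently predicted class. The function names, the path graph, and the value of `alpha` are all assumptions made for illustration.

```python
import numpy as np


def label_spreading(adj, labels, num_classes, alpha=0.9):
    """Closed-form label spreading: F = (1 - alpha) * (I - alpha * S)^-1 * Y.

    S is the symmetrically normalized adjacency matrix. This is a stand-in
    classifier, not the method described in the abstract.
    """
    n = adj.shape[0]
    Y = np.zeros((n, num_classes))
    for node, cls in labels.items():
        Y[node, cls] = 1.0
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1.0))
    S = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.linalg.solve(np.eye(n) - alpha * S, (1.0 - alpha) * Y)


def add_virtual_node(adj, neighbors):
    """Return a copy of adj with one extra node linked to `neighbors`."""
    n = adj.shape[0]
    grown = np.zeros((n + 1, n + 1))
    grown[:n, :n] = adj
    for v in neighbors:
        grown[n, v] = grown[v, n] = 1.0
    return grown


# Toy path graph 0-1-2-3-4-5 with only two labeled nodes (sparse labels).
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
labels = {0: 0, 2: 1}

# Hypothetical confidence: the largest class score a node receives.
before = label_spreading(adj, labels, num_classes=2)
confidence = before.max(axis=1)
target = int(np.argmin(confidence))             # least-confident node
virtual_class = int(np.argmax(before[target]))  # heuristic label choice

# Attach one labeled virtual node to the least-confident node and rerun.
virtual_id = n
grown_adj = add_virtual_node(adj, [target])
grown_labels = {**labels, virtual_id: virtual_class}
after = label_spreading(grown_adj, grown_labels, num_classes=2)

print("least-confident node:", target)
print("top score before:", before[target].max())
print("top score after: ", after[target].max())
```

In this toy setting the node farthest from any label receives the weakest scores, and the virtual node raises its top score. The method in the abstract instead chooses where to place generated nodes by solving an optimization over classification loss and propagation, which this heuristic does not attempt.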