LATEX-GCL: Large Language Models (LLMs)-Based Data Augmentation for Text-Attributed Graph Contrastive Learning

Haoran Yang, Xiangyu Zhao, Sirui Huang, Qing Li, Guandong Xu
{"title":"LATEX-GCL: Large Language Models (LLMs)-Based Data Augmentation for Text-Attributed Graph Contrastive Learning","authors":"Haoran Yang, Xiangyu Zhao, Sirui Huang, Qing Li, Guandong Xu","doi":"arxiv-2409.01145","DOIUrl":null,"url":null,"abstract":"Graph Contrastive Learning (GCL) is a potent paradigm for self-supervised\ngraph learning that has attracted attention across various application\nscenarios. However, GCL for learning on Text-Attributed Graphs (TAGs) has yet\nto be explored. Because conventional augmentation techniques like feature\nembedding masking cannot directly process textual attributes on TAGs. A naive\nstrategy for applying GCL to TAGs is to encode the textual attributes into\nfeature embeddings via a language model and then feed the embeddings into the\nfollowing GCL module for processing. Such a strategy faces three key\nchallenges: I) failure to avoid information loss, II) semantic loss during the\ntext encoding phase, and III) implicit augmentation constraints that lead to\nuncontrollable and incomprehensible results. In this paper, we propose a novel\nGCL framework named LATEX-GCL to utilize Large Language Models (LLMs) to\nproduce textual augmentations and LLMs' powerful natural language processing\n(NLP) abilities to address the three limitations aforementioned to pave the way\nfor applying GCL to TAG tasks. Extensive experiments on four high-quality TAG\ndatasets illustrate the superiority of the proposed LATEX-GCL method. The\nsource codes and datasets are released to ease the reproducibility, which can\nbe accessed via this link: https://anonymous.4open.science/r/LATEX-GCL-0712.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Social and Information Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.01145","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Graph Contrastive Learning (GCL) is a potent paradigm for self-supervised graph learning that has attracted attention across various application scenarios. However, GCL for learning on Text-Attributed Graphs (TAGs) has yet to be explored, because conventional augmentation techniques such as feature-embedding masking cannot directly process the textual attributes on TAGs. A naive strategy for applying GCL to TAGs is to encode the textual attributes into feature embeddings via a language model and then feed the embeddings into the subsequent GCL module for processing. Such a strategy faces three key challenges: I) failure to avoid information loss, II) semantic loss during the text encoding phase, and III) implicit augmentation constraints that lead to uncontrollable and incomprehensible results. In this paper, we propose a novel GCL framework named LATEX-GCL that utilizes Large Language Models (LLMs) to produce textual augmentations, exploiting LLMs' powerful natural language processing (NLP) abilities to address the three aforementioned limitations and pave the way for applying GCL to TAG tasks. Extensive experiments on four high-quality TAG datasets illustrate the superiority of the proposed LATEX-GCL method. The source code and datasets are released to facilitate reproducibility and can be accessed via this link: https://anonymous.4open.science/r/LATEX-GCL-0712.
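The abstract's core mechanism is to augment in text space with an LLM rather than in embedding space, and then contrast the resulting views. The sketch below illustrates that idea under stated assumptions; it is not the paper's implementation (which is available at the link above). The prompt template, the `llm_rewrite` stub, and the random-feature `encode_text` stand-in are hypothetical placeholders; only the InfoNCE objective is the standard contrastive loss commonly used in GCL.

```python
# Minimal illustrative sketch: LLM-based text-space augmentation for a TAG,
# followed by a standard InfoNCE contrastive objective. All component names
# here are hypothetical stand-ins, not the paper's API.
import torch
import torch.nn.functional as F

# Hypothetical augmentation prompt; the paper's actual prompts may differ.
PROMPT = "Paraphrase the following node description, keeping its meaning:\n{text}"

def llm_rewrite(text: str) -> str:
    """Hypothetical LLM call producing a textual augmentation.

    A real pipeline would query an LLM (API or local model) with
    PROMPT.format(text=text); we return the input unchanged so the
    sketch runs without external dependencies.
    """
    return text

def encode_text(texts: list[str], dim: int = 64) -> torch.Tensor:
    """Stand-in text encoder. A real pipeline would use a language model
    (e.g., a sentence encoder); random features keep the sketch self-contained."""
    torch.manual_seed(hash(tuple(texts)) % (2**31))
    return torch.randn(len(texts), dim)

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Standard InfoNCE loss between two views of the same set of nodes."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau           # pairwise cosine similarities / temperature
    labels = torch.arange(z1.size(0))    # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

texts = ["A paper on graph contrastive learning.",
         "A survey of large language models."]
aug_texts = [llm_rewrite(t) for t in texts]   # augmentation in text space
loss = info_nce(encode_text(texts), encode_text(aug_texts))
print(f"contrastive loss: {loss.item():.4f}")
```

Operating in text space is what makes the augmentation explicit and controllable: a rewritten attribute can be read and checked directly, whereas a masked feature embedding cannot, which is precisely the abstract's third challenge.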