{"title":"LATEX-GCL: Large Language Models (LLMs)-Based Data Augmentation for Text-Attributed Graph Contrastive Learning","authors":"Haoran Yang, Xiangyu Zhao, Sirui Huang, Qing Li, Guandong Xu","doi":"arxiv-2409.01145","DOIUrl":null,"url":null,"abstract":"Graph Contrastive Learning (GCL) is a potent paradigm for self-supervised\ngraph learning that has attracted attention across various application\nscenarios. However, GCL for learning on Text-Attributed Graphs (TAGs) has yet\nto be explored. Because conventional augmentation techniques like feature\nembedding masking cannot directly process textual attributes on TAGs. A naive\nstrategy for applying GCL to TAGs is to encode the textual attributes into\nfeature embeddings via a language model and then feed the embeddings into the\nfollowing GCL module for processing. Such a strategy faces three key\nchallenges: I) failure to avoid information loss, II) semantic loss during the\ntext encoding phase, and III) implicit augmentation constraints that lead to\nuncontrollable and incomprehensible results. In this paper, we propose a novel\nGCL framework named LATEX-GCL to utilize Large Language Models (LLMs) to\nproduce textual augmentations and LLMs' powerful natural language processing\n(NLP) abilities to address the three limitations aforementioned to pave the way\nfor applying GCL to TAG tasks. Extensive experiments on four high-quality TAG\ndatasets illustrate the superiority of the proposed LATEX-GCL method. The\nsource codes and datasets are released to ease the reproducibility, which can\nbe accessed via this link: https://anonymous.4open.science/r/LATEX-GCL-0712.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Social and Information Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.01145","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Graph Contrastive Learning (GCL) is a potent paradigm for self-supervised
graph learning that has attracted attention across various application
scenarios. However, GCL for learning on Text-Attributed Graphs (TAGs) has yet
to be explored, because conventional augmentation techniques such as feature
embedding masking cannot directly process the textual attributes on TAGs. A naive
strategy for applying GCL to TAGs is to encode the textual attributes into
feature embeddings via a language model and then feed the embeddings into the
downstream GCL module for processing. Such a strategy faces three key
challenges: I) inevitable information loss, II) semantic loss during the
text encoding phase, and III) implicit augmentation constraints that lead to
uncontrollable and incomprehensible results. In this paper, we propose a novel
GCL framework named LATEX-GCL that utilizes Large Language Models (LLMs) to
produce textual augmentations, leveraging LLMs' powerful natural language
processing (NLP) abilities to address the three aforementioned limitations and
pave the way for applying GCL to TAG tasks. Extensive experiments on four high-quality TAG
datasets demonstrate the superiority of the proposed LATEX-GCL method. The
source code and datasets are released to ease reproducibility and can
be accessed via this link: https://anonymous.4open.science/r/LATEX-GCL-0712.
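
The following minimal Python sketch illustrates the core idea described in the abstract; it is not the authors' released code. The encoder choice (sentence-transformers), the prompt wordings, and all helper names are illustrative assumptions. The point it demonstrates is the ordering: augmentations are produced in language space by an LLM *before* text encoding, rather than by masking feature embeddings afterward.

```python
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any compact text encoder works here

def llm_augment(text: str, instruction: str) -> str:
    """Stand-in for an LLM call. In practice, send `instruction` plus `text`
    to any LLM API and return its rewrite; here we return the text unchanged
    so the sketch runs without external services."""
    return text

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Standard InfoNCE contrastive loss between two views of the same nodes."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = (z1 @ z2.t()) / tau            # [N, N] cosine-similarity logits
    labels = torch.arange(z1.size(0))       # diagonal entries are the positives
    return F.cross_entropy(logits, labels)

def contrastive_step(node_texts: list[str]) -> torch.Tensor:
    # Augment in language space: two explicit, human-readable instructions
    # produce two textual views of every node's attribute.
    view_a = [llm_augment(t, "Shorten this description, keeping key facts.") for t in node_texts]
    view_b = [llm_augment(t, "Paraphrase this description.") for t in node_texts]
    # Encode only after augmenting, so the raw text is never reduced to a
    # fixed embedding before the augmentation is applied.
    z_a = torch.as_tensor(encoder.encode(view_a))   # [N, d] embeddings
    z_b = torch.as_tensor(encoder.encode(view_b))
    # In the full framework, these embeddings would pass through a GNN over
    # the graph structure before computing the contrastive objective.
    return info_nce(z_a, z_b)

loss = contrastive_step([
    "A paper on graph contrastive learning.",
    "A product described by a long review text.",
])
print(loss)
```

Because each perturbation is expressed as a natural-language instruction, an augmented view can be read and constrained directly, which is what makes this style of augmentation controllable and interpretable compared with implicit embedding-space masking.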