Graph Contrastive Learning (GCL) is a potent paradigm for self-supervised graph learning that has attracted attention across various application scenarios. However, GCL on Text-Attributed Graphs (TAGs) has yet to be explored, because conventional augmentation techniques such as feature embedding masking cannot directly process the textual attributes of TAGs. A naive strategy for applying GCL to TAGs is to encode the textual attributes into feature embeddings via a language model and then feed the embeddings into the downstream GCL module. Such a strategy faces three key challenges: I) failure to avoid information loss, II) semantic loss during the text encoding phase, and III) implicit augmentation constraints that lead to uncontrollable and incomprehensible results. In this paper, we propose a novel GCL framework named LATEX-GCL, which uses Large Language Models (LLMs) to produce textual augmentations and leverages their natural language processing (NLP) abilities to address the three aforementioned limitations, paving the way for applying GCL to TAG tasks. Extensive experiments on four high-quality TAG datasets illustrate the superiority of the proposed LATEX-GCL method. The source code and datasets are released for reproducibility at https://anonymous.4open.science/r/LATEX-GCL-0712.
{"title":"LATEX-GCL: Large Language Models (LLMs)-Based Data Augmentation for Text-Attributed Graph Contrastive Learning","authors":"Haoran Yang, Xiangyu Zhao, Sirui Huang, Qing Li, Guandong Xu","doi":"arxiv-2409.01145","DOIUrl":"https://doi.org/arxiv-2409.01145","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-09-02","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142214915"}
Unsupervised heterogeneous graph representation learning (UHGRL) has gained increasing attention due to its significance in handling practical graphs without labels. However, heterophily has been largely ignored, despite its ubiquitous presence in real-world heterogeneous graphs. In this paper, we define semantic heterophily and propose an innovative framework called Latent Graphs Guided Unsupervised Representation Learning (LatGRL) to handle this problem. First, we develop a similarity mining method that couples global structures and attributes, enabling the construction of fine-grained homophilic and heterophilic latent graphs to guide the representation learning. Moreover, we propose an adaptive dual-frequency semantic fusion mechanism to address the problem of node-level semantic heterophily. To cope with the massive scale of real-world data, we further design a scalable implementation. Extensive experiments on benchmark datasets validate the effectiveness and efficiency of our proposed framework. The source code and datasets have been made available at https://github.com/zxlearningdeep/LatGRL.
{"title":"When Heterophily Meets Heterogeneous Graphs: Latent Graphs Guided Unsupervised Representation Learning","authors":"Zhixiang Shen, Zhao Kang","doi":"arxiv-2409.00687","DOIUrl":"https://doi.org/arxiv-2409.00687","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-09-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142214728"}
Anthony Bonato, Juan Sebastian Chavez Palan, Adam Szava
The global banking system has faced increasing challenges in combating money laundering, necessitating advanced methods for detecting suspicious transactions. Anti-money laundering (or AML) approaches have often relied on predefined thresholds and machine learning algorithms using flagged transaction data, which are limited by the availability and accuracy of existing datasets. In this paper, we introduce a novel algorithm that leverages network analysis to detect potential money laundering activities within large-scale transaction data. Utilizing an anonymized transactional dataset from Coöperatieve Rabobank U.A., our method combines community detection via the Louvain algorithm and small cycle detection to identify suspicious transaction patterns below the regulatory reporting thresholds. Our approach successfully identifies cycles of transactions that may indicate layering steps in money laundering, providing a valuable tool for financial institutions to enhance their AML efforts. The results suggest the efficacy of our algorithm in pinpointing potentially illicit activities that evade current detection methods.
{"title":"Enhancing Anti-Money Laundering Efforts with Network-Based Algorithms","authors":"Anthony Bonato, Juan Sebastian Chavez Palan, Adam Szava","doi":"arxiv-2409.00823","DOIUrl":"https://doi.org/arxiv-2409.00823","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-09-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142227837"}
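The small cycle detection step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: it covers only the cycle search (not the Louvain community detection), and the function name `find_small_cycles`, the `max_len` bound, and the account labels are hypothetical choices made for illustration. It assumes transactions are given as directed (sender, receiver) pairs.

```python
from collections import defaultdict


def find_small_cycles(edges, max_len=4):
    """Return simple directed cycles that use at most max_len edges.

    Each cycle is reported once, as a tuple that starts at its smallest node.
    Self-loops are ignored.
    """
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)

    cycles = []

    def extend(start, path):
        # Depth-first extension of the current path, visiting only nodes
        # larger than the start node so each cycle is found exactly once.
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt == start and len(path) >= 2:
                cycles.append(tuple(path))
            elif nxt > start and nxt not in path and len(path) < max_len:
                extend(start, path + [nxt])

    for start in sorted(graph):
        extend(start, [start])
    return cycles


# Hypothetical transactions: A -> B -> C -> A forms a cycle of three transfers.
transactions = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
print(find_small_cycles(transactions))
```

Bounding the cycle length keeps the search tractable on large graphs; in practice one would presumably run such a search inside each detected community rather than on the whole network.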
Meng Qin, Chaorui Zhang, Yu Gao, Yibin Ding, Weipeng Jiang, Weixi Zhang, Wei Han, Bo Bai
Graph partitioning (GP) is a classic problem that divides the node set of a graph into densely-connected blocks. Following the IEEE HPEC Graph Challenge and recent advances in pre-training techniques (e.g., large-language models), we propose PR-GPT (Pre-trained & Refined Graph ParTitioning) based on a novel pre-training & refinement paradigm. We first conduct the offline pre-training of a deep graph learning (DGL) model on small synthetic graphs with various topology properties. By using the inductive inference of DGL, one can directly generalize the pre-trained model (with frozen model parameters) to large graphs and derive feasible GP results. We also use the derived partition as a good initialization of an efficient GP method (e.g., InfoMap) to further refine the quality of partitioning. In this setting, the online generalization and refinement of PR-GPT can not only benefit from the transfer ability regarding quality but also ensure high inference efficiency without re-training. Based on a mechanism of reducing the scale of a graph to be processed by the refinement method, PR-GPT also has the potential to support streaming GP. Experiments on the Graph Challenge benchmark demonstrate that PR-GPT can ensure faster GP on large-scale graphs without significant quality degradation, compared with running a refinement method from scratch. We will make our code public at https://github.com/KuroginQin/PRGPT.
{"title":"Towards Faster Graph Partitioning via Pre-training and Inductive Inference","authors":"Meng Qin, Chaorui Zhang, Yu Gao, Yibin Ding, Weipeng Jiang, Weixi Zhang, Wei Han, Bo Bai","doi":"arxiv-2409.00670","DOIUrl":"https://doi.org/arxiv-2409.00670","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-09-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142214730"}
Xiaoyu Zhang, Wenchuan Yang, Jiawei Feng, Bitao Dai, Tianci Bu, Xin Lu
Identifying structures in common forms the basis for networked systems design and optimization. However, real structures represented by graphs often vary in size, which lowers the accuracy of traditional graph classification methods. These graphs are called cross-scale graphs. To overcome this limitation, we propose GSpect, an advanced spectral graph filtering model for cross-scale graph classification tasks. Unlike other methods, we use graph wavelet neural networks for the convolution layer of the model, which aggregates multi-scale messages to generate graph representations. We design a spectral-pooling layer that aggregates nodes into one node to reduce the cross-scale graphs to the same size. We collect and construct a cross-scale benchmark data set, MSG (Multi Scale Graphs). Experiments reveal that, on open data sets, GSpect improves classification accuracy by 1.62% on average and by up to 3.33% on PROTEINS. On MSG, GSpect improves classification accuracy by 15.55% on average. GSpect fills the gap in cross-scale graph classification studies and has the potential to assist application research, such as diagnosing brain disease by predicting the brain network's label and developing new drugs with molecular structures learned from their counterparts in other systems.
{"title":"GSpect: Spectral Filtering for Cross-Scale Graph Classification","authors":"Xiaoyu Zhang, Wenchuan Yang, Jiawei Feng, Bitao Dai, Tianci Bu, Xin Lu","doi":"arxiv-2409.00338","DOIUrl":"https://doi.org/arxiv-2409.00338","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-08-31","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142214870"}
Linda Hirsch, Florian Müller, Mari Kruse, Andreas Butz, Robin Welsch
Augmented Reality (AR) is evolving to become the next frontier in social media, merging physical and virtual reality into a living metaverse, a Social MediARverse. With this transition, we must understand how different contexts (public, semi-public, and private) affect user engagement with AR content. We address this gap in current research by conducting an online survey with 110 participants, showcasing 36 AR videos, and polling them about the content's fit and appropriateness. Specifically, we manipulated these three spaces, two forms of dynamism (dynamic vs. static), and two dimensionalities (2D vs. 3D). Our findings reveal that dynamic AR content is generally more favorably received than static content. Additionally, users find sharing and engaging with AR content more comfortable in private settings than in public or semi-public ones. The study thus offers valuable insights for designing and implementing future Social MediARverses and guides industry and academia on content visualization and contextual considerations.
{"title":"Social MediARverse Investigating Users Social Media Content Sharing and Consuming Intentions with Location-Based AR","authors":"Linda Hirsch, Florian Müller, Mari Kruse, Andreas Butz, Robin Welsch","doi":"arxiv-2409.00211","DOIUrl":"https://doi.org/arxiv-2409.00211","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-08-30","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142214971"}
Argyrios Deligkas, Michelle Döring, Eduard Eiben, Tiger-Lily Goldsmith, George Skretas, Georg Tennigkeit
Logistics and transportation networks require a large amount of resources to realize necessary connections between locations, and minimizing these resources is a vital aspect of planning research. Since such networks have dynamic connections that are only available at specific times, intricate models are needed to portray them accurately. In this paper, we study the problem of minimizing the number of resources needed to realize a dynamic network, using the temporal graph model. In a temporal graph, edges appear at specific points in time. Given a temporal graph and a natural number k, we ask whether we can cover every temporal edge exactly once using at most k temporal journeys; in a temporal journey, consecutive edges have to adhere to the order of time. We conduct a thorough investigation of the complexity of the problem with respect to four dimensions: (a) whether the type of the temporal journey is a walk, a trail, or a path; (b) whether the chronological order of edges in the journey is strict or non-strict; (c) whether the temporal graph is directed or undirected; (d) whether the start and end points of each journey are given or not. We almost completely resolve the complexity of all these problems and provide dichotomies for each one of them with respect to k.
{"title":"How Many Lines to Paint the City: Exact Edge-Cover in Temporal Graphs","authors":"Argyrios Deligkas, Michelle Döring, Eduard Eiben, Tiger-Lily Goldsmith, George Skretas, Georg Tennigkeit","doi":"arxiv-2408.17107","DOIUrl":"https://doi.org/arxiv-2408.17107","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-08-30","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142214914"}
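To make the exact edge-cover question from the abstract above concrete, the following sketch checks a proposed solution; it does not solve the problem. It assumes a directed temporal graph whose edges are `(u, v, t)` triples and the walk variant of a journey (vertices may repeat). The function names are hypothetical and do not come from the paper.

```python
def is_temporal_journey(edges, strict=True):
    """Check that consecutive temporal edges (u, v, t) chain and respect time order."""
    for (_, v1, t1), (u2, _, t2) in zip(edges, edges[1:]):
        if v1 != u2:
            return False  # the next edge must start where the previous one ended
        if t2 < t1 or (strict and t2 == t1):
            return False  # time must not decrease (and must increase if strict)
    return True


def is_exact_edge_cover(temporal_edges, journeys, k, strict=True):
    """True if at most k valid journeys use every temporal edge exactly once."""
    if len(journeys) > k:
        return False
    if not all(is_temporal_journey(j, strict) for j in journeys):
        return False
    used = [edge for journey in journeys for edge in journey]
    return sorted(used) == sorted(temporal_edges)


# Hypothetical instance: two edges share time step 2.
edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 2), ("a", "c", 3)]
two_journeys = [[("a", "b", 1), ("b", "c", 2), ("c", "d", 2)], [("a", "c", 3)]]
print(is_exact_edge_cover(edges, two_journeys, k=2, strict=False))
print(is_exact_edge_cover(edges, two_journeys, k=2, strict=True))
```

The example shows why dimension (b) matters: the same two journeys form a valid cover under non-strict ordering but not under strict ordering, because two consecutive edges share a time step.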
It is known that LLMs do hallucinate, that is, they return incorrect information as facts. In this paper, we introduce the possibility to study these hallucinations under a structured form: graphs. Hallucinations in this context are incorrect outputs when prompted for well known graphs from the literature (e.g. Karate club, Les Misérables, graph atlas). These hallucinated graphs have the advantage of being much richer than the factual accuracy -- or not -- of a fact; this paper thus argues that such rich hallucinations can be used to characterize the outputs of LLMs. Our first contribution observes the diversity of topological hallucinations from major modern LLMs. Our second contribution is the proposal of a metric for the amplitude of such hallucinations: the Graph Atlas Distance, that is the average graph edit distance from several graphs in the graph atlas set. We compare this metric to the Hallucination Leaderboard, a hallucination rank that leverages 10,000 times more prompts to obtain its ranking.
{"title":"LLMs hallucinate graphs too: a structural perspective","authors":"Erwan Le Merrer, Gilles Tredan","doi":"arxiv-2409.00159","DOIUrl":"https://doi.org/arxiv-2409.00159","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-08-30","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142214727"}
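The abstract above describes the Graph Atlas Distance as an average graph edit distance. The sketch below shows one simplified way such an average could be computed; it is not the authors' implementation. It assumes undirected, unlabeled graphs, that the reference graph and the returned graph have the same number of nodes, and that only edge insertions and deletions are counted, each at unit cost. The function names are hypothetical, and the brute-force search over relabelings is practical only for very small graphs.

```python
from itertools import permutations


def edge_edit_distance(n, edges_a, edges_b):
    """Minimum number of edge insertions/deletions turning graph A into graph B,
    minimized over all relabelings of the n nodes (brute force, small n only)."""
    set_a = {frozenset(e) for e in edges_a}
    best = None
    for perm in permutations(range(n)):
        mapped_b = {frozenset((perm[u], perm[v])) for u, v in edges_b}
        cost = len(set_a ^ mapped_b)  # edges present in exactly one graph
        if best is None or cost < best:
            best = cost
    return best


def average_distance(cases):
    """Average the edit distance over (n, reference_edges, returned_edges) cases."""
    return sum(edge_edit_distance(n, ref, out) for n, ref, out in cases) / len(cases)


# Hypothetical prompts: one answer is a relabeled path, one misses a triangle edge.
path = [(0, 1), (1, 2)]
relabeled_path = [(0, 2), (2, 1)]
triangle = [(0, 1), (1, 2), (0, 2)]
print(average_distance([(3, path, relabeled_path), (3, triangle, path)]))
```

Minimizing over relabelings matters because a model may return the right structure with different node names, which should not count as a hallucination.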
To achieve truly seamless global intelligent connectivity, non-terrestrial networks (NTN) mainly composed of low earth orbit (LEO) satellites and drones are recognized as important components of the future 6G network architecture. Meanwhile, the rapid advancement of the Internet of Things (IoT) has led to the proliferation of numerous applications with stringent requirements for timely information delivery. The Age of Information (AoI), a critical performance metric for assessing the freshness of data in information update systems, has gained significant importance in this context. However, existing modeling and analysis work on AoI mainly focuses on terrestrial networks, and the distribution characteristics of ground nodes and the high dynamics of satellites have not been fully considered, which poses challenges for more accurate evaluation. Against this background, we model the ground nodes as a hybrid distribution of Poisson point process (PPP) and Poisson cluster process (PCP) to capture the impact of ground node distribution on the AoI of status update packet transmission supported by UAVs and satellites in NTN, and the visibility and cross-traffic characteristics of satellites are additionally considered. We derived the average AoI for the system in these two different situations and examined the impact of various network parameters on AoI performance.
{"title":"Service-Oriented AoI Modeling and Analysis for Non-Terrestrial Networks","authors":"Zheng Guo, Qian Chen, Weixiao Meng","doi":"arxiv-2408.17051","DOIUrl":"https://doi.org/arxiv-2408.17051","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-08-30","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142214916"}
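The abstract above derives the average AoI analytically. As a point of reference, the sketch below computes the empirical time-average age from a trace of status updates, which is the quantity such derivations characterize. It is an illustration only and does not model the PPP/PCP node distributions or the satellite links; the function name and the trace are hypothetical. It assumes updates are delivered in generation order and that the observation window runs from the first delivery to `end_time`.

```python
def average_aoi(updates, end_time):
    """Time-average age of information over [first delivery, end_time].

    updates: list of (generation_time, delivery_time) pairs, delivered in
    generation order; end_time must be later than the first delivery and no
    earlier than the last one.
    """
    updates = sorted(updates, key=lambda u: u[1])
    # Each update stays the freshest one until the next delivery (or end_time).
    boundaries = [delivery for _, delivery in updates[1:]] + [end_time]
    area = 0.0
    for (gen, start), stop in zip(updates, boundaries):
        # Age grows linearly from (start - gen) to (stop - gen): a trapezoid.
        area += ((start - gen) + (stop - gen)) / 2 * (stop - start)
    return area / (end_time - updates[0][1])


# Hypothetical trace: an update every 2 time units, each delayed by 1 unit.
trace = [(0, 1), (2, 3), (4, 5)]
print(average_aoi(trace, end_time=7))
```

The age follows a sawtooth: it drops to the delivery delay whenever an update arrives and then grows at unit rate, so the average is the area under the sawtooth divided by the window length.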
Victor Brabant, Yasaman Asgari, Pierre Borgnat, Angela Bonifati, Remy Cazabet
Temporal networks are commonly used to model real-life phenomena. When these phenomena represent interactions and are captured at a fine-grained temporal resolution, they are modeled as link streams. Community detection is an essential network analysis task. Although many methods exist for static networks, and some methods have been developed for temporal networks represented as sequences of snapshots, few works can handle link streams. This article introduces the first adaptation of the well-known Modularity quality function to link streams. Unlike existing methods, it is independent of the time scale of analysis. After introducing the quality function, and its relation to existing static and dynamic definitions of Modularity, we show experimentally its relevance for dynamic community evaluation.
{"title":"Longitudinal Modularity, a Modularity for Link Streams","authors":"Victor Brabant, Yasaman Asgari, Pierre Borgnat, Angela Bonifati, Remy Cazabet","doi":"arxiv-2408.16877","DOIUrl":"https://doi.org/arxiv-2408.16877","journal":{"name":"arXiv - CS - Social and Information Networks"},"publicationDate":"2024-08-29","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"142214920"}
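For context, the static Modularity that the abstract above adapts to link streams can be computed as follows. This sketch implements only the classic static definition for an undirected, unweighted graph, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. It is not the longitudinal variant proposed in the paper, and the function name and example graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Static Newman modularity of a partition of an undirected simple graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, split))
```

A snapshot-based dynamic analysis would evaluate this quantity per time window, which ties the result to the chosen window size; the abstract states that the proposed link-stream definition is independent of that time scale.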