MHGC: Multi-scale hard sample mining for contrastive deep graph clustering

IF 6.9; Zone 1, Management; Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS; Information Processing & Management; Pub Date: 2025-07-01; Epub Date: 2025-02-13; DOI: 10.1016/j.ipm.2025.104084

Tao Ren, Haodong Zhang, Yifan Wang, Wei Ju, Chengwu Liu, Fanchun Meng, Siyu Yi, Xiao Luo

Information Processing & Management, Volume 62, Issue 4, Article 104084 (Journal Article). Full text: https://www.sciencedirect.com/science/article/pii/S0306457325000263

Citations: 0

Abstract

Contrastive graph clustering is important for many real-world applications and has shown encouraging performance. However, existing methods often overlook hierarchical high-order semantic information and treat all contrastive pairs equally during optimization. Consequently, the abundance of easy sample pairs overwhelms the critical structural context learning process, limiting the accumulation of information and degrading the network’s learning capability. To address this, a novel contrastive deep graph clustering method termed MHGC is proposed, which conducts hard sample mining in contrastive learning at multiple granularities. Specifically, random walk with restart is used to sample subgraphs centered on anchor nodes. An attribute encoder is then designed to learn node representations, from which subgraph embeddings are obtained. Subsequently, hard and easy sample pairs within high-confidence clusters are identified by fitting a two-component beta mixture model (BMM) to the clustering loss. Building on this, a weight regulator is designed to adaptively tune the weights of sample pairs, and a multi-scale contrastive loss framework is proposed to leverage structural context information in a hierarchical contrastive manner. Comprehensive experiments on six widely used datasets confirm that MHGC is competitive with state-of-the-art baselines, with an average accuracy increase of 1.54%. Additionally, the ablation study further shows that the proposed multi-scale learning scheme and BMM-based hard mining strategy are effective for the graph clustering task. The source code is available at https://github.com/sodarin/MHGC
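
The subgraph sampling step mentioned in the abstract can be illustrated with a minimal sketch of random walk with restart. This is not the authors' implementation (see the linked repository for that). The function name `rwr_subgraph`, the adjacency-dictionary input format, and the default values for `restart_prob`, `size`, and `max_steps` are all assumptions made for illustration, since the abstract does not state them.

```python
import random


def rwr_subgraph(adj, anchor, size, restart_prob=0.5, max_steps=10_000, seed=None):
    """Collect up to `size` distinct nodes around `anchor` via random walk with restart.

    adj: dict mapping each node to a list of its neighbours (hypothetical format).
    restart_prob, size, max_steps: illustrative values, not taken from the paper.
    """
    rng = random.Random(seed)
    visited = [anchor]  # insertion-ordered; the anchor is always first
    seen = {anchor}
    current = anchor
    for _ in range(max_steps):
        if len(visited) >= size:
            break
        neighbours = adj.get(current, [])
        # Jump back to the anchor with probability restart_prob, or at a dead end.
        if not neighbours or rng.random() < restart_prob:
            current = anchor
            continue
        # Otherwise move to a uniformly chosen neighbour of the current node.
        current = rng.choice(neighbours)
        if current not in seen:
            seen.add(current)
            visited.append(current)
    return visited


# Toy undirected graph: a path 2-0-1-3-4 plus an isolated node 5.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3], 5: []}
sub = rwr_subgraph(adj, anchor=0, size=3, seed=0)
isolated = rwr_subgraph(adj, anchor=5, size=3, max_steps=100, seed=0)
```

Because the walk keeps returning to the anchor, the sampled nodes stay concentrated in the anchor's local neighbourhood. A higher `restart_prob` gives a tighter subgraph, and a lower one lets the walk reach more distant context.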

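Source journal

Information Processing & Management (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days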

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.