Mutual-support generalized category discovery
Yu Duan, Zhanxuan Hu, Rong Wang, Zhensheng Sun, Feiping Nie, Xuelong Li
Information Fusion, Volume 119, Article 103020. Published 2025-02-14. DOI: 10.1016/j.inffus.2025.103020
Citations: 0
Abstract
This work focuses on the problem of Generalized Category Discovery (GCD), a more realistic and challenging semi-supervised learning setting in which unlabeled data may belong to either previously known or unseen categories. Recent advances have demonstrated the efficacy of both pseudo-label-based parametric classification methods and representation-based non-parametric classification methods for this problem. However, the literature has yet to integrate their respective advantages. The former tends to be biased towards the 'Old' categories, making it easier to classify samples into the 'Old' groups. The latter fails to learn sufficiently discriminative representations, which degrades clustering performance. To this end, we propose Mutual-Support Generalized Category Discovery (MSGCD), a framework that unifies these two paradigms and leverages their strengths in a mutually reinforcing manner. It simultaneously learns high-quality pseudo-labels and discriminative representations, and it incorporates a novel Mutual-Support mechanism to facilitate this symbiotic enhancement. Specifically, high-quality pseudo-labels furnish valuable weakly supervised information for learning discriminative representations, while discriminative representations enable the estimation of semantic similarity between samples, guiding the model in generating more reliable pseudo-labels. MSGCD is remarkably effective, achieving state-of-the-art results on several datasets. Moreover, the Mutual-Support mechanism is not only effective in image classification tasks but also provides intuition for cross-modal representation learning, open-world image segmentation, and recognition. The code is available at https://github.com/DuannYu/MSGCD.
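To make the mutual-support idea concrete, here is a minimal, hedged sketch (not the authors' implementation; see the linked repository for MSGCD itself) of how a parametric classifier head and a representation-based neighbour vote could reinforce each other when assigning pseudo-labels to unlabeled samples. The function name, the number of neighbours, and the agreement threshold are hypothetical choices made for this illustration.

```python
# Illustrative sketch only: blends classifier pseudo-labels (parametric side)
# with a k-NN agreement check in embedding space (non-parametric side).
import torch
import torch.nn.functional as F


def refine_pseudo_labels(logits, features, n_neighbors=5, agree_thresh=0.6):
    """Refine hard pseudo-labels using neighbourhood agreement in feature space.

    logits:   (N, C) raw classifier outputs for a batch of unlabeled samples.
    features: (N, D) embeddings of the same samples.
    Returns refined hard pseudo-labels and a per-sample confidence mask.
    """
    # Parametric side: hard pseudo-labels from the classifier head.
    probs = F.softmax(logits, dim=1)
    hard = probs.argmax(dim=1)

    # Non-parametric side: cosine similarity between normalized embeddings.
    z = F.normalize(features, dim=1)
    sim = z @ z.t()
    sim.fill_diagonal_(-1.0)  # exclude self-matches

    # Labels assigned to each sample's nearest neighbours.
    nn_idx = sim.topk(n_neighbors, dim=1).indices   # (N, k)
    nn_labels = hard[nn_idx]                        # (N, k)
    agree = (nn_labels == hard.unsqueeze(1)).float().mean(dim=1)

    # Keep the classifier's label where the neighbourhood agrees;
    # otherwise fall back to the neighbours' majority vote.
    majority = nn_labels.mode(dim=1).values
    refined = torch.where(agree >= agree_thresh, hard, majority)
    confident = agree >= agree_thresh
    return refined, confident


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(32, 10)      # toy batch, 10 candidate categories
    features = torch.randn(32, 128)   # toy embeddings
    labels, mask = refine_pseudo_labels(logits, features)
    print(labels.shape, mask.float().mean().item())
```

In this toy loop, the confidence mask could gate which pseudo-labels supervise the representation, while the refined labels feed back into the classifier; this captures the spirit of pseudo-labels and representations supporting each other, under the stated assumptions.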
Journal Introduction:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.