LSPC: Exploring contrastive clustering based on local semantic information and prototype

Information Systems (IF 3.4; Zone 2, Computer Science; JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS), vol. 121, Article 102336. Pub Date: 2024-03-01. Epub Date: 2023-12-13. DOI: 10.1016/j.is.2023.102336

Jun-Fen Chen, Lang Sun, Bo-Jun Xie

Citations: 0

Abstract

In recent years, several prominent contrastive learning algorithms, a family of self-supervised learning methods, have been studied extensively; they can efficiently extract useful feature representations from input images by means of data augmentation. How to further partition these representations into meaningful clusters is the problem that deep clustering addresses. In this work, a deep clustering algorithm based on local semantic information and prototypes, referred to as LSPC, is proposed; it aims to learn a group of representative prototypes. Rather than learning the characteristics that distinguish different images, it pays more attention to the essential characteristics of images that may come from the same potential category. In the training framework, contrastive learning is combined with the k-means clustering algorithm, and the predictions are transformed into soft assignments for end-to-end training. To enable the model to accurately capture the semantic information between images, we mine samples similar to each training sample in the embedding space and use them as local semantic information, which effectively increases the similarity between samples belonging to the same cluster. Experimental results show that our algorithm achieves state-of-the-art performance on several commonly used public datasets, and additional experiments show that this clustering performance also extends to large datasets such as ImageNet.
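The abstract names two mechanisms without giving formulas: soft assignments of samples to prototypes, and mining of similar samples in the embedding space. The following is a minimal NumPy sketch of one common way to realize each, not the authors' implementation. The function names, the cosine-similarity softmax, the `temperature` value, the number of neighbors `k`, and the neighbor-agreement loss are all assumptions made for illustration; the toy embeddings stand in for features a trained encoder would produce.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def soft_assignments(embeddings, prototypes, temperature=0.1):
    """Softmax over cosine similarity to each prototype (each row sums to 1)."""
    logits = l2_normalize(embeddings) @ l2_normalize(prototypes).T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def mine_neighbors(embeddings, k=5):
    """Indices of the k most cosine-similar other samples for every sample."""
    z = l2_normalize(embeddings)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def neighbor_consistency_loss(probs, neighbors, eps=1e-12):
    """Mean negative log agreement between each sample and its mined neighbors.

    Assumed loss form: the dot product of two soft-assignment vectors is high
    when both samples put their mass on the same prototype.
    """
    agreement = np.einsum("ic,ikc->ik", probs, probs[neighbors])
    return float(-np.log(agreement + eps).mean())


rng = np.random.default_rng(0)
# Toy data: two well-separated blobs standing in for learned embeddings.
centers = np.array([[5.0, 0.0], [0.0, 5.0]])
embeddings = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])

# Hypothetical prototypes; in practice they might be initialized with k-means.
probs = soft_assignments(embeddings, prototypes=centers)
neighbors = mine_neighbors(embeddings, k=3)
loss = neighbor_consistency_loss(probs, neighbors)
```

Because the soft assignments are differentiable in both the embeddings and the prototypes, a loss of this kind could in principle be minimized jointly with a contrastive objective in an autodiff framework, which is one reading of the "end-to-end training" the abstract mentions.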

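Source journal

Information Systems (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 9.40

Self-citation rate: 2.70%

Articles published: 112

Review time: 53 days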

About the journal: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.