Multi-level Cross-modal Alignment for Image Clustering

AAAI Conference on Artificial Intelligence. Pub Date: 2024-01-22. DOI: 10.48550/arXiv.2401.11740

Liping Qiu, Qin Zhang, Xiaojun Chen, Shao-Qian Cai

Volume: 290 17. Pages: 14695-14703. Publication type: Journal Article.

Citations: 0

Abstract

Recently, cross-modal pretraining models have been employed to produce meaningful pseudo-labels to supervise the training of an image clustering model. However, numerous erroneous alignments in a cross-modal pretraining model could produce poor-quality pseudo-labels and degrade clustering performance. To solve this issue, we propose a novel Multi-level Cross-modal Alignment method to improve the alignments in a cross-modal pretraining model for downstream tasks, by building a smaller but better semantic space and aligning the images and texts at three levels, i.e., instance-level, prototype-level, and semantic-level. Theoretical results show that our proposed method converges, and suggest effective means to reduce the expected clustering risk of our method. Experimental results on five benchmark datasets clearly show the superiority of our new method.
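To make the pseudo-labeling idea concrete, the following is a minimal sketch of how image embeddings might be assigned pseudo-labels by cosine similarity to text embeddings. It is a generic illustration, not the authors' method: the function names (`l2_normalize`, `cosine_pseudo_labels`) and the toy 2-D embeddings are assumptions made for this example, and the three alignment levels described in the abstract are not implemented here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_pseudo_labels(image_embs, text_embs):
    """Assign each image the index of its most similar text embedding."""
    texts = [l2_normalize(t) for t in text_embs]
    labels = []
    for img in image_embs:
        img = l2_normalize(img)
        # Dot product of unit vectors equals cosine similarity.
        sims = [sum(a * b for a, b in zip(img, t)) for t in texts]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels


# Toy 2-D embeddings (hypothetical values, not taken from the paper).
toy_text_embs = [[1.0, 0.0], [0.0, 1.0]]
toy_image_embs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]]
toy_labels = cosine_pseudo_labels(toy_image_embs, toy_text_embs)
print(toy_labels)
```

The third toy image lies close to the boundary between the two text embeddings, so a small shift in either embedding would flip its label. Near-ties of this kind are one way that misaligned embeddings could yield the poor-quality pseudo-labels the abstract mentions.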


Source journal

AAAI Conference on Artificial Intelligence

Self-citation rate

0.00%

Number of publications