图像聚类的多级跨模态对齐

Liping Qiu, Qin Zhang, Xiaojun Chen, Shao-Qian Cai
{"title":"图像聚类的多级跨模态对齐","authors":"Liping Qiu, Qin Zhang, Xiaojun Chen, Shao-Qian Cai","doi":"10.48550/arXiv.2401.11740","DOIUrl":null,"url":null,"abstract":"Recently, the cross-modal pretraining model has been employed to produce meaningful pseudo-labels to supervise the training of an image clustering model. However, numerous erroneous alignments in a cross-modal pretraining model could produce poor-quality pseudo labels and degrade clustering performance. To solve the aforementioned issue, we propose a novel Multi-level Cross-modal Alignment method to improve the alignments in a cross-modal pretraining model for downstream tasks, by building a smaller but better semantic space and aligning the images and texts in three levels, i.e., instance-level, prototype-level, and semantic-level. Theoretical results show that our proposed method converges, and suggests effective means to reduce the expected clustering risk of our method. Experimental results on five benchmark datasets clearly show the superiority of our new method.","PeriodicalId":518480,"journal":{"name":"AAAI Conference on Artificial Intelligence","volume":"290 17","pages":"14695-14703"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-level Cross-modal Alignment for Image Clustering\",\"authors\":\"Liping Qiu, Qin Zhang, Xiaojun Chen, Shao-Qian Cai\",\"doi\":\"10.48550/arXiv.2401.11740\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, the cross-modal pretraining model has been employed to produce meaningful pseudo-labels to supervise the training of an image clustering model. However, numerous erroneous alignments in a cross-modal pretraining model could produce poor-quality pseudo labels and degrade clustering performance. To solve the aforementioned issue, we propose a novel Multi-level Cross-modal Alignment method to improve the alignments in a cross-modal pretraining model for downstream tasks, by building a smaller but better semantic space and aligning the images and texts in three levels, i.e., instance-level, prototype-level, and semantic-level. Theoretical results show that our proposed method converges, and suggests effective means to reduce the expected clustering risk of our method. Experimental results on five benchmark datasets clearly show the superiority of our new method.\",\"PeriodicalId\":518480,\"journal\":{\"name\":\"AAAI Conference on Artificial Intelligence\",\"volume\":\"290 17\",\"pages\":\"14695-14703\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AAAI Conference on Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2401.11740\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AAAI Conference on Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2401.11740","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

最近,跨模态预训练模型被用来生成有意义的伪标签,以监督图像聚类模型的训练。然而,跨模态预训练模型中的大量错误配准可能会产生劣质的伪标签并降低聚类性能。为了解决上述问题,我们提出了一种新颖的多层次跨模态对齐方法,通过建立一个更小但更好的语义空间,并在三个层次(即实例层次、原型层次和语义层次)上对图像和文本进行对齐,从而改进下游任务的跨模态预训练模型中的对齐。理论结果表明,我们提出的方法是收敛的,并提出了有效的方法来降低我们方法的预期聚类风险。在五个基准数据集上的实验结果清楚地表明了我们的新方法的优越性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Multi-level Cross-modal Alignment for Image Clustering
Recently, the cross-modal pretraining model has been employed to produce meaningful pseudo-labels to supervise the training of an image clustering model. However, numerous erroneous alignments in a cross-modal pretraining model could produce poor-quality pseudo labels and degrade clustering performance. To solve the aforementioned issue, we propose a novel Multi-level Cross-modal Alignment method to improve the alignments in a cross-modal pretraining model for downstream tasks, by building a smaller but better semantic space and aligning the images and texts in three levels, i.e., instance-level, prototype-level, and semantic-level. Theoretical results show that our proposed method converges, and suggests effective means to reduce the expected clustering risk of our method. Experimental results on five benchmark datasets clearly show the superiority of our new method.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A Cross-View Hierarchical Graph Learning Hypernetwork for Skill Demand-Supply Joint Prediction Learning to Stop Cut Generation for Efficient Mixed-Integer Linear Programming Knowledge-Aware Neuron Interpretation for Scene Classification Continuous Treatment Effect Estimation Using Gradient Interpolation and Kernel Smoothing Sketch and Refine: Towards Fast and Accurate Lane Detection
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1