{"title":"基于跨模态变换的弱监督深度图像哈希","authors":"Ching-Ching Yang, W. Chu, S. Dubey","doi":"10.23919/MVA57639.2023.10216160","DOIUrl":null,"url":null,"abstract":"Weakly-supervised image hashing emerges recently because web images associated with contextual text or tags are abundant. Text information weakly-related to images can be utilized to guide the learning of a deep hashing network. In this paper, we propose Weakly-supervised deep Hashing based on Cross-Modal Transformer (WHCMT). First, cross-scale attention between image patches is discovered to form more effective visual representations. A baseline transformer is also adopted to find self-attention of tags and form tag representations. Second, the cross-modal attention between images and tags is discovered by the proposed cross-modal transformer. Effective hash codes are then generated by embedding layers. WHCMT is tested on semantic image retrieval, and we show new state-of-the-art results can be obtained for the MIRFLICKR-25K dataset and NUS-WIDE dataset.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Weakly-Supervised Deep Image Hashing based on Cross-Modal Transformer\",\"authors\":\"Ching-Ching Yang, W. Chu, S. Dubey\",\"doi\":\"10.23919/MVA57639.2023.10216160\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Weakly-supervised image hashing emerges recently because web images associated with contextual text or tags are abundant. Text information weakly-related to images can be utilized to guide the learning of a deep hashing network. In this paper, we propose Weakly-supervised deep Hashing based on Cross-Modal Transformer (WHCMT). First, cross-scale attention between image patches is discovered to form more effective visual representations. A baseline transformer is also adopted to find self-attention of tags and form tag representations. Second, the cross-modal attention between images and tags is discovered by the proposed cross-modal transformer. Effective hash codes are then generated by embedding layers. 
WHCMT is tested on semantic image retrieval, and we show new state-of-the-art results can be obtained for the MIRFLICKR-25K dataset and NUS-WIDE dataset.\",\"PeriodicalId\":338734,\"journal\":{\"name\":\"2023 18th International Conference on Machine Vision and Applications (MVA)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 18th International Conference on Machine Vision and Applications (MVA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/MVA57639.2023.10216160\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 18th International Conference on Machine Vision and Applications (MVA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/MVA57639.2023.10216160","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Weakly-supervised image hashing has emerged recently because web images associated with contextual text or tags are abundant. Text information weakly related to images can be utilized to guide the learning of a deep hashing network. In this paper, we propose Weakly-supervised deep Hashing based on a Cross-Modal Transformer (WHCMT). First, cross-scale attention between image patches is discovered to form more effective visual representations. A baseline transformer is also adopted to find self-attention among tags and form tag representations. Second, the cross-modal attention between images and tags is discovered by the proposed cross-modal transformer. Effective hash codes are then generated by embedding layers. WHCMT is tested on semantic image retrieval, and we show that new state-of-the-art results can be obtained on the MIRFLICKR-25K and NUS-WIDE datasets.
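The abstract outlines a pipeline of tag self-attention, cross-modal attention between tag and image-patch features, and an embedding layer that produces hash codes. The sketch below is a minimal illustration of that pipeline using standard PyTorch attention modules; all module names, hyper-parameters, and the pooling/sign steps are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of the WHCMT-style pipeline described in the abstract:
# tag self-attention -> cross-modal attention over image patches -> hash codes.
# Names and hyper-parameters are hypothetical, not taken from the paper.
import torch
import torch.nn as nn


class CrossModalHashSketch(nn.Module):
    def __init__(self, dim=256, heads=4, hash_bits=64):
        super().__init__()
        # self-attention over tag tokens (the "baseline transformer" for tags)
        self.tag_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True),
            num_layers=2,
        )
        # cross-modal attention: tag tokens attend to image-patch tokens
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # embedding layer mapping the fused representation to hash logits
        self.hash_head = nn.Linear(dim, hash_bits)

    def forward(self, patch_feats, tag_feats):
        # patch_feats: (B, P, dim) image-patch features; tag_feats: (B, T, dim)
        tags = self.tag_encoder(tag_feats)
        fused, _ = self.cross_attn(query=tags, key=patch_feats, value=patch_feats)
        pooled = fused.mean(dim=1)                  # (B, dim)
        relaxed = torch.tanh(self.hash_head(pooled))  # relaxed codes in (-1, 1)
        return torch.sign(relaxed)                  # binary codes at retrieval time


# Usage: random tensors standing in for image-patch and tag embeddings.
model = CrossModalHashSketch()
patches, tags = torch.randn(2, 49, 256), torch.randn(2, 5, 256)
print(model(patches, tags).shape)  # torch.Size([2, 64])
```

During training, the tanh-relaxed codes would typically be supervised with a similarity-preserving loss before binarization; the sign step shown here is only the retrieval-time quantization.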