Hierarchical Composition Learning for Composed Query Image Retrieval

Yahui Xu, Yi Bin, Guoqing Wang, Yang Yang
DOI: 10.1145/3469877.3490601
Published in: ACM Multimedia Asia, December 2021
Citations: 3

Abstract

Composed query image retrieval is a growing research topic. The objective is to retrieve images that not only generally resemble the reference image but also differ according to a desired modification text. Existing methods mainly explore composing the modification text with the global feature or local entity descriptors of the reference image. However, they ignore the fact that modification text is diverse and arbitrary: it not only relates to abstract global features or concrete local entity transformations, but also often involves fine-grained structured visual adjustments. Thus, emphasizing only global or local entity visual information is insufficient for query composition. In this work, we tackle this task with hierarchical composition learning. Specifically, the proposed method first encodes images into three representations: global-, entity-, and structure-level. The structure-level representation is richly interpretable: it explicitly describes the entities, attributes, and relationships in the image with a directed graph. Based on these, we naturally perform hierarchical composition learning by fusing the modification text and reference image in a global-entity-structure manner. This transforms the visual feature, conditioned on the modification text, toward the target image in a coarse-to-fine manner, exploiting the complementary information among the three levels. Moreover, we introduce hybrid space matching to explore global, entity, and structure alignments, which achieves high performance and good interpretability.
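The coarse-to-fine composition described above can be sketched in a few lines. This is a minimal illustration under strong simplifying assumptions: each level is a plain feature vector, and the fusion module is a generic gated residual unit chosen for illustration, not the paper's actual architecture. All function names and parameters here are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x) + eps)

def fuse(level_feat, text_feat, w_gate, w_res):
    # Gated residual fusion: the modification text modulates the visual
    # feature. This is a common composed-query fusion pattern, used here
    # only as a stand-in for the paper's fusion module.
    joint = np.concatenate([level_feat, text_feat])
    gate = 1.0 / (1.0 + np.exp(-(w_gate @ joint)))  # sigmoid gate in [0, 1]
    residual = w_res @ joint
    return gate * level_feat + (1.0 - gate) * residual

def hierarchical_compose(global_f, entity_f, struct_f, text_f, params):
    # Coarse-to-fine: fuse at the global level first, then refine with
    # entity-level and structure-level cues conditioned on the coarser result.
    g = fuse(global_f, text_f, *params["global"])
    e = fuse(entity_f + g, text_f, *params["entity"])
    s = fuse(struct_f + e, text_f, *params["struct"])
    # Concatenating the three levels gives one embedding in a hybrid space,
    # so a single similarity covers global, entity, and structure alignment.
    return l2_normalize(np.concatenate([g, e, s]))

def hybrid_match(query_emb, target_embs):
    # Rank candidate target images by cosine similarity in the hybrid space
    # (rows of target_embs are assumed L2-normalized).
    sims = target_embs @ query_emb
    return int(np.argmax(sims))
```

In this sketch the structure-level input would come from pooling the directed scene graph the paper describes into a vector; how that pooling is done is left out here.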