Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval

2017 IEEE International Conference on Computer Vision (ICCV), pp. 5552-5561. Pub Date: 2017-12-25. DOI: 10.1109/ICCV.2017.592

Jifei Song, Qian Yu, Yi-Zhe Song, T. Xiang, Timothy M. Hospedales

Citations: 196

Abstract

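Human sketches are unique in capturing both the spatial topology of a visual object and its subtle appearance details. Fine-grained sketch-based image retrieval (FG-SBIR) leverages these fine-grained characteristics of sketches to perform instance-level retrieval of photos. However, human sketches are often highly abstract and iconic, which leads to severe misalignment with candidate photos and in turn makes matching subtle visual details difficult. Existing FG-SBIR approaches focus only on coarse holistic matching via deep cross-domain representation learning and do not explicitly account for fine-grained details and their spatial context. This paper proposes a novel deep FG-SBIR model that differs significantly from existing models in three ways: (1) it is spatially aware, through an attention module that is sensitive to the spatial position of visual details; (2) it combines coarse and fine semantic information via a shortcut connection fusion block; and (3) it models feature correlation and is robust to misalignment between the features extracted from the two domains, by introducing a loss based on a novel higher-order learnable energy function (HOLEF). Extensive experiments show that the proposed deep spatial-semantic attention model significantly outperforms the state of the art.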
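The abstract does not give the HOLEF formula, so the following is only a minimal sketch under a stated assumption. It assumes the energy compares every sketch feature dimension with every photo feature dimension, weighted by a learnable matrix W, i.e. E(s, p) = sum over i, j of W[i][j] * (s[i] - p[j])^2, and it plugs that energy into a standard triplet ranking loss. Both the exact form and the use of a triplet loss are assumptions for illustration. The helper names `holef_distance`, `identity_weights` and `triplet_loss` are hypothetical, and learning W (and any regularisation on it) is not shown.

```python
from typing import List, Sequence


def holef_distance(s: Sequence[float], p: Sequence[float], W: List[List[float]]) -> float:
    """Assumed higher-order energy: sum_{i,j} W[i][j] * (s[i] - p[j])**2.

    With W = identity this reduces to the squared Euclidean distance; non-zero
    off-diagonal entries let dimension i of the sketch feature be compared with
    dimension j of the photo feature, which is one way to tolerate misalignment.
    """
    n = len(s)
    assert len(p) == n and len(W) == n and all(len(row) == n for row in W)
    return sum(W[i][j] * (s[i] - p[j]) ** 2 for i in range(n) for j in range(n))


def identity_weights(n: int) -> List[List[float]]:
    """Hypothetical initialisation of W as the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def triplet_loss(anchor, positive, negative, W, margin: float = 1.0) -> float:
    """Triplet ranking loss (assumed): the matching photo should be closer to the
    sketch than a non-matching photo by at least `margin` under the energy."""
    d_pos = holef_distance(anchor, positive, W)
    d_neg = holef_distance(anchor, negative, W)
    return max(0.0, margin + d_pos - d_neg)


if __name__ == "__main__":
    W = identity_weights(2)
    print(holef_distance([0.0, 0.0], [3.0, 4.0], W))  # squared Euclidean: 25.0
    # A "swap" W matches dimension 0 with 1 and vice versa, so a permuted
    # feature vector has zero energy.
    print(holef_distance([1.0, 0.0], [0.0, 1.0], [[0.0, 1.0], [1.0, 0.0]]))  # 0.0
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0], W))  # 0.0
```

The identity initialisation is a design choice for the sketch: training would start from plain Euclidean matching and only move weight onto cross-dimension terms if that lowers the loss.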
