Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval

2017 IEEE International Conference on Computer Vision (ICCV). Published: 2017-12-25. DOI: 10.1109/ICCV.2017.592

Jifei Song, Qian Yu, Yi-Zhe Song, T. Xiang, Timothy M. Hospedales

Pages: 5552-5561

Citations: 196

Abstract

Human sketches are unique in being able to capture both the spatial topology of a visual object and its subtle appearance details. Fine-grained sketch-based image retrieval (FG-SBIR) leverages these fine-grained characteristics of sketches to conduct instance-level retrieval of photos. Nevertheless, human sketches are often highly abstract and iconic, resulting in severe misalignments with candidate photos, which in turn make subtle visual detail matching difficult. Existing FG-SBIR approaches focus only on coarse holistic matching via deep cross-domain representation learning, and do not explicitly account for fine-grained details and their spatial context. In this paper, a novel deep FG-SBIR model is proposed which differs significantly from existing models in that: (1) it is spatially aware, achieved by introducing an attention module that is sensitive to the spatial position of visual details; (2) it combines coarse and fine semantic information via a shortcut connection fusion block; and (3) it models feature correlation and is robust to misalignments between the extracted features across the two domains by introducing a novel higher-order learnable energy function (HOLEF) based loss. Extensive experiments show that the proposed deep spatial-semantic attention model significantly outperforms the state of the art.
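The abstract names three ingredients without giving their equations. The pure-Python sketch below shows one way they could fit together; it is an illustration under stated assumptions, not the authors' implementation. Assumptions: the attention logits are taken as given (a real model would predict them with a small convolutional sub-network); the shortcut adds the attention-weighted feature back onto the original feature before average pooling; the coarse/fine fusion is concatenation followed by L2 normalization; the HOLEF is taken to be the form sum over i, j of W[i][j] * (x[i] - y[j])^2, which reduces to squared Euclidean distance when W is the identity; and it is plugged into a triplet ranking loss with a hypothetical margin of 0.1. All function names (`attend_with_shortcut`, `fuse`, `identity`, `holef_distance`, `triplet_loss`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend_with_shortcut(fmap: Matrix, logits: Vector) -> Vector:
    """Spatial attention with a shortcut (residual) path, then average pooling.

    fmap: one C-dim feature vector per spatial position (H*W rows).
    logits: one attention logit per position (assumed given here).
    Each position contributes f + a * f, so the original feature survives
    even where the attention weight a is close to zero.
    """
    weights = softmax(logits)
    n, channels = len(fmap), len(fmap[0])
    pooled = [0.0] * channels
    for a, feat in zip(weights, fmap):
        for c in range(channels):
            pooled[c] += (feat[c] + a * feat[c]) / n
    return pooled


def fuse(fine: Vector, coarse: Vector) -> Vector:
    """Assumed fusion: concatenate fine and coarse vectors, then L2-normalize."""
    joint = fine + coarse
    norm = math.sqrt(sum(v * v for v in joint)) or 1.0  # guard zero vectors
    return [v / norm for v in joint]


def identity(d: int) -> Matrix:
    """d x d identity matrix, a natural starting point for W."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def holef_distance(x: Vector, y: Vector, w: Matrix) -> float:
    """Assumed higher-order energy: sum_ij W[i][j] * (x[i] - y[j])**2.

    Every dimension of x is compared with every dimension of y, so nonzero
    off-diagonal entries of W can tolerate features that land in different
    dimensions across the two domains.
    """
    return sum(
        w[i][j] * (x[i] - y[j]) ** 2
        for i in range(len(x))
        for j in range(len(y))
    )


def triplet_loss(sketch: Vector, pos: Vector, neg: Vector,
                 w: Matrix, margin: float = 0.1) -> float:
    """Hinge ranking loss: the matching photo should be closer than the non-match."""
    d_pos = holef_distance(sketch, pos, w)
    d_neg = holef_distance(sketch, neg, w)
    return max(0.0, margin + d_pos - d_neg)
```

For example, `holef_distance([1.0, 2.0], [3.0, 5.0], identity(2))` gives 13.0, the squared Euclidean distance, while `w = [[0.0, 1.0], [0.0, 0.0]]` instead compares `x[0]` with `y[1]` and gives 16.0. In training, W would be learned jointly with the network; here it is simply a fixed input.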
