
Latest publications from the 2017 IEEE International Conference on Computer Vision (ICCV)

Active Learning for Human Pose Estimation
Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.468
Buyu Liu, V. Ferrari
Annotating human poses in realistic scenes is very time consuming, yet necessary for training human pose estimators. We propose to address this problem in an active learning framework, which alternates between requesting the most useful annotations among a large set of unlabelled images, and re-training the pose estimator. To this end, (1) we propose an uncertainty estimator specific for body joint predictions, which takes into account the spatial distribution of the responses of the current pose estimator on the unlabelled images; (2) we propose a dynamic combination of influence and uncertainty cues, where their weights vary during the active learning process according to the reliability of the current pose estimator; (3) we introduce a computer assisted annotation interface, which reduces the time necessary for a human annotator to click on a joint by discretizing the image into regions generated by the current pose estimator. Experiments using the MPII and LSP datasets with both simulated and real annotators show that (1) the proposed active selection scheme outperforms several baselines; (2) our computer-assisted interface can further reduce annotation effort; and (3) our technique can further improve the performance of a pose estimator even when starting from an already strong one.
{"title":"Active Learning for Human Pose Estimation","authors":"Buyu Liu, V. Ferrari","doi":"10.1109/ICCV.2017.468","DOIUrl":"https://doi.org/10.1109/ICCV.2017.468","url":null,"abstract":"Annotating human poses in realistic scenes is very time consuming, yet necessary for training human pose estimators. We propose to address this problem in an active learning framework, which alternates between requesting the most useful annotations among a large set of unlabelled images, and re-training the pose estimator. To this end, (1) we propose an uncertainty estimator specific for body joint predictions, which takes into account the spatial distribution of the responses of the current pose estimator on the unlabelled images; (2) we propose a dynamic combination of influence and uncertainty cues, where their weights vary during the active learning process according to the reliability of the current pose estimator; (3) we introduce a computer assisted annotation interface, which reduces the time necessary for a human annotator to click on a joint by discretizing the image into regions generated by the current pose estimator. Experiments using the MPII and LSP datasets with both simulated and real annotators show that (1) the proposed active selection scheme outperforms several baselines; (2) our computer-assisted interface can further reduce annotation effort; and (3) our technique can further improve the performance of a pose estimator even when starting from an already strong one.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"28 1","pages":"4373-4382"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84362176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 69
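As a rough illustration of the selection step described in the abstract above, the following sketch scores unlabelled images by the spatial spread of the pose estimator's heatmap responses and blends that uncertainty with a precomputed influence score. The heatmap format, the influence scores, and the reliability-based weighting schedule are assumptions for illustration, not the authors' exact formulation.

```python
# Hypothetical sketch of uncertainty-weighted active selection for pose estimation.
import numpy as np

def joint_uncertainty(heatmap):
    """Spatial-spread uncertainty of one joint's heatmap (H x W, non-negative)."""
    p = heatmap / (heatmap.sum() + 1e-12)            # normalise responses to a distribution
    ys, xs = np.mgrid[0:p.shape[0], 0:p.shape[1]]
    mx, my = (p * xs).sum(), (p * ys).sum()          # expected joint location
    var = (p * ((xs - mx) ** 2 + (ys - my) ** 2)).sum()
    return np.sqrt(var)                              # large spread -> uncertain prediction

def select_batch(heatmaps, influence, estimator_reliability, k):
    """
    heatmaps: list of (J, H, W) arrays, one per unlabelled image.
    influence: (N,) array of precomputed influence scores (assumed given).
    estimator_reliability: value in [0, 1]; weights uncertainty more as the
    estimator improves (the dynamic-weighting schedule here is an assumption).
    Returns indices of the k images to request annotations for next.
    """
    unc = np.array([np.mean([joint_uncertainty(h) for h in hm]) for hm in heatmaps])
    unc = unc / (unc.max() + 1e-12)
    infl = influence / (influence.max() + 1e-12)
    w = estimator_reliability
    score = w * unc + (1.0 - w) * infl
    return np.argsort(-score)[:k]
```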
Taking the Scenic Route to 3D: Optimising Reconstruction from Moving Cameras
Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.501
Oscar Alejandro Mendez Maldonado, Simon Hadfield, N. Pugeault, R. Bowden
Reconstruction of 3D environments is a problem that has been widely addressed in the literature. While many approaches exist to perform reconstruction, few of them take an active role in deciding where the next observations should come from. Furthermore, the problem of travelling from the camera's current position to the next, known as pathplanning, usually focuses on minimising path length. This approach is ill-suited for reconstruction applications, where learning about the environment is more valuable than speed of traversal. We present a novel Scenic Route Planner that selects paths which maximise information gain, both in terms of total map coverage and reconstruction accuracy. We also introduce a new type of collaborative behaviour into the planning stage called opportunistic collaboration, which allows sensors to switch between acting as independent Structure from Motion (SfM) agents or as a variable baseline stereo pair. We show that Scenic Planning enables similar performance to state-of-the-art batch approaches using less than 0.00027% of the possible stereo pairs (3% of the views). Comparison against length-based pathplanning approaches show that our approach produces more complete and more accurate maps with fewer frames. Finally, we demonstrate the Scenic Pathplanner's ability to generalise to live scenarios by mounting cameras on autonomous ground-based sensor platforms and exploring an environment.
{"title":"Taking the Scenic Route to 3D: Optimising Reconstruction from Moving Cameras","authors":"Oscar Alejandro Mendez Maldonado, Simon Hadfield, N. Pugeault, R. Bowden","doi":"10.1109/ICCV.2017.501","DOIUrl":"https://doi.org/10.1109/ICCV.2017.501","url":null,"abstract":"Reconstruction of 3D environments is a problem that has been widely addressed in the literature. While many approaches exist to perform reconstruction, few of them take an active role in deciding where the next observations should come from. Furthermore, the problem of travelling from the camera's current position to the next, known as pathplanning, usually focuses on minimising path length. This approach is ill-suited for reconstruction applications, where learning about the environment is more valuable than speed of traversal. We present a novel Scenic Route Planner that selects paths which maximise information gain, both in terms of total map coverage and reconstruction accuracy. We also introduce a new type of collaborative behaviour into the planning stage called opportunistic collaboration, which allows sensors to switch between acting as independent Structure from Motion (SfM) agents or as a variable baseline stereo pair. We show that Scenic Planning enables similar performance to state-of-the-art batch approaches using less than 0.00027% of the possible stereo pairs (3% of the views). Comparison against length-based pathplanning approaches show that our approach produces more complete and more accurate maps with fewer frames. Finally, we demonstrate the Scenic Pathplanner's ability to generalise to live scenarios by mounting cameras on autonomous ground-based sensor platforms and exploring an environment.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"4 1","pages":"4687-4695"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87564983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
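The planning idea above, preferring informative views over short paths, can be illustrated with a toy greedy loop. The coverage-based gain model, the candidate viewpoint set, and the budget handling are assumptions, not the paper's Scenic Route Planner.

```python
# Toy next-best-view selection that maximises coverage gain per unit of travel cost.
def plan_route(candidates, visible_voxels, travel_cost, start, budget):
    """
    candidates: list of viewpoint ids.
    visible_voxels: dict id -> set of map voxels observable from that viewpoint.
    travel_cost: dict (a, b) -> metric cost of moving between viewpoints.
    Greedily adds the view with the best gain-per-cost until the budget is spent.
    """
    seen, route, pos, budget_left = set(), [start], start, budget
    while True:
        best, best_ratio = None, 0.0
        for c in candidates:
            if c in route:
                continue
            gain = len(visible_voxels[c] - seen)      # previously unseen coverage
            cost = max(travel_cost[(pos, c)], 1e-9)
            if cost <= budget_left and gain / cost > best_ratio:
                best, best_ratio = c, gain / cost
        if best is None:
            return route                              # budget exhausted or nothing useful left
        route.append(best)
        seen |= visible_voxels[best]
        budget_left -= travel_cost[(pos, best)]
        pos = best

# Usage: plan_route(["a", "b"], {"a": {1, 2}, "b": {2, 3}}, {("s", "a"): 1.0, ("s", "b"): 2.0,
#                   ("a", "b"): 1.0, ("b", "a"): 1.0}, start="s", budget=3.0)
```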
Attribute-Enhanced Face Recognition with Neural Tensor Fusion Networks
Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.404
Guosheng Hu, Yang Hua, Yang Yuan, Zhihong Zhang, Zheng Lu, S. Mukherjee, Timothy M. Hospedales, N. Robertson, Yongxin Yang
Deep learning has achieved great success in face recognition, however deep-learned features still have limited invariance to strong intra-personal variations such as large pose changes. It is observed that some facial attributes (e.g. eyebrow thickness, gender) are robust to such variations. We present the first work to systematically explore how the fusion of face recognition features (FRF) and facial attribute features (FAF) can enhance face recognition performance in various challenging scenarios. Despite the promise of FAF, we find that in practice existing fusion methods fail to leverage FAF to boost face recognition performance in some challenging scenarios. Thus, we develop a powerful tensor-based framework which formulates feature fusion as a tensor optimisation problem. It is nontrivial to directly optimise this tensor due to the large number of parameters to optimise. To solve this problem, we establish a theoretical equivalence between low-rank tensor optimisation and a two-stream gated neural network. This equivalence allows tractable learning using standard neural network optimisation tools, leading to accurate and stable optimisation. Experimental results show the fused feature works better than individual features, thus proving for the first time that facial attributes aid face recognition. We achieve state-of-the-art performance on three popular databases: MultiPIE (cross pose, lighting and expression), CASIA NIR-VIS2.0 (cross-modality environment) and LFW (uncontrolled environment).
{"title":"Attribute-Enhanced Face Recognition with Neural Tensor Fusion Networks","authors":"Guosheng Hu, Yang Hua, Yang Yuan, Zhihong Zhang, Zheng Lu, S. Mukherjee, Timothy M. Hospedales, N. Robertson, Yongxin Yang","doi":"10.1109/ICCV.2017.404","DOIUrl":"https://doi.org/10.1109/ICCV.2017.404","url":null,"abstract":"Deep learning has achieved great success in face recognition, however deep-learned features still have limited invariance to strong intra-personal variations such as large pose changes. It is observed that some facial attributes (e.g. eyebrow thickness, gender) are robust to such variations. We present the first work to systematically explore how the fusion of face recognition features (FRF) and facial attribute features (FAF) can enhance face recognition performance in various challenging scenarios. Despite the promise of FAF, we find that in practice existing fusion methods fail to leverage FAF to boost face recognition performance in some challenging scenarios. Thus, we develop a powerful tensor-based framework which formulates feature fusion as a tensor optimisation problem. It is nontrivial to directly optimise this tensor due to the large number of parameters to optimise. To solve this problem, we establish a theoretical equivalence between low-rank tensor optimisation and a two-stream gated neural network. This equivalence allows tractable learning using standard neural network optimisation tools, leading to accurate and stable optimisation. Experimental results show the fused feature works better than individual features, thus proving for the first time that facial attributes aid face recognition. We achieve state-of-the-art performance on three popular databases: MultiPIE (cross pose, lighting and expression), CASIA NIR-VIS2.0 (cross-modality environment) and LFW (uncontrolled environment).","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"31 1","pages":"3764-3773"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85155509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 67
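A minimal PyTorch sketch of a two-stream gated fusion of face-recognition features (FRF) and facial-attribute features (FAF) follows. The layer sizes and the gating form are assumptions, and the paper's low-rank tensor formulation is not reproduced here.

```python
# Simplified gated fusion of recognition and attribute features (not the paper's tensor model).
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, frf_dim=512, faf_dim=40, out_dim=256):
        super().__init__()
        self.frf_proj = nn.Linear(frf_dim, out_dim)   # face-recognition feature stream
        self.faf_proj = nn.Linear(faf_dim, out_dim)   # facial-attribute feature stream
        self.gate = nn.Linear(frf_dim + faf_dim, out_dim)

    def forward(self, frf, faf):
        # Gate decides, per dimension, how much each stream contributes to the fused feature.
        g = torch.sigmoid(self.gate(torch.cat([frf, faf], dim=1)))
        return g * self.frf_proj(frf) + (1.0 - g) * self.faf_proj(faf)

# Usage: fused = GatedFusion()(torch.randn(8, 512), torch.randn(8, 40))   # -> (8, 256)
```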
Rolling Shutter Correction in Manhattan World
Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.101
Pulak Purkait, C. Zach, A. Leonardis
A vast majority of consumer cameras operate the rolling shutter mechanism, which often produces distorted images due to inter-row delay while capturing an image. Recent methods for monocular rolling shutter compensation utilize blur kernel, straightness of line segments, as well as angle and length preservation. However, they do not incorporate scene geometry explicitly for rolling shutter correction, therefore, information about the 3D scene geometry is often distorted by the correction process. In this paper we propose a novel method which leverages geometric properties of the scene—in particular vanishing directions—to estimate the camera motion during rolling shutter exposure from a single distorted image. The proposed method jointly estimates the orthogonal vanishing directions and the rolling shutter camera motion. We performed extensive experiments on synthetic and real datasets which demonstrate the benefits of our approach both in terms of qualitative and quantitative results (in terms of a geometric structure fitting) as well as with respect to computation time.
{"title":"Rolling Shutter Correction in Manhattan World","authors":"Pulak Purkait, C. Zach, A. Leonardis","doi":"10.1109/ICCV.2017.101","DOIUrl":"https://doi.org/10.1109/ICCV.2017.101","url":null,"abstract":"A vast majority of consumer cameras operate the rolling shutter mechanism, which often produces distorted images due to inter-row delay while capturing an image. Recent methods for monocular rolling shutter compensation utilize blur kernel, straightness of line segments, as well as angle and length preservation. However, they do not incorporate scene geometry explicitly for rolling shutter correction, therefore, information about the 3D scene geometry is often distorted by the correction process. In this paper we propose a novel method which leverages geometric properties of the scene—in particular vanishing directions—to estimate the camera motion during rolling shutter exposure from a single distorted image. The proposed method jointly estimates the orthogonal vanishing directions and the rolling shutter camera motion. We performed extensive experiments on synthetic and real datasets which demonstrate the benefits of our approach both in terms of qualitative and quantitative results (in terms of a geometric structure fitting) as well as with respect to computation time.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"53 1","pages":"882-890"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83098527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
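The row-wise camera model behind rolling-shutter correction can be sketched as follows: assuming a constant angular velocity during readout, each image row is captured under its own rotation, and a pixel can be warped back into the first row's frame. The intrinsics, motion values, and sign convention below are assumptions made for illustration.

```python
# Illustrative row-dependent rotation model for a rolling-shutter camera.
import numpy as np
from scipy.spatial.transform import Rotation

def row_rotations(angular_velocity, readout_time, image_height):
    """Rotation matrix of each row's camera frame relative to the first row."""
    ts = np.linspace(0.0, readout_time, image_height)        # capture time of each row
    return [Rotation.from_rotvec(angular_velocity * t).as_matrix() for t in ts]

def warp_point(u, v, K, R_row):
    """Map a pixel observed on a rotated row back to the first-row frame
    (convention assumed: R_row rotates first-row directions into the row's frame)."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    ray = R_row.T @ ray
    p = K @ ray
    return p[:2] / p[2]

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
Rs = row_rotations(np.array([0.0, 0.5, 0.0]), 0.03, 480)     # 0.5 rad/s yaw, 30 ms readout
print(warp_point(400, 300, K, Rs[300]))
```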
Visual Odometry for Pixel Processor Arrays
Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.493
Laurie Bose, Jianing Chen, S. Carey, P. Dudek, W. Mayol-Cuevas
We present an approach of estimating constrained egomotion on a Pixel Processor Array (PPA). These devices embed processing and data storage capability into the pixels of the image sensor, allowing for fast and low power parallel computation directly on the image-plane. Rather than the standard visual pipeline whereby whole images are transferred to an external general processing unit, our approach performs all computation upon the PPA itself, with the camera's estimated motion as the only information output. Our approach estimates 3D rotation and a 1D scale-less estimate of translation. We introduce methods of image scaling, rotation and alignment which are performed solely upon the PPA itself and form the basis for conducting motion estimation. We demonstrate the algorithms on a SCAMP-5 vision chip, achieving frame rates >1000Hz at ~2W power consumption.
{"title":"Visual Odometry for Pixel Processor Arrays","authors":"Laurie Bose, Jianing Chen, S. Carey, P. Dudek, W. Mayol-Cuevas","doi":"10.1109/ICCV.2017.493","DOIUrl":"https://doi.org/10.1109/ICCV.2017.493","url":null,"abstract":"We present an approach of estimating constrained egomotion on a Pixel Processor Array (PPA). These devices embed processing and data storage capability into the pixels of the image sensor, allowing for fast and low power parallel computation directly on the image-plane. Rather than the standard visual pipeline whereby whole images are transferred to an external general processing unit, our approach performs all computation upon the PPA itself, with the camera's estimated motion as the only information output. Our approach estimates 3D rotation and a 1D scale-less estimate of translation. We introduce methods of image scaling, rotation and alignment which are performed solely upon the PPA itself and form the basis for conducting motion estimation. We demonstrate the algorithms on a SCAMP-5 vision chip, achieving frame rates >1000Hz at ~2W power consumption.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"99 1","pages":"4614-4622"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77212750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
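A plain NumPy stand-in for the kind of image-plane alignment such an array can run in parallel: exhaustively testing small shifts and keeping the one with the lowest sum of absolute differences. This is a conceptual sketch only, not the SCAMP-5 implementation.

```python
# Brute-force SAD alignment between consecutive frames as a proxy for on-array alignment.
import numpy as np

def best_shift(prev, curr, max_shift=4):
    """Search small integer shifts minimising the mean absolute difference between frames."""
    best, best_cost = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(curr, dy, axis=0), dx, axis=1)
            cost = np.abs(shifted.astype(np.int32) - prev.astype(np.int32)).mean()
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best   # (dx, dy) approximating the inter-frame image motion
```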
Joint Learning of Object and Action Detectors
Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.219
Vicky S. Kalogeiton, Philippe Weinzaepfel, V. Ferrari, C. Schmid
While most existing approaches for detection in videos focus on objects or human actions separately, we aim at jointly detecting objects performing actions, such as cat eating or dog jumping. We introduce an end-to-end multitask objective that jointly learns object-action relationships. We compare it with different training objectives, validate its effectiveness for detecting objects-actions in videos, and show that both tasks of object and action detection benefit from this joint learning. Moreover, the proposed architecture can be used for zero-shot learning of actions: our multitask objective leverages the commonalities of an action performed by different objects, e.g. dog and cat jumping, enabling to detect actions of an object without training with these object-actions pairs. In experiments on the A2D dataset [50], we obtain state-of-the-art results on segmentation of object-action pairs. We finally apply our multitask architecture to detect visual relationships between objects in images of the VRD dataset [24].
{"title":"Joint Learning of Object and Action Detectors","authors":"Vicky S. Kalogeiton, Philippe Weinzaepfel, V. Ferrari, C. Schmid","doi":"10.1109/ICCV.2017.219","DOIUrl":"https://doi.org/10.1109/ICCV.2017.219","url":null,"abstract":"While most existing approaches for detection in videos focus on objects or human actions separately, we aim at jointly detecting objects performing actions, such as cat eating or dog jumping. We introduce an end-to-end multitask objective that jointly learns object-action relationships. We compare it with different training objectives, validate its effectiveness for detecting objects-actions in videos, and show that both tasks of object and action detection benefit from this joint learning. Moreover, the proposed architecture can be used for zero-shot learning of actions: our multitask objective leverages the commonalities of an action performed by different objects, e.g. dog and cat jumping, enabling to detect actions of an object without training with these object-actions pairs. In experiments on the A2D dataset [50], we obtain state-of-the-art results on segmentation of object-action pairs. We finally apply our multitask architecture to detect visual relationships between objects in images of the VRD dataset [24].","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"36 1","pages":"2001-2010"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85857365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 63
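The multitask idea above can be sketched as a shared feature feeding two heads, with zero-shot object-action pairs scored by combining the independent outputs. The dimensions, class counts, and missing detection backbone are placeholders rather than the paper's architecture.

```python
# Schematic multitask object/action head with zero-shot pair scoring.
import torch
import torch.nn as nn

class ObjectActionHead(nn.Module):
    def __init__(self, feat_dim=1024, n_objects=43, n_actions=16):
        super().__init__()
        self.obj_head = nn.Linear(feat_dim, n_objects)
        self.act_head = nn.Linear(feat_dim, n_actions)

    def forward(self, feats):
        obj_logits = self.obj_head(feats)   # what the region contains
        act_logits = self.act_head(feats)   # what it is doing
        return obj_logits, act_logits

    def pair_scores(self, feats):
        """Score every (object, action) pair, including pairs never seen jointly in training."""
        obj, act = self.forward(feats)
        return obj.softmax(dim=1).unsqueeze(2) * act.softmax(dim=1).unsqueeze(1)

# Training would add a cross-entropy loss per head; inference multiplies the two
# marginals, which is what enables the zero-shot object-action combinations.
```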
Sketching with Style: Visual Search with Sketches and Aesthetic Context
Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.290
J. Collomosse, Tu Bui, Michael J. Wilber, Chen Fang, Hailin Jin
We propose a novel measure of visual similarity for image retrieval that incorporates both structural and aesthetic (style) constraints. Our algorithm accepts a query as sketched shape, and a set of one or more contextual images specifying the desired visual aesthetic. A triplet network is used to learn a feature embedding capable of measuring style similarity independent of structure, delivering significant gains over previous networks for style discrimination. We incorporate this model within a hierarchical triplet network to unify and learn a joint space from two discriminatively trained streams for style and structure. We demonstrate that this space enables, for the first time, styleconstrained sketch search over a diverse domain of digital artwork comprising graphics, paintings and drawings. We also briefly explore alternative query modalities.
{"title":"Sketching with Style: Visual Search with Sketches and Aesthetic Context","authors":"J. Collomosse, Tu Bui, Michael J. Wilber, Chen Fang, Hailin Jin","doi":"10.1109/ICCV.2017.290","DOIUrl":"https://doi.org/10.1109/ICCV.2017.290","url":null,"abstract":"We propose a novel measure of visual similarity for image retrieval that incorporates both structural and aesthetic (style) constraints. Our algorithm accepts a query as sketched shape, and a set of one or more contextual images specifying the desired visual aesthetic. A triplet network is used to learn a feature embedding capable of measuring style similarity independent of structure, delivering significant gains over previous networks for style discrimination. We incorporate this model within a hierarchical triplet network to unify and learn a joint space from two discriminatively trained streams for style and structure. We demonstrate that this space enables, for the first time, styleconstrained sketch search over a diverse domain of digital artwork comprising graphics, paintings and drawings. We also briefly explore alternative query modalities.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"8 1","pages":"2679-2687"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84133489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 54
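A bare-bones triplet embedding of the kind mentioned above, using PyTorch's built-in triplet margin loss. The encoder, margin, and input features are placeholders, and the paper's hierarchical structure/style split is not modelled.

```python
# Minimal triplet network for learning a style-similarity embedding.
import torch
import torch.nn as nn

class StyleEmbedder(nn.Module):
    def __init__(self, in_dim=2048, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, emb_dim))

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=1)   # unit-norm style embedding

embedder = StyleEmbedder()
triplet_loss = nn.TripletMarginLoss(margin=0.2)
# Anchor/positive share a style, negative differs (random tensors stand in for real features).
anchor, positive, negative = (torch.randn(16, 2048) for _ in range(3))
loss = triplet_loss(embedder(anchor), embedder(positive), embedder(negative))
loss.backward()
```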
Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval
Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.592
Jifei Song, Qian Yu, Yi-Zhe Song, T. Xiang, Timothy M. Hospedales
Human sketches are unique in being able to capture both the spatial topology of a visual object, as well as its subtle appearance details. Fine-grained sketch-based image retrieval (FG-SBIR) importantly leverages on such fine-grained characteristics of sketches to conduct instance-level retrieval of photos. Nevertheless, human sketches are often highly abstract and iconic, resulting in severe misalignments with candidate photos which in turn make subtle visual detail matching difficult. Existing FG-SBIR approaches focus only on coarse holistic matching via deep cross-domain representation learning, yet ignore explicitly accounting for fine-grained details and their spatial context. In this paper, a novel deep FG-SBIR model is proposed which differs significantly from the existing models in that: (1) It is spatially aware, achieved by introducing an attention module that is sensitive to the spatial position of visual details: (2) It combines coarse and fine semantic information via a shortcut connection fusion block: and (3) It models feature correlation and is robust to misalignments between the extracted features across the two domains by introducing a novel higher-order learnable energy function (HOLEF) based loss. Extensive experiments show that the proposed deep spatial-semantic attention model significantly outperforms the state-of-the-art.
{"title":"Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval","authors":"Jifei Song, Qian Yu, Yi-Zhe Song, T. Xiang, Timothy M. Hospedales","doi":"10.1109/ICCV.2017.592","DOIUrl":"https://doi.org/10.1109/ICCV.2017.592","url":null,"abstract":"Human sketches are unique in being able to capture both the spatial topology of a visual object, as well as its subtle appearance details. Fine-grained sketch-based image retrieval (FG-SBIR) importantly leverages on such fine-grained characteristics of sketches to conduct instance-level retrieval of photos. Nevertheless, human sketches are often highly abstract and iconic, resulting in severe misalignments with candidate photos which in turn make subtle visual detail matching difficult. Existing FG-SBIR approaches focus only on coarse holistic matching via deep cross-domain representation learning, yet ignore explicitly accounting for fine-grained details and their spatial context. In this paper, a novel deep FG-SBIR model is proposed which differs significantly from the existing models in that: (1) It is spatially aware, achieved by introducing an attention module that is sensitive to the spatial position of visual details: (2) It combines coarse and fine semantic information via a shortcut connection fusion block: and (3) It models feature correlation and is robust to misalignments between the extracted features across the two domains by introducing a novel higher-order learnable energy function (HOLEF) based loss. Extensive experiments show that the proposed deep spatial-semantic attention model significantly outperforms the state-of-the-art.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"35 1","pages":"5552-5561"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89354597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 196
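A generic spatial-attention block over a convolutional feature map, illustrating the "spatially aware" and coarse/fine-fusion components in simplified form. The 1x1-convolution attention and the concatenation shortcut are assumptions, and the paper's HOLEF loss is not included.

```python
# Generic spatial attention with a coarse/fine shortcut fusion.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels=512):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)    # one attention score per location

    def forward(self, fmap):                                   # fmap: (B, C, H, W)
        b, c, h, w = fmap.shape
        a = self.score(fmap).view(b, 1, h * w).softmax(dim=2).view(b, 1, h, w)
        attended = (fmap * a).sum(dim=(2, 3))                  # fine, attention-pooled feature
        coarse = fmap.mean(dim=(2, 3))                         # coarse, globally pooled feature
        return torch.cat([attended, coarse], dim=1)            # shortcut-style fusion

# Usage: feat = SpatialAttention()(torch.randn(4, 512, 7, 7))   # -> (4, 1024)
```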
Corner-Based Geometric Calibration of Multi-focus Plenoptic Cameras
Pub Date : 2017-12-25 DOI: 10.1109/ICCV.2017.109
Sotiris Nousias, F. Chadebecq, Jonas Pichat, P. Keane, S. Ourselin, C. Bergeles
We propose a method for geometric calibration of multifocus plenoptic cameras using raw images. Multi-focus plenoptic cameras feature several types of micro-lenses spatially aligned in front of the camera sensor to generate micro-images at different magnifications. This multi-lens arrangement provides computational-photography benefits but complicates calibration. Our methodology achieves the detection of the type of micro-lenses, the retrieval of their spatial arrangement, and the estimation of intrinsic and extrinsic camera parameters therefore fully characterising this specialised camera class. Motivated from classic pinhole camera calibration, our algorithm operates on a checker-board’s corners, retrieved by a custom microimage corner detector. This approach enables the introduction of a reprojection error that is used in a minimisation framework. Our algorithm compares favourably to the state-of-the-art, as demonstrated by controlled and freehand experiments, making it a first step towards accurate 3D reconstruction and Structure-from-Motion.
{"title":"Corner-Based Geometric Calibration of Multi-focus Plenoptic Cameras","authors":"Sotiris Nousias, F. Chadebecq, Jonas Pichat, P. Keane, S. Ourselin, C. Bergeles","doi":"10.1109/ICCV.2017.109","DOIUrl":"https://doi.org/10.1109/ICCV.2017.109","url":null,"abstract":"We propose a method for geometric calibration of multifocus plenoptic cameras using raw images. Multi-focus plenoptic cameras feature several types of micro-lenses spatially aligned in front of the camera sensor to generate micro-images at different magnifications. This multi-lens arrangement provides computational-photography benefits but complicates calibration. Our methodology achieves the detection of the type of micro-lenses, the retrieval of their spatial arrangement, and the estimation of intrinsic and extrinsic camera parameters therefore fully characterising this specialised camera class. Motivated from classic pinhole camera calibration, our algorithm operates on a checker-board’s corners, retrieved by a custom microimage corner detector. This approach enables the introduction of a reprojection error that is used in a minimisation framework. Our algorithm compares favourably to the state-of-the-art, as demonstrated by controlled and freehand experiments, making it a first step towards accurate 3D reconstruction and Structure-from-Motion.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"957-965"},"PeriodicalIF":0.0,"publicationDate":"2017-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89027106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
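For intuition only, the sketch below shows a stripped-down reprojection-error refinement for a plain pinhole camera using SciPy's least_squares. The multi-focus plenoptic projection model and the micro-image corner detector of the paper are considerably more involved and are not reproduced here.

```python
# Reprojection-error minimisation for a simplified pinhole model (not the plenoptic model).
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(params, world_pts, image_pts):
    fx, fy, cx, cy = params[:4]                      # intrinsics
    rvec, tvec = params[4:7], params[7:10]           # extrinsics (rotation vector, translation)
    R = Rotation.from_rotvec(rvec).as_matrix()
    cam = (R @ world_pts.T).T + tvec                 # checkerboard corners in camera frame
    proj = np.stack([fx * cam[:, 0] / cam[:, 2] + cx,
                     fy * cam[:, 1] / cam[:, 2] + cy], axis=1)
    return (proj - image_pts).ravel()                # per-corner reprojection error

# world_pts: (N, 3) checkerboard corners; image_pts: (N, 2) detected corners (assumed given).
# x0 = np.array([800, 800, 320, 240, 0, 0, 0, 0, 0, 1.0])
# fit = least_squares(residuals, x0, args=(world_pts, image_pts))
```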
Learning Action Recognition Model from Depth and Skeleton Videos
Pub Date : 2017-12-22 DOI: 10.1109/ICCV.2017.621
H. Rahmani, Bennamoun
Depth sensors open up possibilities of dealing with the human action recognition problem by providing 3D human skeleton data and depth images of the scene. Analysis of human actions based on 3D skeleton data has become popular recently, due to its robustness and view-invariant representation. However, the skeleton alone is insufficient to distinguish actions which involve human-object interactions. In this paper, we propose a deep model which efficiently models human-object interactions and intra-class variations under viewpoint changes. First, a human body-part model is introduced to transfer the depth appearances of body-parts to a shared view-invariant space. Second, an end-to-end learning framework is proposed which is able to effectively combine the view-invariant body-part representation from skeletal and depth images, and learn the relations between the human body-parts and the environmental objects, the interactions between different human body-parts, and the temporal structure of human actions. We have evaluated the performance of our proposed model against 15 existing techniques on two large benchmark human action recognition datasets including NTU RGB+D and UWA3DII. The Experimental results show that our technique provides a significant improvement over state-of-the-art methods.
{"title":"Learning Action Recognition Model from Depth and Skeleton Videos","authors":"H. Rahmani, Bennamoun","doi":"10.1109/ICCV.2017.621","DOIUrl":"https://doi.org/10.1109/ICCV.2017.621","url":null,"abstract":"Depth sensors open up possibilities of dealing with the human action recognition problem by providing 3D human skeleton data and depth images of the scene. Analysis of human actions based on 3D skeleton data has become popular recently, due to its robustness and view-invariant representation. However, the skeleton alone is insufficient to distinguish actions which involve human-object interactions. In this paper, we propose a deep model which efficiently models human-object interactions and intra-class variations under viewpoint changes. First, a human body-part model is introduced to transfer the depth appearances of body-parts to a shared view-invariant space. Second, an end-to-end learning framework is proposed which is able to effectively combine the view-invariant body-part representation from skeletal and depth images, and learn the relations between the human body-parts and the environmental objects, the interactions between different human body-parts, and the temporal structure of human actions. We have evaluated the performance of our proposed model against 15 existing techniques on two large benchmark human action recognition datasets including NTU RGB+D and UWA3DII. The Experimental results show that our technique provides a significant improvement over state-of-the-art methods.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"9 1","pages":"5833-5842"},"PeriodicalIF":0.0,"publicationDate":"2017-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80713334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 97
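A schematic two-stream fusion of skeleton features and depth body-part features, echoing the combination described above. The feature dimensions, temporal average pooling, and classifier are assumptions rather than the paper's end-to-end architecture.

```python
# Toy skeleton + depth feature fusion for action classification.
import torch
import torch.nn as nn

class SkeletonDepthFusion(nn.Module):
    def __init__(self, skel_dim=150, depth_dim=512, n_classes=60):
        super().__init__()
        self.skel_net = nn.Sequential(nn.Linear(skel_dim, 256), nn.ReLU())
        self.depth_net = nn.Sequential(nn.Linear(depth_dim, 256), nn.ReLU())
        self.classifier = nn.Linear(512, n_classes)

    def forward(self, skel_seq, depth_seq):
        # skel_seq: (B, T, skel_dim) joint coordinates; depth_seq: (B, T, depth_dim) part features
        s = self.skel_net(skel_seq).mean(dim=1)      # average-pool each stream over time
        d = self.depth_net(depth_seq).mean(dim=1)
        return self.classifier(torch.cat([s, d], dim=1))

# Usage: logits = SkeletonDepthFusion()(torch.randn(8, 32, 150), torch.randn(8, 32, 512))
```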