Latest Publications from the 2017 IEEE International Conference on Computer Vision (ICCV)

Exploiting Spatial Structure for Localizing Manipulated Image Regions
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.532
Jawadul H. Bappy, A. Roy-Chowdhury, Jason Bunk, L. Nataraj, B. S. Manjunath
The advent of high-tech journaling tools makes it easy to manipulate an image in ways that evade state-of-the-art image tampering detection approaches. The recent success of deep learning approaches in different recognition tasks inspires us to develop a high-confidence detection framework that can localize manipulated regions in an image. Unlike semantic object segmentation, where all meaningful regions (objects) are segmented, the localization of image manipulation focuses only on the possibly tampered regions, which makes the problem even more challenging. To formulate the framework, we employ a hybrid CNN-LSTM model to capture discriminative features between manipulated and non-manipulated regions. One of the key properties of manipulated regions is that they exhibit discriminative features at the boundaries shared with neighboring non-manipulated pixels. Our motivation is to learn the boundary discrepancy, i.e., the spatial structure, between manipulated and non-manipulated regions with a combination of LSTM and convolution layers. We perform end-to-end training of the network to learn the parameters through back-propagation given ground-truth mask information. The overall framework is capable of detecting different types of image manipulation, including copy-move, removal, and splicing. Our model shows promising results in localizing manipulated regions, which is demonstrated through rigorous experimentation on three diverse datasets.
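As a rough illustration of the hybrid CNN-LSTM idea described above (convolutional features followed by a recurrent pass over the spatial layout, ending in a per-pixel tamper mask), a minimal PyTorch sketch is given below. The layer widths, the raster-scan ordering of pixels, and the two-class head are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CNNLSTMLocalizer(nn.Module):
    """Toy hybrid CNN-LSTM: conv features -> LSTM over the pixel sequence -> 2-class mask."""
    def __init__(self, hidden=64):
        super().__init__()
        self.features = nn.Sequential(              # shared convolutional encoder
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.classifier = nn.Conv2d(hidden, 2, kernel_size=1)   # manipulated vs. pristine

    def forward(self, x):                           # x: (B, 3, H, W)
        f = self.features(x)                        # (B, 32, H, W)
        B, C, H, W = f.shape
        seq = f.flatten(2).permute(0, 2, 1)         # raster-scan the feature map: (B, H*W, C)
        out, _ = self.lstm(seq)                     # model spatial structure along the scan
        out = out.permute(0, 2, 1).reshape(B, -1, H, W)
        return self.classifier(out)                 # (B, 2, H, W) per-pixel logits

mask_logits = CNNLSTMLocalizer()(torch.randn(1, 3, 32, 32))
print(mask_logits.shape)                            # torch.Size([1, 2, 32, 32])
```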
Pages: 4980-4989
Citations: 179
Delving into Salient Object Subitizing and Detection
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.120
Shengfeng He, Jianbo Jiao, Xiaodan Zhang, Guoqiang Han, Rynson W. H. Lau
Subitizing (i.e., instant judgement of the number of items) and detection of salient objects are innate human abilities. These two tasks influence each other in the human visual system. In this paper, we delve into the complementarity of these two tasks. We propose a multi-task deep neural network with weight prediction for salient object detection, where the parameters of an adaptive weight layer are dynamically determined by an auxiliary subitizing network. The numerical representation of salient objects is therefore embedded into the spatial representation. The proposed joint network can be trained end-to-end using backpropagation. Experiments show that the proposed multi-task network outperforms existing multi-task architectures, and the auxiliary subitizing network provides strong guidance to salient object detection by reducing false positives and producing coherent saliency maps. Moreover, the proposed method is unconstrained and able to handle images with or without salient objects. Finally, we show state-of-the-art performance on different salient object datasets.
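To make the interplay between the two tasks concrete, the minimal PyTorch sketch below lets an auxiliary subitizing branch predict channel weights that modulate the saliency branch. The tiny backbone, the channel-wise modulation, and the count range are assumptions for illustration only, not the network proposed in the paper.

```python
import torch
import torch.nn as nn

class SubitizingGuidedSaliency(nn.Module):
    """Toy multi-task net: a counting (subitizing) head predicts weights that
    modulate the saliency features before the saliency map is produced."""
    def __init__(self, channels=32, max_count=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.count_head = nn.Linear(channels, max_count + 1)   # 0..max_count salient objects
        self.weight_head = nn.Linear(channels, channels)       # stand-in for the adaptive weight layer
        self.saliency_head = nn.Conv2d(channels, 1, 1)

    def forward(self, x):
        f = self.backbone(x)                                   # (B, C, H, W)
        g = self.pool(f).flatten(1)                            # (B, C) global descriptor
        count_logits = self.count_head(g)
        w = torch.sigmoid(self.weight_head(g)).unsqueeze(-1).unsqueeze(-1)
        saliency = self.saliency_head(f * w)                   # modulate features, then predict the map
        return saliency, count_logits

sal, cnt = SubitizingGuidedSaliency()(torch.randn(2, 3, 64, 64))
print(sal.shape, cnt.shape)                                    # (2, 1, 64, 64) (2, 5)
```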
Pages: 1059-1067
Citations: 51
A Microfacet-Based Reflectance Model for Photometric Stereo with Highly Specular Surfaces
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.343
Lixiong Chen, Yinqiang Zheng, Boxin Shi, Art Subpa-Asa, Imari Sato
A precise, stable, and invertible model of surface reflectance is the key to the success of photometric stereo with real-world materials. Recent developments in the field have enabled shape recovery techniques for surfaces of various types, but an effective solution for directly estimating the surface normal in the presence of highly specular reflectance remains elusive. In this paper, we derive an analytical isotropic microfacet-based reflectance model, based on which a physically interpretable approximation is tailored for highly specular surfaces. With this approximation, we identify the equivalence between the surface recovery problem and the problem of fitting an ellipsoid of revolution, where the latter can be described as a system of polynomials. Additionally, we devise a fast, non-iterative, and globally optimal solver for this problem. Experimental results on both synthetic and real images validate our model and demonstrate that our solution stably delivers superior performance in its targeted application domain.
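For orientation, the NumPy sketch below evaluates a standard isotropic microfacet specular BRDF (Cook-Torrance with a GGX normal distribution and Schlick approximations). It only illustrates the generic microfacet form that such models build on; it is not the analytical model or the tailored approximation derived in the paper.

```python
import numpy as np

def ggx_specular(n, l, v, f0=0.05, alpha=0.2):
    """Generic isotropic microfacet (Cook-Torrance/GGX) specular term:
    D * F * G / (4 (n.l)(n.v)), with Schlick Fresnel and Smith-Schlick visibility."""
    n, l, v = (x / np.linalg.norm(x) for x in (n, l, v))
    h = (l + v) / np.linalg.norm(l + v)                              # half vector
    nl, nv = max(float(n @ l), 1e-6), max(float(n @ v), 1e-6)
    nh, vh = max(float(n @ h), 1e-6), max(float(v @ h), 1e-6)
    d = alpha**2 / (np.pi * (nh**2 * (alpha**2 - 1.0) + 1.0) ** 2)   # GGX normal distribution
    f = f0 + (1.0 - f0) * (1.0 - vh) ** 5                            # Schlick Fresnel
    k = alpha / 2.0
    g = (nl / (nl * (1 - k) + k)) * (nv / (nv * (1 - k) + k))        # Smith-Schlick shadowing
    return d * f * g / (4.0 * nl * nv)

normal = np.array([0.0, 0.0, 1.0])
light, view = np.array([0.3, 0.0, 1.0]), np.array([-0.3, 0.0, 1.0])
print(ggx_specular(normal, light, view))                             # scalar specular response
```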
Pages: 3181-3189
Citations: 18
Going Unconstrained with Rolling Shutter Deblurring
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.432
Mahesh Mohan M. R., A. Rajagopalan
Most present-day imaging devices are equipped with CMOS sensors. Motion blur is a common artifact in handheld photography. Because CMOS sensors mostly employ a rolling shutter (RS), the motion deblurring problem takes on a new dimension. Although a few recent works have addressed this problem, they suffer from many constraints, including heavy computational cost, the need for precise sensor information, and an inability to deal with wide-angle systems (which most cell-phone and drone cameras are) and irregular camera trajectories. In this work, we propose a model for RS blind motion deblurring that mitigates these issues significantly. Comprehensive comparisons with state-of-the-art methods reveal that our approach not only exhibits significant computational gains and unconstrained functionality but also leads to improved deblurring performance.
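The reason the problem "takes on a new dimension" is that a rolling shutter exposes each image row at a slightly different time, so camera motion displaces rows by different amounts. The toy NumPy simulation below makes that effect concrete; purely horizontal, linear motion and integer per-row shifts are simplifying assumptions, not part of the proposed method.

```python
import numpy as np

def simulate_rs_shift(image, max_shift=8):
    """Toy rolling-shutter warp: later rows are read out later, so a horizontally
    moving camera displaces them further than earlier rows."""
    h = image.shape[0]
    out = np.zeros_like(image)
    for r in range(h):
        shift = int(round(max_shift * r / (h - 1)))   # row index ~ readout time
        out[r] = np.roll(image[r], shift, axis=0)     # shift this scanline horizontally
    return out

img = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))     # simple horizontal gradient image
warped = simulate_rs_shift(img)
print(img.shape, warped.shape)                        # (64, 64) (64, 64)
```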
Pages: 4030-4038
Citations: 13
FLaME: Fast Lightweight Mesh Estimation Using Variational Smoothing on Delaunay Graphs
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.502
W. N. Greene, N. Roy
We propose a lightweight method for dense online monocular depth estimation capable of reconstructing 3D meshes on computationally constrained platforms. Our main contribution is to pose the reconstruction problem as a non-local variational optimization over a time-varying Delaunay graph of the scene geometry, which allows for an efficient, keyframeless approach to depth estimation. The graph can be tuned to favor reconstruction quality or speed and is continuously smoothed and augmented as the camera explores the scene. Unlike keyframe-based approaches, the optimized surface is always available at the current pose, which is necessary for low-latency obstacle avoidance. FLaME (Fast Lightweight Mesh Estimation) can generate mesh reconstructions at upwards of 230 Hz using less than one Intel i7 CPU core, which enables operation on size-, weight-, and power-constrained platforms. We present results from both benchmark datasets and experiments running FLaME in the loop onboard a small flying quadrotor.
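To make the Delaunay-graph formulation concrete, the sketch below triangulates sparse 2D feature locations with SciPy and runs plain Laplacian smoothing of inverse depth over the graph edges. It is only a rough stand-in for FLaME's non-local variational optimization; the smoothing weight and iteration count are arbitrary choices for illustration.

```python
import numpy as np
from scipy.spatial import Delaunay

def smooth_on_delaunay(pix, inv_depth, iters=50, lam=0.5):
    """Triangulate sparse pixel locations and smooth inverse depth over the edges."""
    tri = Delaunay(pix)
    edges = set()
    for a, b, c in tri.simplices:                     # collect undirected edges
        for e in ((a, b), (b, c), (a, c)):
            edges.add(tuple(sorted(e)))
    edges = np.array(list(edges))
    z = inv_depth.astype(float).copy()
    for _ in range(iters):                            # simple Laplacian (neighbor-average) smoothing
        neigh_sum = np.zeros_like(z)
        neigh_cnt = np.zeros_like(z)
        for i, j in edges:
            neigh_sum[i] += z[j]; neigh_sum[j] += z[i]
            neigh_cnt[i] += 1;    neigh_cnt[j] += 1
        z = (1 - lam) * z + lam * neigh_sum / np.maximum(neigh_cnt, 1)
    return tri, z

pts = np.random.rand(200, 2) * 640                    # sparse pixel locations
idepth = 1.0 / np.random.uniform(1.0, 5.0, 200)       # noisy inverse depths
tri, smoothed = smooth_on_delaunay(pts, idepth)
print(tri.simplices.shape, smoothed.shape)
```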
Pages: 4696-4704
Citations: 21
VegFru: A Domain-Specific Dataset for Fine-Grained Visual Categorization
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.66
Saihui Hou, Yushan Feng, Zilei Wang
In this paper, we propose a novel domain-specific dataset named VegFru for fine-grained visual categorization (FGVC). While existing datasets for FGVC mainly focus on animal breeds or man-made objects with limited labelled data, VegFru is a larger dataset consisting of vegetables and fruits, which are closely associated with everyone's daily life. Aiming at domestic cooking and food management, VegFru categorizes vegetables and fruits according to their eating characteristics, and each image contains at least one edible part of a vegetable or fruit with the same cooking usage. In particular, all the images are labelled hierarchically. The current version covers vegetables and fruits from 25 upper-level categories and 292 subordinate classes, and it contains more than 160,000 images in total, with at least 200 images for each subordinate class. Accompanying the dataset, we also propose an effective framework called HybridNet to exploit the label hierarchy for FGVC. Specifically, multiple granularity features are first extracted by dealing with the hierarchical labels separately. They are then fused through an explicit operation, e.g., Compact Bilinear Pooling, to form a unified representation for the ultimate recognition. Experimental results on the novel VegFru, the public FGVC-Aircraft, and CUB-200-2011 indicate that HybridNet achieves one of the top performances on these datasets. The dataset and code are available at https://github.com/ustc-vim/vegfru.
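As a sketch of how a label hierarchy can be exploited, the PyTorch toy below shares a backbone between a 25-way upper-level head and a 292-way subordinate head and fuses the two branch features for the fine-grained prediction. Plain concatenation stands in for Compact Bilinear Pooling here, and the tiny backbone and feature sizes are illustrative assumptions rather than HybridNet itself.

```python
import torch
import torch.nn as nn

class HierarchicalClassifier(nn.Module):
    """Toy two-granularity classifier: coarse and fine heads over a shared backbone,
    with the fine prediction made from fused coarse + fine branch features."""
    def __init__(self, n_coarse=25, n_fine=292, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.coarse_branch = nn.Linear(dim, dim)
        self.fine_branch = nn.Linear(dim, dim)
        self.coarse_head = nn.Linear(dim, n_coarse)
        self.fine_head = nn.Linear(2 * dim, n_fine)         # fused representation

    def forward(self, x):
        f = self.backbone(x)
        fc = torch.relu(self.coarse_branch(f))
        ff = torch.relu(self.fine_branch(f))
        return self.coarse_head(fc), self.fine_head(torch.cat([fc, ff], dim=1))

coarse_logits, fine_logits = HierarchicalClassifier()(torch.randn(2, 3, 224, 224))
print(coarse_logits.shape, fine_logits.shape)               # (2, 25) (2, 292)
```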
Pages: 541-549
Citations: 84
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.590
Zhaofan Qiu, Ting Yao, Tao Mei
Convolutional Neural Networks (CNNs) have been regarded as a powerful class of models for image recognition problems. Nevertheless, it is not trivial to utilize a CNN for learning spatio-temporal video representations. A few studies have shown that performing 3D convolutions is a rewarding approach to capturing both spatial and temporal dimensions in videos. However, developing a very deep 3D CNN from scratch results in expensive computational cost and memory demand. A valid question is why not recycle off-the-shelf 2D networks for a 3D CNN. In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating 3 × 3 × 3 convolutions with 1 × 3 × 3 convolutional filters on the spatial domain (equivalent to a 2D CNN) plus 3 × 1 × 1 convolutions to construct temporal connections on adjacent feature maps. Furthermore, we propose a new architecture, named Pseudo-3D Residual Net (P3D ResNet), that exploits all the block variants and composes each at different placements within the ResNet, following the philosophy that enhancing structural diversity while going deep can improve the power of neural networks. Our P3D ResNet achieves clear improvements on the Sports-1M video classification dataset over 3D CNN and frame-based 2D CNN by 5.3% and 1.8%, respectively. We further examine the generalization performance of the video representation produced by our pre-trained P3D ResNet on five different benchmarks and three different tasks, demonstrating superior performance over several state-of-the-art techniques.
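The core factorization, replacing a 3 × 3 × 3 kernel with a 1 × 3 × 3 spatial convolution followed by a 3 × 1 × 1 temporal convolution inside a residual bottleneck, translates directly into PyTorch. The sketch below shows only the cascaded (P3D-A-style) ordering with placeholder channel widths; the full architecture mixes several block variants.

```python
import torch
import torch.nn as nn

class P3DBlockA(nn.Module):
    """Simplified P3D-style bottleneck: 1x3x3 spatial conv then 3x1x1 temporal conv
    in place of a full 3x3x3 kernel, wrapped in a residual connection."""
    def __init__(self, channels=64, mid=16):
        super().__init__()
        self.reduce = nn.Conv3d(channels, mid, kernel_size=1)
        self.spatial = nn.Conv3d(mid, mid, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(mid, mid, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.expand = nn.Conv3d(mid, channels, kernel_size=1)
        self.bn = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                      # x: (B, C, T, H, W)
        y = self.relu(self.reduce(x))
        y = self.relu(self.spatial(y))         # 2D-like filtering within each frame
        y = self.relu(self.temporal(y))        # 1D filtering across adjacent frames
        y = self.bn(self.expand(y))
        return self.relu(x + y)                # residual connection

clip = torch.randn(1, 64, 8, 56, 56)           # batch, channels, frames, height, width
print(P3DBlockA()(clip).shape)                 # torch.Size([1, 64, 8, 56, 56])
```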
Pages: 5534-5542
Citations: 1420
DeepCD: Learning Deep Complementary Descriptors for Patch Representations
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.359
Tsun-Yi Yang, Jo-Han Hsu, Yen-Yu Lin, Yung-Yu Chuang
This paper presents the DeepCD framework, which jointly learns a pair of complementary descriptors for image patch representation using deep learning techniques. This can be achieved by taking any descriptor-learning architecture for learning a leading descriptor and augmenting the architecture with an additional network stream for learning a complementary descriptor. To enforce the complementary property, a new network layer, called the data-dependent modulation (DDM) layer, is introduced for adaptively learning the augmented network stream, with emphasis on the training data that are not well handled by the leading stream. By optimizing the proposed joint loss function with late fusion, the obtained descriptors are complementary to each other, and their fusion improves performance. Experiments on several problems and datasets show that the proposed method is simple yet effective, outperforming state-of-the-art methods.
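A minimal PyTorch sketch of a two-stream patch descriptor is given below for orientation only: a shared trunk feeds a real-valued leading head and a second complementary head. The trunk, head sizes, and the tanh "soft-binary" complementary codes are assumptions, and the data-dependent modulation layer and late-fusion loss of DeepCD are not reproduced.

```python
import torch
import torch.nn as nn

class TwoStreamDescriptor(nn.Module):
    """Toy two-stream patch descriptor: shared conv trunk, a leading real-valued
    head and a complementary head producing soft binary-like codes."""
    def __init__(self, lead_dim=128, comp_dim=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.lead_head = nn.Linear(32 * 16, lead_dim)
        self.comp_head = nn.Linear(32 * 16, comp_dim)

    def forward(self, patch):                            # patch: (B, 1, 32, 32) grayscale
        f = self.trunk(patch)
        lead = nn.functional.normalize(self.lead_head(f), dim=1)   # L2-normalized descriptor
        comp = torch.tanh(self.comp_head(f))                       # soft binary-like codes
        return lead, comp

lead, comp = TwoStreamDescriptor()(torch.randn(4, 1, 32, 32))
print(lead.shape, comp.shape)                            # (4, 128) (4, 64)
```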
Pages: 3334-3342
Citations: 38
Personalized Image Aesthetics
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.76
Jian Ren, Xiaohui Shen, Zhe L. Lin, R. Mech, D. Foran
Automatic image aesthetics rating has received growing interest with the recent breakthroughs in deep learning. Although many studies exist on learning a generic or universal aesthetics model, investigation of aesthetics models that incorporate an individual user's preference is quite limited. We address this personalized aesthetics problem by showing that individuals' aesthetic preferences exhibit strong correlations with content and aesthetic attributes, and hence the deviation of an individual's perception from generic image aesthetics is predictable. To support our study, we first collect two distinct datasets: a large image dataset from Flickr annotated via Amazon Mechanical Turk, and a small dataset of real personal albums rated by their owners. We then propose a new approach to personalized aesthetics learning that can be trained even with a small set of annotated images from a user. The approach is based on a residual-based model adaptation scheme that learns an offset to compensate for the generic aesthetics score. Finally, we introduce an active learning algorithm to optimize personalized aesthetics prediction for real-world application scenarios. Experiments demonstrate that our approach can effectively learn personalized aesthetic preferences and outperforms existing methods in quantitative comparisons.
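The residual adaptation idea, a fixed generic aesthetics scorer plus a small trainable head that predicts a per-user offset, is easy to sketch in PyTorch. The feature dimension and head sizes below are illustrative assumptions, not the authors' configuration, and the generic scorer is a stand-in linear layer rather than a pre-trained model.

```python
import torch
import torch.nn as nn

class PersonalizedScorer(nn.Module):
    """Toy residual adaptation: frozen generic scorer + trainable per-user offset."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.generic = nn.Linear(feat_dim, 1)          # stand-in for the generic aesthetics model
        for p in self.generic.parameters():
            p.requires_grad = False                    # keep the generic scorer fixed
        self.user_offset = nn.Sequential(              # adapted from the user's few ratings
            nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1),
        )

    def forward(self, feats):                          # feats: (B, feat_dim) image features
        return self.generic(feats) + self.user_offset(feats)

scores = PersonalizedScorer()(torch.randn(8, 512))
print(scores.shape)                                    # torch.Size([8, 1])
```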
Pages: 638-647
Citations: 86
Composite Focus Measure for High Quality Depth Maps
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.179
P. Sakurikar, P J Narayanan
Depth from focus is a highly accessible method to estimate the 3D structure of everyday scenes. Today's DSLR and mobile cameras facilitate the easy capture of multiple focused images of a scene. Focus measures (FMs), which estimate the amount of focus at each pixel, form the basis of depth-from-focus methods. Several FMs have been proposed in the past and new ones will emerge in the future, each with its own strengths. We estimate a weighted combination of standard FMs that outperforms the individual measures across a wide range of scene types. The resulting composite focus measure consists of FMs that are in consensus with one another but not in chorus. Our two-stage pipeline first estimates fine depth at each pixel using the composite focus measure. A cost-volume propagation step then assigns depths from confident pixels to the others. We can generate high-quality depth maps using just the top five FMs from our composite focus measure. This is a positive step towards depth estimation of everyday scenes with no special equipment.
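A toy depth-from-focus pipeline along these lines is sketched below in NumPy/SciPy: two standard focus measures (local Laplacian energy and gradient energy) are combined with fixed weights, and the best-focused slice is picked per pixel. The learned weighting over many focus measures and the cost-volume propagation step described above are omitted, and the fixed weights are an assumption for illustration.

```python
import numpy as np
from scipy.ndimage import laplace, sobel, uniform_filter

def composite_depth_from_focus(stack, weights=(0.5, 0.5), win=9):
    """Toy depth-from-focus: per pixel, pick the focal slice that maximizes a
    weighted sum of two focus measures."""
    responses = []
    for img in stack:                                    # stack: grayscale slices, near to far
        lap = uniform_filter(laplace(img) ** 2, win)     # local Laplacian energy
        grad = uniform_filter(sobel(img, 0) ** 2 + sobel(img, 1) ** 2, win)
        responses.append(weights[0] * lap + weights[1] * grad)
    return np.argmax(np.stack(responses), axis=0)        # index of best-focused slice per pixel

focal_stack = [np.random.rand(64, 64) for _ in range(10)]
depth_index = composite_depth_from_focus(focal_stack)
print(depth_index.shape, depth_index.min(), depth_index.max())
```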
Pages: 1623-1631
Citations: 21