
Latest publications from the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.272
Thomas Schöps, Johannes L. Schönberger, S. Galliani, Torsten Sattler, K. Schindler, M. Pollefeys, Andreas Geiger
Motivated by the limitations of existing multi-view stereo benchmarks, we present a novel dataset for this task. Towards this goal, we recorded a variety of indoor and outdoor scenes using a high-precision laser scanner and captured both high-resolution DSLR imagery as well as synchronized low-resolution stereo videos with varying fields-of-view. To align the images with the laser scans, we propose a robust technique which minimizes photometric errors conditioned on the geometry. In contrast to previous datasets, our benchmark provides novel challenges and covers a diverse set of viewpoints and scene types, ranging from natural scenes to man-made indoor and outdoor environments. Furthermore, we provide data at significantly higher temporal and spatial resolution. Our benchmark is the first to cover the important use case of hand-held mobile devices while also providing high-resolution DSLR camera images. We make our datasets and an online evaluation server available at http://www.eth3d.net.
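As a rough illustration of the image-to-scan alignment idea (minimizing photometric errors conditioned on the geometry), the sketch below scores a candidate camera pose by a robust photometric error between the laser-scan intensities and the image sampled at the projected scan points. This is not the authors' implementation; the function names, the Huber loss, and the near-plane threshold are assumptions made for the example.

```python
import numpy as np

def project(points_world, R, t, K):
    """Project Nx3 world points with pose (R, t) and intrinsics K; mask points behind the camera."""
    p_cam = points_world @ R.T + t                      # world -> camera frame
    valid = p_cam[:, 2] > 0.1                           # assumed near-plane threshold
    uvw = p_cam @ K.T
    uv = uvw[:, :2] / np.maximum(uvw[:, 2:3], 1e-9)     # perspective division
    return uv, valid

def bilinear_sample(image, uv):
    """Sample a grayscale image at sub-pixel locations uv (x, y per row)."""
    x0 = np.floor(uv[:, 0]).astype(int); y0 = np.floor(uv[:, 1]).astype(int)
    dx = uv[:, 0] - x0; dy = uv[:, 1] - y0
    return ((1 - dx) * (1 - dy) * image[y0, x0] + dx * (1 - dy) * image[y0, x0 + 1] +
            (1 - dx) * dy * image[y0 + 1, x0] + dx * dy * image[y0 + 1, x0 + 1])

def photometric_error(image, scan_xyz, scan_intensity, R, t, K, delta=0.1):
    """Robust (Huber) photometric error of one pose hypothesis; lower means better alignment."""
    uv, valid = project(scan_xyz, R, t, K)
    h, w = image.shape
    inside = valid & (uv[:, 0] >= 0) & (uv[:, 0] < w - 1) & (uv[:, 1] >= 0) & (uv[:, 1] < h - 1)
    r = bilinear_sample(image, uv[inside]) - scan_intensity[inside]
    a = np.abs(r)                                        # Huber loss damps occlusion/exposure outliers
    return np.sum(np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta)))
```

In the paper this kind of error would be minimized over the pose parameters; here it is only evaluated for a fixed hypothesis.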
Citations: 518
On Compressing Deep Models by Low Rank and Sparse Decomposition
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.15
Xiyu Yu, Tongliang Liu, Xinchao Wang, D. Tao
Deep compression refers to removing the redundancy of parameters and feature maps in deep learning models. Low-rank approximation and pruning for sparse structures play a vital role in many compression works. However, weight filters tend to be both low-rank and sparse. Neglecting either part of this structural information, as previous methods do, results in iterative retraining, compromised accuracy, and low compression rates. Here we propose a unified framework integrating the low-rank and sparse decomposition of weight matrices with feature map reconstructions. Our model includes methods such as pruning connections as special cases, and is optimized by a fast SVD-free algorithm. We prove theoretically that, owing to its generalizability, our model can reconstruct the feature maps well on both training and test data even from a small sample, which sacrifices less accuracy prior to the subsequent retraining. With such a warm start for retraining, the compression method possesses several merits: (a) higher compression rates, (b) little loss of accuracy, and (c) fewer rounds needed to compress deep models. Experimental results on several popular models such as AlexNet, VGG-16, and GoogLeNet show that our model can significantly reduce the parameters of both convolutional and fully-connected layers. As a result, our model reduces the size of VGG-16 by 15×, better than other recent compression methods that use a single strategy.
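To make the decomposition concrete, here is a minimal sketch of the low-rank-plus-sparse idea on a single weight matrix: alternate a truncated SVD for the low-rank part and magnitude pruning for the sparse part. It is only an illustration under assumed parameters; the paper additionally reconstructs feature maps and uses a fast SVD-free solver rather than the plain SVD shown here.

```python
import numpy as np

def low_rank_plus_sparse(W, rank, nnz, iters=20):
    """Return (L, S) with W ~= L + S, rank(L) <= rank, and roughly nnz non-zeros in S."""
    S = np.zeros_like(W)
    for _ in range(iters):
        # Low-rank step: best rank-r approximation of the residual W - S.
        U, s, Vt = np.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse step: keep only the nnz largest-magnitude entries of W - L.
        R = W - L
        thresh = np.sort(np.abs(R), axis=None)[-nnz]
        S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "fully-connected layer": low-rank structure plus a few large outliers.
    W = rng.standard_normal((256, 64)) @ rng.standard_normal((64, 512))
    W[rng.integers(0, 256, 200), rng.integers(0, 512, 200)] += 5.0
    L, S = low_rank_plus_sparse(W, rank=64, nnz=200)
    # Compression comes from storing the rank-64 factors plus ~200 sparse entries
    # instead of the dense 256 x 512 matrix.
    print("relative error:", np.linalg.norm(W - L - S) / np.linalg.norm(W))
```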
Citations: 349
Efficient Multiple Instance Metric Learning Using Weakly Supervised Data
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.630
M. Law, Yaoliang Yu, R. Urtasun, R. Zemel, E. Xing
We consider learning a distance metric in a weakly supervised setting where bags (or sets) of instances are labeled with bags of labels. A general approach is to formulate the problem as a Multiple Instance Learning (MIL) problem where the metric is learned so that the distances between instances inferred to be similar are smaller than the distances between instances inferred to be dissimilar. Classic approaches alternate the optimization over the learned metric and the assignment of similar instances. In this paper, we propose an efficient method that jointly learns the metric and the assignment of instances. In particular, our model is learned by solving an extension of k-means for MIL problems where instances are assigned to categories depending on annotations provided at bag-level. Our learning algorithm is much faster than existing metric learning methods for MIL problems and obtains state-of-the-art recognition performance in automated image annotation and instance classification for face identification.
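As a loose sketch of the k-means extension mentioned above, the snippet below alternates between assigning each instance to one of the categories permitted by its bag's label set and refitting the category centroids. The joint metric learning of the paper is omitted (a plain Euclidean metric is used), and all names and parameters are assumptions for the example.

```python
import numpy as np

def mil_kmeans(bags, bag_labels, n_classes, dim, iters=10, seed=0):
    """bags: list of (n_i, dim) instance arrays; bag_labels: list of sets of allowed class ids."""
    rng = np.random.default_rng(seed)
    centroids = rng.standard_normal((n_classes, dim))
    assign = []
    for _ in range(iters):
        assign = []                                    # per-bag instance -> class assignments
        for X, labels in zip(bags, bag_labels):
            allowed = np.array(sorted(labels))         # a bag may only use its annotated classes
            d = np.linalg.norm(X[:, None, :] - centroids[None, allowed, :], axis=2)
            assign.append(allowed[np.argmin(d, axis=1)])
        for c in range(n_classes):                     # refit each centroid from its instances
            pts = np.concatenate([X[a == c] for X, a in zip(bags, assign)])
            if len(pts):
                centroids[c] = pts.mean(axis=0)
    return centroids, assign

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    bags = [rng.standard_normal((5, 2)) + i for i in range(3)]
    bag_labels = [{0, 1}, {1, 2}, {0, 2}]
    centroids, assign = mil_kmeans(bags, bag_labels, n_classes=3, dim=2)
    print(assign)
```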
Citations: 13
Fast Boosting Based Detection Using Scale Invariant Multimodal Multiresolution Filtered Features
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.112
A. Costea, R. Varga, S. Nedevschi
In this paper we propose a novel boosting-based sliding-window solution for object detection which can keep up with the precision of state-of-the-art deep learning approaches while being 10 to 100 times faster. The solution takes advantage of multisensorial perception and exploits information from color, motion and depth. We introduce multimodal multiresolution filtering of signal intensity, gradient magnitude and orientation channels in order to capture structure at multiple scales and orientations. To achieve scale-invariant classification features, we analyze the effect of scale change on features for different filter types and propose a correction scheme. To improve recognition we incorporate 2D and 3D context by generating spatial, geometric and symmetrical channels. Finally, we evaluate the proposed solution on multiple benchmarks for the detection of pedestrians, cars and bicyclists. We achieve competitive results at over 25 frames per second.
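For intuition, the sketch below computes this kind of channel for a single grayscale modality (intensity, gradient magnitude, hard-binned orientation) and box-filters each channel at several resolutions. The actual filter bank, the colour/motion/depth modalities, and the scale-correction scheme of the paper are not reproduced here; the names and scales are assumptions.

```python
import numpy as np

def box_filter(ch, k):
    """Average the channel over every k x k window using an integral image ("valid" output)."""
    ii = np.pad(ch, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    return (ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]) / (k * k)

def channel_features(gray, n_orient=6, scales=(2, 4, 8)):
    """Intensity, gradient-magnitude and orientation channels, filtered at several scales."""
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # orientation folded into [0, pi)
    channels = [gray.astype(np.float64), mag]
    for o in range(n_orient):                          # hard-binned orientation channels
        lo, hi = o * np.pi / n_orient, (o + 1) * np.pi / n_orient
        channels.append(mag * ((ang >= lo) & (ang < hi)))
    # Filtering every channel at multiple resolutions captures structure at multiple scales.
    return {s: [box_filter(c, s) for c in channels] for s in scales}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = channel_features(rng.random((64, 96)))
    print({s: len(ch_list) for s, ch_list in feats.items()})
```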
Citations: 17
Scene Parsing through ADE20K Dataset
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.544
Bolei Zhou, Hang Zhao, Xavier Puig, S. Fidler, Adela Barriuso, A. Torralba
Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision. Despite the community's efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with dense and detailed annotations for scene parsing. In this paper, we introduce and analyze the ADE20K dataset, spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. A scene parsing benchmark is built upon ADE20K with 150 object and stuff classes included. Several segmentation baseline models are evaluated on the benchmark. A novel network design called the Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade and improve over the baselines. We further show that the trained scene parsing networks can lead to applications such as image content removal and scene synthesis.
Citations: 2156
Switching Convolutional Neural Network for Crowd Counting
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.429
Deepak Babu Sam, Shiv Surya, R. Venkatesh Babu
We propose a novel crowd counting model that maps a given crowd scene to its density. Crowd analysis is complicated by a myriad of factors such as inter-occlusion between people due to extreme crowding, high similarity of appearance between people and background elements, and large variability of camera viewpoints. Current state-of-the-art approaches tackle these factors by using multi-scale CNN architectures, recurrent networks, and late fusion of features from multi-column CNNs with different receptive fields. We propose a switching convolutional neural network that leverages the variation of crowd density within an image to improve the accuracy and localization of the predicted crowd count. Patches from a grid within a crowd scene are relayed to independent CNN regressors based on the crowd-count prediction quality of each CNN established during training. The independent CNN regressors are designed to have different receptive fields, and a switch classifier is trained to relay each crowd scene patch to the best CNN regressor. We perform extensive experiments on all major crowd counting datasets and demonstrate better performance compared to current state-of-the-art methods. We provide interpretable representations of the multichotomy of the space of crowd scene patches inferred from the switch. It is observed that the switch relays an image patch to a particular CNN column based on the density of the crowd.
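A bare-bones sketch of the inference path described above follows: the image is split into a grid of patches, a switch picks one of several regressors per patch, and the per-patch density estimates are summed into the final count. The switch, the regressors and the 3×3 grid are placeholders, not the trained networks of the paper.

```python
import numpy as np

def switching_count(image, switch, regressors, grid=(3, 3)):
    """switch(patch) -> index into regressors; regressors[i](patch) -> density map."""
    h, w = image.shape[:2]
    gh, gw = grid
    total = 0.0
    for i in range(gh):
        for j in range(gw):
            patch = image[i * h // gh:(i + 1) * h // gh, j * w // gw:(j + 1) * w // gw]
            r = switch(patch)                    # pick the regressor suited to this patch's density
            total += regressors[r](patch).sum()  # integrate the predicted density map
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((240, 320))
    # Dummy stand-ins: a "switch" keyed on mean intensity and two constant-density regressors.
    switch = lambda p: int(p.mean() > 0.5)
    regressors = [lambda p: np.full(p.shape, 1e-3), lambda p: np.full(p.shape, 5e-3)]
    print("estimated count:", switching_count(img, switch, regressors))
```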
Citations: 792
On the Effectiveness of Visible Watermarks
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.726
Tali Dekel, Michael Rubinstein, Ce Liu, W. Freeman
Visible watermarking is a widely used technique for marking and protecting the copyrights of many millions of images on the web, yet it suffers from an inherent security flaw: watermarks are typically added in a consistent manner to many images. We show that this consistency allows the watermark to be automatically estimated and the original images to be recovered with high accuracy. Specifically, we present a generalized multi-image matting algorithm that takes a watermarked image collection as input and automatically estimates the foreground (watermark), its alpha matte, and the background (original) images. Since such an attack relies on the consistency of watermarks across an image collection, we explore and evaluate how it is affected by various types of inconsistencies in the watermark embedding that could potentially be used to make watermarking more secure. We demonstrate the algorithm on stock imagery available on the web, and provide extensive quantitative analysis on synthetic watermarked data. A key takeaway of this paper is that visible watermarks should be designed not only to be robust against removal from a single image, but also to be more resistant to mass-scale removal from image collections.
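The attack rests on the standard visible-watermark composite, which is easy to state in code: each watermarked image is a per-pixel blend J = alpha * W + (1 - alpha) * I, so once a consistent watermark W and matte alpha have been estimated from a collection, every original I can be recovered by inverting the blend. The sketch below shows only this composite model and its inversion, not the multi-image matting estimation itself.

```python
import numpy as np

def composite(I, W, alpha):
    """Blend a watermark W into image I with per-pixel opacity alpha in [0, 1]."""
    return alpha * W + (1.0 - alpha) * I

def recover(J, W, alpha, eps=1e-6):
    """Invert the composite; unreliable where alpha is close to 1 (fully opaque)."""
    return np.clip((J - alpha * W) / np.maximum(1.0 - alpha, eps), 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    I = rng.random((64, 64, 3))                      # original image
    W = np.full((64, 64, 3), 0.9)                    # flat bright watermark
    alpha = np.zeros((64, 64, 1)); alpha[20:40, 20:40] = 0.4
    J = composite(I, W, alpha)
    print("max recovery error:", np.abs(recover(J, W, alpha) - I).max())
```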
Citations: 45
Deeply Supervised Salient Object Detection with Short Connections
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.563
Qibin Hou, Ming-Ming Cheng, Xiaowei Hu, A. Borji, Z. Tu, Philip H. S. Torr
Recent progress on saliency detection has been substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and saliency detection algorithms developed lately have mostly been based on Fully Convolutional Neural Networks (FCNs). There is still large room for improvement over generic FCN models that do not explicitly deal with the scale-space problem. The Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new saliency method by introducing short connections to the skip-layer structures within the HED architecture. Our framework provides rich multi-scale feature maps at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over existing algorithms.
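As an assumed, simplified rendering of the short-connection idea (not the authors' exact configuration), the PyTorch module below gives each side output access to the upsampled activations of all deeper side outputs before predicting its saliency map, so shallow, high-resolution predictions inherit the coarse localization of the deep ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShortConnectionHead(nn.Module):
    def __init__(self, stage_channels=(64, 128, 256, 512, 512)):
        super().__init__()
        # 1x1 convs turning each backbone stage into a single-channel side activation.
        self.side = nn.ModuleList([nn.Conv2d(c, 1, kernel_size=1) for c in stage_channels])
        # For stage i, fuse its own activation with those of all deeper stages.
        n = len(stage_channels)
        self.fuse = nn.ModuleList([nn.Conv2d(n - i, 1, kernel_size=1) for i in range(n)])

    def forward(self, feats):
        """feats: list of backbone feature maps ordered shallow (large) to deep (small)."""
        acts = [side(f) for side, f in zip(self.side, feats)]
        outs = []
        for i, a in enumerate(acts):
            # Short connections: upsample every deeper side activation to this resolution.
            deeper = [F.interpolate(acts[j], size=a.shape[-2:], mode="bilinear",
                                    align_corners=False) for j in range(i + 1, len(acts))]
            outs.append(self.fuse[i](torch.cat([a] + deeper, dim=1)))
        return outs  # one saliency logit map per stage, each deeply supervised in training

if __name__ == "__main__":
    feats = [torch.randn(1, c, s, s) for c, s in zip((64, 128, 256, 512, 512),
                                                     (128, 64, 32, 16, 8))]
    print([o.shape for o in ShortConnectionHead()(feats)])
```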
Citations: 125
A Dataset for Benchmarking Image-Based Localization
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.598
Xun Sun, Yuanfan Xie, Peiwen Luo, Liang Wang
A novel dataset for benchmarking image-based localization is presented. With increasing research interest in visual place recognition and localization, several datasets have been published in the past few years. One evident limitation of existing datasets is that precise ground-truth camera poses of query images are not available in a meaningful 3D metric system. This is partly because the underlying 3D models of these datasets are reconstructed using Structure-from-Motion methods. So far, little attention has been paid to metric evaluations of localization accuracy. In this paper we address the problem of whether state-of-the-art visual localization techniques can be applied to tasks with demanding accuracy requirements. We acquired training data for a large indoor environment with cameras and a LiDAR scanner. In addition, we collected over 2000 query images with cell phone cameras. Using LiDAR point clouds as a reference, we employed a semi-automatic approach to estimate the 6-degree-of-freedom camera poses precisely in the world coordinate system. The proposed dataset enables us to quantitatively assess the performance of various algorithms using a fair and intuitive metric.
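The abstract leaves the metric unspecified; purely as an illustration, a common and intuitive choice for 6-DoF localization benchmarks is the translation error in metres and the rotation error in degrees between the estimated and ground-truth camera poses, as sketched below.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Return (translation error [m], rotation error [deg]) of one pose estimate."""
    t_err = np.linalg.norm(t_est - t_gt)
    # Angle of the relative rotation R_est * R_gt^T, recovered from its trace.
    cos = np.clip((np.trace(R_est @ R_gt.T) - 1.0) / 2.0, -1.0, 1.0)
    return t_err, np.degrees(np.arccos(cos))

if __name__ == "__main__":
    R_gt, t_gt = np.eye(3), np.zeros(3)
    th = np.radians(2.0)                      # estimate is 2 degrees and 5 cm off
    R_est = np.array([[np.cos(th), -np.sin(th), 0.0],
                      [np.sin(th),  np.cos(th), 0.0],
                      [0.0, 0.0, 1.0]])
    print(pose_errors(R_est, np.array([0.05, 0.0, 0.0]), R_gt, t_gt))
```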
Citations: 44
Exclusivity-Consistency Regularized Multi-view Subspace Clustering
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.8
Xiaobo Wang, Xiaojie Guo, Zhen Lei, Changqing Zhang, S. Li
Multi-view subspace clustering aims to partition a set of multi-source data into their underlying groups. To boost the performance of multi-view clustering, numerous subspace learning algorithms have been developed in recent years, but they rarely exploit the representation complementarity between different views or the indicator consistency among the representations, let alone consider them simultaneously. In this paper, we propose a novel multi-view subspace clustering model that attempts to harness the complementary information between different representations by introducing a novel position-aware exclusivity term. Meanwhile, a consistency term is employed so that these complementary representations further share a common indicator. We formulate the above concerns into a unified optimization framework. Experimental results on several benchmark datasets reveal the effectiveness of our algorithm over other state-of-the-art methods.
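The abstract names the ingredients without giving the objective; one plausible shape, written here only as an illustration with generic regularizers standing in for the paper's position-aware exclusivity and consistency terms, is:

```latex
% Indicative form only; the paper's exclusivity and consistency terms are more specific.
\min_{\{Z^{(v)}\},\, Z^{*}} \;
  \sum_{v} \bigl\| X^{(v)} - X^{(v)} Z^{(v)} \bigr\|_F^2
  \;+\; \lambda \sum_{v \neq w} \bigl\| Z^{(v)} \odot Z^{(w)} \bigr\|_1
  \;+\; \mu \sum_{v} \bigl\| Z^{(v)} - Z^{*} \bigr\|_F^2
```

Here X^(v) are the per-view data matrices, Z^(v) the per-view self-representations, the Hadamard-product penalty discourages overlapping non-zeros across views (exclusivity), and the shared Z* ties the representations to a common indicator (consistency).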
Citations: 169