
Latest Publications: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Learning Selective Self-Mutual Attention for RGB-D Saliency Detection
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.01377
Nian Liu, Ni Zhang, Junwei Han
Saliency detection on RGB-D images has received increasing research interest in recent years. Previous models adopt the early-fusion or result-fusion scheme to fuse the input RGB and depth data or their saliency maps, which incurs the problems of distribution gap or information loss. Some other models use the feature-fusion scheme but are limited by linear feature fusion methods. In this paper, we propose to fuse attention learned in both modalities. Inspired by the Non-local model, we integrate self-attention and each other's attention (mutual attention) to propagate long-range contextual dependencies, thus incorporating multi-modal information to learn attention and propagate contexts more accurately. Considering the reliability of the other modality's attention, we further propose a selection attention to weight the newly added attention term. We embed the proposed attention module in a two-stream CNN for RGB-D saliency detection. Furthermore, we also propose a residual fusion module to fuse the depth decoder features into the RGB stream. Experimental results on seven benchmark datasets demonstrate the effectiveness of the proposed model components and our final saliency model. Our code and saliency maps are available at https://github.com/nnizhang/S2MA.
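As a rough illustration of the selective self-mutual attention idea described above, the sketch below computes non-local self-attention for each modality, adds the other modality's attention weighted by a learned selection gate, and applies the fused attention to the features. The module layout, channel sizes, and gating design are illustrative assumptions, not the authors' exact S2MA implementation (see their repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveSelfMutualAttention(nn.Module):
    """Non-local self-attention per modality, fused with the other modality's
    attention under a learned per-position selection weight (a sketch)."""

    def __init__(self, channels: int, reduced: int = 64):
        super().__init__()
        self.q_rgb = nn.Conv2d(channels, reduced, 1)
        self.k_rgb = nn.Conv2d(channels, reduced, 1)
        self.v_rgb = nn.Conv2d(channels, channels, 1)
        self.q_d = nn.Conv2d(channels, reduced, 1)
        self.k_d = nn.Conv2d(channels, reduced, 1)
        self.v_d = nn.Conv2d(channels, channels, 1)
        # Selection gate: how much to trust the other modality's attention.
        self.gate = nn.Conv2d(2 * channels, 1, 1)

    def _affinity(self, q, k):
        b, c, h, w = q.shape
        q = q.view(b, c, h * w).permute(0, 2, 1)        # B x HW x C'
        k = k.view(b, c, h * w)                         # B x C' x HW
        return F.softmax(torch.bmm(q, k), dim=-1)       # B x HW x HW

    def forward(self, f_rgb, f_d):
        b, c, h, w = f_rgb.shape
        att_rgb = self._affinity(self.q_rgb(f_rgb), self.k_rgb(f_rgb))
        att_d = self._affinity(self.q_d(f_d), self.k_d(f_d))
        # Per-position selection weight in (0, 1).
        s = torch.sigmoid(self.gate(torch.cat([f_rgb, f_d], dim=1)))
        s = s.view(b, 1, h * w).permute(0, 2, 1)        # B x HW x 1
        # Self-attention plus selectively weighted mutual attention.
        fused_rgb = att_rgb + s * att_d
        fused_d = att_d + s * att_rgb
        v_rgb = self.v_rgb(f_rgb).view(b, c, h * w).permute(0, 2, 1)
        v_d = self.v_d(f_d).view(b, c, h * w).permute(0, 2, 1)
        out_rgb = torch.bmm(fused_rgb, v_rgb).permute(0, 2, 1).view(b, c, h, w) + f_rgb
        out_d = torch.bmm(fused_d, v_d).permute(0, 2, 1).view(b, c, h, w) + f_d
        return out_rgb, out_d

# Example: fuse 64-channel RGB and depth feature maps of size 32 x 32.
rgb_feat, depth_feat = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
out_rgb, out_d = SelectiveSelfMutualAttention(64)(rgb_feat, depth_feat)
```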
{"title":"Learning Selective Self-Mutual Attention for RGB-D Saliency Detection","authors":"Nian Liu, Ni Zhang, Junwei Han","doi":"10.1109/cvpr42600.2020.01377","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.01377","url":null,"abstract":"Saliency detection on RGB-D images is receiving more and more research interests recently. Previous models adopt the early fusion or the result fusion scheme to fuse the input RGB and depth data or their saliency maps, which incur the problem of distribution gap or information loss. Some other models use the feature fusion scheme but are limited by the linear feature fusion methods. In this paper, we propose to fuse attention learned in both modalities. Inspired by the Non-local model, we integrate the self-attention and each other's attention to propagate long-range contextual dependencies, thus incorporating multi-modal information to learn attention and propagate contexts more accurately. Considering the reliability of the other modality's attention, we further propose a selection attention to weight the newly added attention term. We embed the proposed attention module in a two-stream CNN for RGB-D saliency detection. Furthermore, we also propose a residual fusion module to fuse the depth decoder features into the RGB stream. Experimental results on seven benchmark datasets demonstrate the effectiveness of the proposed model components and our final saliency model. Our code and saliency maps are available at https://github.com/nnizhang/S2MA.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"133 1","pages":"13753-13762"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75762155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 173
A Self-supervised Approach for Adversarial Robustness
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00034
Muzammal Naseer, Salman Hameed Khan, Munawar Hayat, F. Khan, F. Porikli
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems, e.g., for classification, segmentation and object detection. The vulnerability of DNNs against such attacks can prove a major roadblock towards their real-world deployment. The transferability of adversarial examples demands generalizable defenses that can provide cross-task protection. Adversarial training, which enhances robustness by modifying the target model's parameters, lacks such generalizability. On the other hand, different input-processing based defenses fall short in the face of continuously evolving attacks. In this paper, we take the first step to combine the benefits of both approaches and propose a self-supervised adversarial training mechanism in the input space. By design, our defense is a generalizable approach and provides significant robustness against unseen adversarial attacks (e.g., by reducing the success rate of the translation-invariant ensemble attack from 82.6% to 31.9% in comparison to the previous state of the art). It can be deployed as a plug-and-play solution to protect a variety of vision systems, as we demonstrate for the case of classification, segmentation and detection.
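To make the label-free, input-space idea concrete, here is a minimal sketch of one way such a defense could be trained: a feature-distortion attack crafts perturbations without labels, and a purifier network is trained to undo them in pixel and feature space. The purifier and feature_net are assumed placeholder networks, and the losses and hyperparameters are illustrative, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def self_supervised_attack(feature_net, x, eps=8 / 255, steps=5, alpha=2 / 255):
    """Craft a label-free perturbation that maximizes feature distortion."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    clean_feat = feature_net(x).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.mse_loss(feature_net(x_adv), clean_feat)       # no labels needed
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()            # ascend the feature loss
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)  # project to the eps-ball
    return x_adv.detach()

def train_purifier(purifier, feature_net, loader, epochs=1, lr=1e-4):
    """Train an input-space purifier to map adversarial inputs back to clean ones."""
    for p in feature_net.parameters():
        p.requires_grad_(False)                                 # the extractor stays fixed
    opt = torch.optim.Adam(purifier.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            x_adv = self_supervised_attack(feature_net, x)
            restored = purifier(x_adv)
            # Pixel- and feature-level reconstruction keeps the defense task-agnostic.
            loss = F.l1_loss(restored, x) + F.mse_loss(
                feature_net(restored), feature_net(x).detach())
            opt.zero_grad()
            loss.backward()
            opt.step()
```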
{"title":"A Self-supervised Approach for Adversarial Robustness","authors":"Muzammal Naseer, Salman Hameed Khan, Munawar Hayat, F. Khan, F. Porikli","doi":"10.1109/cvpr42600.2020.00034","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00034","url":null,"abstract":"Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNNs) based vision systems e.g., for classification, segmentation and object detection. The vulnerability of DNNs against such attacks can prove a major roadblock towards their real-world deployment. Transferability of adversarial examples demand generalizable defenses that can provide cross-task protection. Adversarial training that enhances robustness by modifying target model's parameters lacks such generalizability. On the other hand, different input processing based defenses fall short in the face of continuously evolving attacks. In this paper, we take the first step to combine the benefits of both approaches and propose a self-supervised adversarial training mechanism in the input space. By design, our defense is a generalizable approach and provides significant robustness against the textbf{unseen} adversarial attacks (eg by reducing the success rate of translation-invariant textbf{ensemble} attack from 82.6% to 31.9% in comparison to previous state-of-the-art). It can be deployed as a plug-and-play solution to protect a variety of vision systems, as we demonstrate for the case of classification, segmentation and detection.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"46 1","pages":"259-268"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74450315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 157
Varicolored Image De-Hazing
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.00462
Akshay Dudhane, K. Biradar, Prashant W. Patil, Praful Hambarde, S. Murala
The quality of images captured in bad weather is often affected by chromatic casts and low visibility due to the presence of atmospheric particles. Restoration of the color balance is often ignored in most existing image de-hazing methods. In this paper, we propose a varicolored end-to-end image de-hazing network which restores the color balance in a given varicolored hazy image and recovers the haze-free image. The proposed network comprises 1) a haze color correction (HCC) module and 2) a visibility improvement (VI) module. The proposed HCC module provides the required attention to each color channel and generates a color-balanced hazy image, while the proposed VI module processes the color-balanced hazy image through a novel inception attention block to recover the haze-free image. We also propose a novel approach to generate a large-scale varicolored synthetic hazy image database. An ablation study has been carried out to demonstrate the effect of different factors on the performance of the proposed network for image de-hazing. Three benchmark synthetic datasets have been used for quantitative analysis of the proposed network. Visual results on a set of real-world hazy images captured in different weather conditions demonstrate the effectiveness of the proposed approach for varicolored image de-hazing.
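As a toy illustration of the color-correction step, the sketch below applies a squeeze-and-excitation style gate that predicts one multiplicative gain per color channel to counteract the chromatic cast; the architecture is an assumption for illustration, not the paper's HCC module.

```python
import torch
import torch.nn as nn

class HazeColorCorrection(nn.Module):
    """Predict one multiplicative gain per R/G/B channel to balance the color cast."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                 # summarize the global color cast
        self.mlp = nn.Sequential(
            nn.Conv2d(3, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 3, 1), nn.Sigmoid(),
        )

    def forward(self, hazy):                                # hazy: B x 3 x H x W in [0, 1]
        gains = 2.0 * self.mlp(self.pool(hazy))             # per-channel gain in (0, 2)
        return (hazy * gains).clamp(0, 1)                   # color-balanced hazy image

# Example: balance a random "hazy" batch before a visibility-improvement stage.
balanced = HazeColorCorrection()(torch.rand(4, 3, 256, 256))
```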
{"title":"Varicolored Image De-Hazing","authors":"Akshay Dudhane, K. Biradar, Prashant W. Patil, Praful Hambarde, S. Murala","doi":"10.1109/CVPR42600.2020.00462","DOIUrl":"https://doi.org/10.1109/CVPR42600.2020.00462","url":null,"abstract":"The quality of images captured in bad weather is often affected by chromatic casts and low visibility due to the presence of atmospheric particles. Restoration of the color balance is often ignored in most of the existing image de-hazing methods. In this paper, we propose a varicolored end-to-end image de-hazing network which restores the color balance in a given varicolored hazy image and recovers the haze-free image. The proposed network comprises of 1) Haze color correction (HCC) module and 2) Visibility improvement (VI) module. The proposed HCC module provides required attention to each color channel and generates a color balanced hazy image. While the proposed VI module processes the color balanced hazy image through novel inception attention block to recover the haze-free image. We also propose a novel approach to generate a large-scale varicolored synthetic hazy image database. An ablation study has been carried out to demonstrate the effect of different factors on the performance of the proposed network for image de-hazing. Three benchmark synthetic datasets have been used for quantitative analysis of the proposed network. Visual results on a set of real-world hazy images captured in different weather conditions demonstrate the effectiveness of the proposed approach for varicolored image de-hazing.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"20 1","pages":"4563-4572"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74754291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Peek-a-Boo: Occlusion Reasoning in Indoor Scenes With Plane Representations
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00019
Ziyu Jiang, Buyu Liu, S. Schulter, Zhangyang Wang, Manmohan Chandraker
We address the challenging task of occlusion-aware indoor 3D scene understanding. We represent scenes by a set of planes, where each one is defined by its normal, offset and two masks outlining (i) the extent of the visible part and (ii) the full region that consists of both visible and occluded parts of the plane. We infer these planes from a single input image with a novel neural network architecture. It consists of a two-branch category-specific module that aims to predict layout and objects of the scene separately so that different types of planes can be handled better. We also introduce a novel loss function based on plane warping that can leverage multiple views at training time for improved occlusion-aware reasoning. In order to train and evaluate our occlusion-reasoning model, we use the ScanNet dataset and propose (i) a strategy to automatically extract ground truth for both visible and hidden regions and (ii) a new evaluation metric that specifically focuses on the prediction in hidden regions. We empirically demonstrate that our proposed approach can achieve higher accuracy for occlusion reasoning compared to competitive baselines on the ScanNet dataset, e.g. 42.65% relative improvement on hidden regions.
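For intuition about the plane representation, the snippet below shows how a plane with unit normal n and offset d (n·X = d for 3D points X on the plane) induces a per-pixel depth map under a pinhole camera; the intrinsics and the fronto-parallel example are illustrative assumptions, and the paper's plane-warping loss additionally compares such geometry across views.

```python
import torch

def plane_to_depth(normal, offset, K, height, width, eps=1e-6):
    """Depth z(u, v) = d / (n . K^-1 [u, v, 1]^T) for every pixel of the image."""
    v, u = torch.meshgrid(
        torch.arange(height, dtype=torch.float32),
        torch.arange(width, dtype=torch.float32),
        indexing="ij",
    )
    pixels = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(3, -1)  # 3 x HW
    rays = torch.linalg.inv(K) @ pixels                   # back-projected viewing rays
    denom = (normal.view(1, 3) @ rays).squeeze(0)         # n . ray, per pixel
    depth = offset / denom.clamp(min=eps)                 # keep depths positive and finite
    return depth.reshape(height, width)

# Example: a fronto-parallel plane 2 m in front of the camera gives constant depth 2.
K = torch.tensor([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
depth = plane_to_depth(torch.tensor([0.0, 0.0, 1.0]), 2.0, K, 480, 640)
```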
{"title":"Peek-a-Boo: Occlusion Reasoning in Indoor Scenes With Plane Representations","authors":"Ziyu Jiang, Buyu Liu, S. Schulter, Zhangyang Wang, Manmohan Chandraker","doi":"10.1109/cvpr42600.2020.00019","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00019","url":null,"abstract":"We address the challenging task of occlusion-aware indoor 3D scene understanding. We represent scenes by a set of planes, where each one is defined by its normal, offset and two masks outlining (i) the extent of the visible part and (ii) the full region that consists of both visible and occluded parts of the plane. We infer these planes from a single input image with a novel neural network architecture. It consists of a two-branch category-specific module that aims to predict layout and objects of the scene separately so that different types of planes can be handled better. We also introduce a novel loss function based on plane warping that can leverage multiple views at training time for improved occlusion-aware reasoning. In order to train and evaluate our occlusion-reasoning model, we use the ScanNet dataset and propose (i) a strategy to automatically extract ground truth for both visible and hidden regions and (ii) a new evaluation metric that specifically focuses on the prediction in hidden regions. We empirically demonstrate that our proposed approach can achieve higher accuracy for occlusion reasoning compared to competitive baselines on the ScanNet dataset, e.g. 42.65% relative improvement on hidden regions.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"26 1","pages":"110-118"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73246375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Toward a Universal Model for Shape From Texture
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00050
Dor Verbin, Todd E. Zickler
We consider the shape from texture problem, where the input is a single image of a curved, textured surface, and the texture and shape are both a priori unknown. We formulate this task as a three-player game between a shape process, a texture process, and a discriminator. The discriminator adapts a set of non-linear filters to try to distinguish image patches created by the texture process from those created by the shape process, while the shape and texture processes try to create image patches that are indistinguishable from those of the other. An equilibrium of this game yields two things: an estimate of the 2.5D surface from the shape process, and a stochastic texture synthesis model from the texture process. Experiments show that this approach is robust to common non-idealities such as shading, gloss, and clutter. We also find that it succeeds for a wide variety of texture types, including both periodic textures and those composed of isolated textons, which have previously required distinct and specialized processing.
{"title":"Toward a Universal Model for Shape From Texture","authors":"Dor Verbin, Todd E. Zickler","doi":"10.1109/cvpr42600.2020.00050","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00050","url":null,"abstract":"We consider the shape from texture problem, where the input is a single image of a curved, textured surface, and the texture and shape are both a priori unknown. We formulate this task as a three-player game between a shape process, a texture process, and a discriminator. The discriminator adapts a set of non-linear filters to try to distinguish image patches created by the texture process from those created by the shape process, while the shape and texture processes try to create image patches that are indistinguishable from those of the other. An equilibrium of this game yields two things: an estimate of the 2.5D surface from the shape process, and a stochastic texture synthesis model from the texture process. Experiments show that this approach is robust to common non-idealities such as shading, gloss, and clutter. We also find that it succeeds for a wide variety of texture types, including both periodic textures and those composed of isolated textons, which have previously required distinct and specialized processing.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"6 1","pages":"419-427"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74399514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
MemNAS: Memory-Efficient Neural Architecture Search With Grow-Trim Learning
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00218
Peiye Liu, Bo Wu, Huadong Ma, Mingoo Seok
Recent studies on automatic neural architecture search techniques have demonstrated significant performance, competitive with or even better than hand-crafted neural architectures. However, most of the existing search approaches tend to use residual structures and a concatenation connection between shallow and deep features. The resulting neural network models are therefore non-trivial for resource-constrained devices to execute, since such models require large memory to store network parameters and intermediate feature maps, along with excessive computing complexity. To address this challenge, we propose MemNAS, a novel growing-and-trimming based neural architecture search framework that optimizes not only the performance but also the memory requirement of an inference network. Specifically, in the search process, we consider running memory use, including the network parameters and the memory required for essential intermediate feature maps, as an optimization objective along with performance. Besides, to improve the accuracy of the search, we extract the correlation information among multiple candidate architectures to rank them and then choose the candidates with the desired performance and memory efficiency. On the ImageNet classification task, our MemNAS achieves 75.4% accuracy, 0.7% higher than MobileNetV2 with 42.1% less memory requirement. Additional experiments confirm that the proposed MemNAS can perform well across the different targets of the trade-off between accuracy and memory consumption.
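The search objective can be pictured as a simple trade-off score over candidate architectures, as in the hedged sketch below; the scoring formula, the lambda weight, and the candidate numbers are made-up illustrations of ranking by accuracy and running memory together, not MemNAS's actual controller.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    accuracy: float        # validation accuracy in [0, 1]
    param_mem_mb: float    # memory to store the weights
    feature_mem_mb: float  # peak memory for intermediate feature maps

def memory_aware_score(c: Candidate, lam: float = 0.5, mem_budget_mb: float = 100.0) -> float:
    # Normalize memory by a budget so accuracy and memory share a comparable scale.
    mem = (c.param_mem_mb + c.feature_mem_mb) / mem_budget_mb
    return (1.0 - lam) * c.accuracy - lam * mem

# Hypothetical grow/trim candidates produced in one search round.
candidates = [
    Candidate("grow_block_A", accuracy=0.752, param_mem_mb=14.0, feature_mem_mb=36.0),
    Candidate("grow_block_B", accuracy=0.748, param_mem_mb=9.0, feature_mem_mb=22.0),
    Candidate("trim_branch_C", accuracy=0.744, param_mem_mb=7.5, feature_mem_mb=18.0),
]
ranked = sorted(candidates, key=memory_aware_score, reverse=True)
print([c.name for c in ranked])   # keep the top candidates for the next grow-trim round
```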
{"title":"MemNAS: Memory-Efficient Neural Architecture Search With Grow-Trim Learning","authors":"Peiye Liu, Bo Wu, Huadong Ma, Mingoo Seok","doi":"10.1109/cvpr42600.2020.00218","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00218","url":null,"abstract":"Recent studies on automatic neural architecture search techniques have demonstrated significant performance, competitive to or even better than hand-crafted neural architectures. However, most of the existing search approaches tend to use residual structures and a concatenation connection between shallow and deep features. A resulted neural network model, therefore, is non-trivial for resource-constraint devices to execute since such a model requires large memory to store network parameters and intermediate feature maps along with excessive computing complexity. To address this challenge, we propose MemNAS, a novel growing and trimming based neural architecture search framework that optimizes not only performance but also memory requirement of an inference network. Specifically, in the search process, we consider running memory use, including network parameters and the essential intermediate feature maps memory requirement, as an optimization objective along with performance. Besides, to improve the accuracy of the search, we extract the correlation information among multiple candidate architectures to rank them and then choose the candidates with desired performance and memory efficiency. On the ImageNet classification task, our MemNAS achieves 75.4% accuracy, 0.7% higher than MobileNetV2 with 42.1% less memory requirement. Additional experiments confirm that the proposed MemNAS can perform well across the different targets of the trade-off between accuracy and memory consumption.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"27 1","pages":"2105-2113"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77492121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
What Machines See Is Not What They Get: Fooling Scene Text Recognition Models With Adversarial Text Images
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.01232
Xing Xu, Jiefu Chen, Jinhui Xiao, Lianli Gao, Fumin Shen, Heng Tao Shen
Research on scene text recognition (STR) has made remarkable progress in recent years with the development of deep neural networks (DNNs). Recent studies on adversarial attacks have verified that a DNN model designed for non-sequential tasks (e.g., classification, segmentation and retrieval) can be easily fooled by adversarial examples. STR, in fact, is an application highly related to security issues. However, there are few studies considering the safety and reliability of STR models that make sequential predictions. In this paper, we make the first attempt at attacking state-of-the-art DNN-based STR models. Specifically, we propose a novel and efficient optimization-based method that can be naturally integrated into different sequential prediction schemes, i.e., connectionist temporal classification (CTC) and the attention mechanism. We apply our proposed method to five state-of-the-art STR models in both targeted and untargeted attack modes; the comprehensive results on 7 real-world datasets and 2 synthetic datasets consistently show the vulnerability of these STR models, with a significant performance drop. Finally, we also test our attack method on a real-world STR engine of Baidu OCR, which demonstrates the practical potential of our method.
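As a concrete (and heavily simplified) instance of an optimization-based attack on a CTC-style recognizer, the sketch below optimizes an additive perturbation so that the recognizer's output aligns with an attacker-chosen transcription while staying small; the recognizer interface, the blank index, and the hyperparameters are assumptions for illustration, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def targeted_ctc_attack(recognizer, image, target_ids, steps=200, lr=0.01, c=0.1):
    """recognizer(image) is assumed to return log-probabilities of shape
    (T, 1, num_classes), with class 0 reserved as the CTC blank;
    target_ids is a 1D LongTensor holding the desired label sequence."""
    delta = torch.zeros_like(image, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    targets = target_ids.unsqueeze(0)                       # 1 x S
    target_len = torch.tensor([target_ids.numel()])
    for _ in range(steps):
        adv = (image + delta).clamp(0, 1)
        log_probs = recognizer(adv)                         # T x 1 x C
        input_len = torch.tensor([log_probs.size(0)])
        ctc = F.ctc_loss(log_probs, targets, input_len, target_len, blank=0)
        loss = ctc + c * delta.pow(2).sum()                 # fool the model, stay subtle
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (image + delta.detach()).clamp(0, 1)
```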
{"title":"What Machines See Is Not What They Get: Fooling Scene Text Recognition Models With Adversarial Text Images","authors":"Xing Xu, Jiefu Chen, Jinhui Xiao, Lianli Gao, Fumin Shen, Heng Tao Shen","doi":"10.1109/cvpr42600.2020.01232","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.01232","url":null,"abstract":"The research on scene text recognition (STR) has made remarkable progress in recent years with the development of deep neural networks (DNNs). Recent studies on adversarial attack have verified that a DNN model designed for non-sequential tasks (e.g., classification, segmentation and retrieval) can be easily fooled by adversarial examples. Actually, STR is an application highly related to security issues. However, there are few studies considering the safety and reliability of STR models that make sequential prediction. In this paper, we make the first attempt in attacking the state-of-the-art DNN-based STR models. Specifically, we propose a novel and efficient optimization-based method that can be naturally integrated to different sequential prediction schemes, i.e., connectionist temporal classification (CTC) and attention mechanism. We apply our proposed method to five state-of-the-art STR models with both targeted and untargeted attack modes, the comprehensive results on 7 real-world datasets and 2 synthetic datasets consistently show the vulnerability of these STR models with a significant performance drop. Finally, we also test our attack method on a real-world STR engine of Baidu OCR, which demonstrates the practical potentials of our method.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"39 1","pages":"12301-12311"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80519017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
Dynamic Fluid Surface Reconstruction Using Deep Neural Network
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00010
Simron Thapa, Nianyi Li, Jinwei Ye
Recovering the dynamic fluid surface is a long-standing challenging problem in computer vision. Most existing image-based methods require multiple views or a dedicated imaging system. Here we present a learning-based single-image approach for 3D fluid surface reconstruction. Specifically, we design a deep neural network that estimates the depth and normal maps of a fluid surface by analyzing the refractive distortion of a reference background image. Due to the dynamic nature of fluid surfaces, our network uses recurrent layers that carry temporal information from previous frames to achieve spatio-temporally consistent reconstruction given a video input. Because of the lack of real fluid data, we synthesize a large fluid dataset using physics-based fluid modeling and rendering techniques for network training and validation. Through experiments on simulated and real captured fluid images, we demonstrate that our proposed deep neural network trained on our fluid dataset can recover dynamic 3D fluid surfaces with high accuracy.
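One way to picture the recurrent design is a convolutional GRU that accumulates temporal information across frames before depth and normal maps are decoded, as in the sketch below; the cell, channel sizes, and prediction heads are illustrative assumptions rather than the authors' network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde                    # updated hidden state

class FluidSurfaceNet(nn.Module):
    def __init__(self, hid=32):
        super().__init__()
        self.hid = hid
        self.encoder = nn.Sequential(nn.Conv2d(3, hid, 3, padding=1), nn.ReLU(inplace=True))
        self.gru = ConvGRUCell(hid, hid)
        self.depth_head = nn.Conv2d(hid, 1, 3, padding=1)   # per-pixel depth
        self.normal_head = nn.Conv2d(hid, 3, 3, padding=1)  # per-pixel surface normal

    def forward(self, frames):                              # frames: B x T x 3 x H x W
        b, t, _, height, width = frames.shape
        h = frames.new_zeros(b, self.hid, height, width)
        outputs = []
        for i in range(t):
            h = self.gru(self.encoder(frames[:, i]), h)     # carry temporal context forward
            depth = self.depth_head(h)
            normal = F.normalize(self.normal_head(h), dim=1)
            outputs.append((depth, normal))
        return outputs                                      # one (depth, normal) pair per frame

# Example: a 4-frame clip of the refracted background pattern.
preds = FluidSurfaceNet()(torch.rand(1, 4, 3, 128, 128))
```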
{"title":"Dynamic Fluid Surface Reconstruction Using Deep Neural Network","authors":"Simron Thapa, Nianyi Li, Jinwei Ye","doi":"10.1109/cvpr42600.2020.00010","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00010","url":null,"abstract":"Recovering the dynamic fluid surface is a long-standing challenging problem in computer vision. Most existing image-based methods require multiple views or a dedicated imaging system. Here we present a learning-based single-image approach for 3D fluid surface reconstruction. Specifically, we design a deep neural network that estimates the depth and normal maps of a fluid surface by analyzing the refractive distortion of a reference background image. Due to the dynamic nature of fluid surfaces, our network uses recurrent layers that carry temporal information from previous frames to achieve spatio-temporally consistent reconstruction given a video input. Due to the lack of fluid data, we synthesize a large fluid dataset using physics-based fluid modeling and rendering techniques for network training and validation. Through experiments on simulated and real captured fluid images, we demonstrate that our proposed deep neural network trained on our fluid dataset can recover dynamic 3D fluid surfaces with high accuracy.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"28 1","pages":"21-30"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84222236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.01117
Kanishka Rao, Chris Harris, A. Irpan, S. Levine, Julian Ibarz, Mohi Khansari
Deep neural network based reinforcement learning (RL) can learn appropriate visual representations for complex tasks like vision-based robotic grasping without the need for manual engineering or prior learning of a perception system. However, data for RL is collected by running an agent in the desired environment, and for applications like robotics, running a robot in the real world may be extremely costly and time-consuming. Simulated training offers an appealing alternative, but ensuring that policies trained in simulation transfer effectively into the real world requires additional machinery. Simulations may not match reality, and typically bridging the simulation-to-reality gap requires domain knowledge and task-specific engineering. We can automate this process by employing generative models to translate simulated images into realistic ones. However, this sort of translation is typically task-agnostic, in that the translated images may not preserve all features that are relevant to the task. In this paper, we introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image. This allows us to learn a task-aware translation. Incorporating this loss into unsupervised domain translation, we obtain RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning. In evaluations of RL-CycleGAN on two vision-based robotic grasping tasks, we show that RL-CycleGAN offers a substantial improvement over a number of prior methods for sim-to-real transfer, attaining excellent real-world performance with only a modest number of real-world observations.
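A minimal sketch of the RL-scene consistency idea: the translated observation is required to keep the same Q-values as the simulated observation it came from, so the generator is penalized for changing anything task-relevant. The q_net and generator interfaces, and the way the term is added to the usual CycleGAN losses, are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def rl_scene_consistency_loss(q_net, generator, sim_obs, actions):
    """q_net(obs, action) -> Q-values of shape (B,); generator(obs) -> translated obs."""
    translated = generator(sim_obs)
    q_sim = q_net(sim_obs, actions)
    q_translated = q_net(translated, actions)
    # The translation should be invariant with respect to the Q-values of the scene.
    return F.mse_loss(q_translated, q_sim)

def generator_step(generator, q_net, cyclegan_loss, sim_obs, actions, optimizer, lam=1.0):
    """One generator update: the usual CycleGAN objective (assumed precomputed)
    plus the RL-scene consistency term that keeps the translation task-aware."""
    loss = cyclegan_loss + lam * rl_scene_consistency_loss(q_net, generator, sim_obs, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```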
{"title":"RL-CycleGAN: Reinforcement Learning Aware Simulation-to-Real","authors":"Kanishka Rao, Chris Harris, A. Irpan, S. Levine, Julian Ibarz, Mohi Khansari","doi":"10.1109/CVPR42600.2020.01117","DOIUrl":"https://doi.org/10.1109/CVPR42600.2020.01117","url":null,"abstract":"Deep neural network based reinforcement learning (RL) can learn appropriate visual representations for complex tasks like vision-based robotic grasping without the need for manually engineering or prior learning a perception system. However, data for RL is collected via running an agent in the desired environment, and for applications like robotics, running a robot in the real world may be extremely costly and time consuming. Simulated training offers an appealing alternative, but ensuring that policies trained in simulation can transfer effectively into the real world requires additional machinery. Simulations may not match reality, and typically bridging the simulation-to-reality gap requires domain knowledge and task-specific engineering. We can automate this process by employing generative models to translate simulated images into realistic ones. However, this sort of translation is typically task-agnostic, in that the translated images may not preserve all features that are relevant to the task. In this paper, we introduce the RL-scene consistency loss for image translation, which ensures that the translation operation is invariant with respect to the Q-values associated with the image. This allows us to learn a task-aware translation. Incorporating this loss into unsupervised domain translation, we obtain the RL-CycleGAN, a new approach for simulation-to-real-world transfer for reinforcement learning. In evaluations of RL-CycleGAN on two vision-based robotics grasping tasks, we show that RL-CycleGAN offers a substantial improvement over a number of prior methods for sim-to-real transfer, attaining excellent real-world performance with only a modest number of real-world observations.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"3 1","pages":"11154-11163"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85819159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 130
OASIS: A Large-Scale Dataset for Single Image 3D in the Wild
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00076
Weifeng Chen, Shengyi Qian, David Fan, Noriyuki Kojima, Max Hamilton, Jia Deng
Single-view 3D is the task of recovering 3D properties such as depth and surface normals from a single image. We hypothesize that a major obstacle to single-image 3D is data. We address this issue by presenting Open Annotations of Single Image Surfaces (OASIS), a dataset for single-image 3D in the wild consisting of annotations of detailed 3D geometry for 140,000 images. We train and evaluate leading models on a variety of single-image 3D tasks. We expect OASIS to be a useful resource for 3D vision research. Project site: https://pvl.cs.princeton.edu/OASIS.
{"title":"OASIS: A Large-Scale Dataset for Single Image 3D in the Wild","authors":"Weifeng Chen, Shengyi Qian, David Fan, Noriyuki Kojima, Max Hamilton, Jia Deng","doi":"10.1109/cvpr42600.2020.00076","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00076","url":null,"abstract":"Single-view 3D is the task of recovering 3D properties such as depth and surface normals from a single image. We hypothesize that a major obstacle to single-image 3D is data. We address this issue by presenting Open Annotations of Single Image Surfaces (OASIS), a dataset for single-image 3D in the wild consisting of annotations of detailed 3D geometry for 140,000 images. We train and evaluate leading models on a variety of single-image 3D tasks. We expect OASIS to be a useful resource for 3D vision research. Project site: https://pvl.cs.princeton.edu/OASIS.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"26 1","pages":"676-685"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78369066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35