
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

Structure-Preserving Stereoscopic View Synthesis With Multi-Scale Adversarial Correlation Matching
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00601
Yu Zhang, Dongqing Zou, Jimmy S. J. Ren, Zhe Jiang, Xiaohao Chen
This paper addresses stereoscopic view synthesis from a single image. Various recent works solve this task by reorganizing pixels from the input view to reconstruct the target one in a stereo setup. However, a network that depends purely on such a photometric reconstruction process may produce structurally inconsistent results. To address this issue, this work proposes Multi-Scale Adversarial Correlation Matching (MS-ACM), a novel learning framework for structure-aware view synthesis. The proposed framework does not assume any costly supervision signal of scene structures such as depth. Instead, it models structures as self-correlation coefficients extracted from multi-scale feature maps in transformed spaces. In training, the feature space attempts to push the correlation distances between the synthesized and target images far apart, thus amplifying inconsistent structures. At the same time, the view synthesis network minimizes such correlation distances by fixing the mistakes it makes. With such adversarial training, structural errors of different scales and levels are iteratively discovered and reduced, preserving both global layouts and fine-grained details. Extensive experiments on the KITTI benchmark show that MS-ACM improves both visual quality and quantitative metrics over existing methods when plugged into recent view synthesis architectures.
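As an illustration of the central mechanism, the sketch below computes self-correlation coefficients over a feature map and a multi-scale distance between the correlation maps of a synthesized and a target image; in MS-ACM the feature extractor is trained to enlarge this distance while the synthesis network shrinks it. This is a minimal sketch assuming a 3x3 neighborhood, cosine correlation, and average pooling across scales; the transformed feature spaces and adversarial schedule in the paper may differ.

```python
import torch
import torch.nn.functional as F

def self_correlation(feat, k=3):
    """Cosine correlation between each location and its k x k neighborhood.

    feat: (B, C, H, W) feature map in some transformed space.
    Returns (B, k*k, H, W) self-correlation coefficients.
    """
    b, c, h, w = feat.shape
    feat = F.normalize(feat, dim=1)                        # unit-norm descriptors
    neigh = F.unfold(feat, kernel_size=k, padding=k // 2)  # (B, C*k*k, H*W)
    neigh = neigh.view(b, c, k * k, h, w)
    return (feat.unsqueeze(2) * neigh).sum(dim=1)          # correlation per offset

def correlation_distance(feat_syn, feat_tgt, scales=(1, 2, 4)):
    """Multi-scale L1 distance between self-correlation maps (illustrative loss)."""
    loss = 0.0
    for s in scales:
        fs = F.avg_pool2d(feat_syn, s) if s > 1 else feat_syn
        ft = F.avg_pool2d(feat_tgt, s) if s > 1 else feat_tgt
        loss = loss + (self_correlation(fs) - self_correlation(ft)).abs().mean()
    return loss
```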
{"title":"Structure-Preserving Stereoscopic View Synthesis With Multi-Scale Adversarial Correlation Matching","authors":"Yu Zhang, Dongqing Zou, Jimmy S. J. Ren, Zhe Jiang, Xiaohao Chen","doi":"10.1109/CVPR.2019.00601","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00601","url":null,"abstract":"This paper addresses stereoscopic view synthesis from a single image. Various recent works solve this task by reorganizing pixels from the input view to reconstruct the target one in a stereo setup. However, purely depending on such photometric-based reconstruction process, the network may produce structurally inconsistent results. Regarding this issue, this work proposes Multi-Scale Adversarial Correlation Matching (MS-ACM), a novel learning framework for structure-aware view synthesis. The proposed framework does not assume any costly supervision signal of scene structures such as depth. Instead, it models structures as self-correlation coefficients extracted from multi-scale feature maps in transformed spaces. In training, the feature space attempts to push the correlation distances between the synthesized and target images far apart, thus amplifying inconsistent structures. At the same time, the view synthesis network minimizes such correlation distances by fixing mistakes it makes. With such adversarial training, structural errors of different scales and levels are iteratively discovered and reduced, preserving both global layouts and fine-grained details. Extensive experiments on the KITTI benchmark show that MS-ACM improves both visual quality and the metrics over existing methods when plugged into recent view synthesis architectures.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"31 1","pages":"5853-5862"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74514968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Learning a Unified Classifier Incrementally via Rebalancing
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00092
Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, Dahua Lin
Conventionally, deep neural networks are trained offline, relying on a large dataset prepared in advance. This paradigm is often challenged in real-world applications, e.g. online services that involve continuous streams of incoming data. Recently, incremental learning has received increasing attention and is considered a promising solution to the practical challenges mentioned above. However, it has been observed that incremental learning is subject to a fundamental difficulty -- catastrophic forgetting: adapting a model to new data often results in severe performance degradation on previous tasks or classes. Our study reveals that the imbalance between previous and new data is a crucial cause of this problem. In this work, we develop a new framework for incrementally learning a unified classifier, i.e. a classifier that treats both old and new classes uniformly. Specifically, we incorporate three components, cosine normalization, a less-forget constraint, and inter-class separation, to mitigate the adverse effects of the imbalance. Experiments show that the proposed method can effectively rebalance the training process, thus obtaining superior performance compared to existing methods. On CIFAR-100 and ImageNet, our method reduces the classification errors by more than 6% and 13% respectively, under the incremental setting of 10 phases.
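Two of the three components can be sketched compactly: a cosine-normalized classifier, which scores old and new classes on a comparable scale regardless of how imbalanced their training data are, and a less-forget distillation term that keeps the new network's features close to those of the frozen old network. The layer names, the scale parameter, and the omitted margin-based inter-class separation loss are illustrative simplifications of the paper's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Cosine-normalized classifier: logits depend only on feature/weight angles."""
    def __init__(self, feat_dim, num_classes, scale=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        self.scale = nn.Parameter(torch.tensor(scale))

    def forward(self, x):
        return self.scale * F.linear(F.normalize(x, dim=1),
                                     F.normalize(self.weight, dim=1))

def less_forget_loss(feat_new, feat_old):
    """Penalize the new network's features drifting away from the frozen old network's."""
    return 1.0 - F.cosine_similarity(feat_new, feat_old, dim=1).mean()
```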
{"title":"Learning a Unified Classifier Incrementally via Rebalancing","authors":"Saihui Hou, Xinyu Pan, Chen Change Loy, Zilei Wang, Dahua Lin","doi":"10.1109/CVPR.2019.00092","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00092","url":null,"abstract":"Conventionally, deep neural networks are trained offline, relying on a large dataset prepared in advance. This paradigm is often challenged in real-world applications, e.g. online services that involve continuous streams of incoming data. Recently, incremental learning receives increasing attention, and is considered as a promising solution to the practical challenges mentioned above. However, it has been observed that incremental learning is subject to a fundamental difficulty -- catastrophic forgetting, namely adapting a model to new data often results in severe performance degradation on previous tasks or classes. Our study reveals that the imbalance between previous and new data is a crucial cause to this problem. In this work, we develop a new framework for incrementally learning a unified classifier, e.g. a classifier that treats both old and new classes uniformly. Specifically, we incorporate three components, cosine normalization, less-forget constraint, and inter-class separation, to mitigate the adverse effects of the imbalance. Experiments show that the proposed method can effectively rebalance the training process, thus obtaining superior performance compared to the existing methods. On CIFAR-100 and ImageNet, our method can reduce the classification errors by more than 6% and 13% respectively, under the incremental setting of 10 phases.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"51 1","pages":"831-839"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74530982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 716
Action4D: Online Action Recognition in the Crowd and Clutter
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01213
Quanzeng You, Hao Jiang
Recognizing every person's action in a crowded and cluttered environment is a challenging task in computer vision. We propose to tackle this challenging problem using a holistic 4D ``scan'' of a cluttered scene that includes every detail about the people and environment. This leads to a new problem, i.e., recognizing multiple people's actions in the cluttered 4D representation. As a first step, we propose a new method to track people in 4D, which can reliably detect and follow each person in real time. Then, we build a new deep neural network, the Action4DNet, to recognize the action of each tracked person. Such a model gives reliable and accurate results in real-world settings. We also design an adaptive 3D convolution layer and a novel discriminative temporal feature learning objective to further improve the performance of our model. Our method is invariant to camera view angles, resistant to clutter, and able to handle crowds. The experimental results show that the proposed method is fast, reliable, and accurate. Our method paves the way for action recognition in real-world applications and is ready to be deployed to enable smart homes, smart factories, and smart stores.
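The recognition stage can be pictured as a 3D-convolutional network applied to a short clip of per-person volumes produced by the 4D tracker. The toy model below only conveys that general shape; it is not the Action4DNet architecture, and the adaptive 3D convolution layer and discriminative temporal feature learning objective are not reproduced.

```python
import torch
import torch.nn as nn

class ToyVolumeActionNet(nn.Module):
    """Stand-in for a per-track action recognizer over 4D input (volumes over time)."""
    def __init__(self, num_actions, in_ch=1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(32, num_actions)

    def forward(self, vols):
        # vols: (B, T, 1, D, H, W) -- cropped occupancy volumes for one tracked person
        b, t = vols.shape[:2]
        feats = self.encoder(vols.flatten(0, 1)).flatten(1)   # (B*T, 32)
        return self.head(feats.view(b, t, -1).mean(dim=1))    # average over the clip
```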
{"title":"Action4D: Online Action Recognition in the Crowd and Clutter","authors":"Quanzeng You, Hao Jiang","doi":"10.1109/CVPR.2019.01213","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01213","url":null,"abstract":"Recognizing every person's action in a crowded and cluttered environment is a challenging task in computer vision. We propose to tackle this challenging problem using a holistic 4D ``scan'' of a cluttered scene to include every detail about the people and environment. This leads to a new problem, i.e., recognizing multiple people's actions in the cluttered 4D representation. At the first step, we propose a new method to track people in 4D, which can reliably detect and follow each person in real time. Then, we build a new deep neural network, the Action4DNet, to recognize the action of each tracked person. Such a model gives reliable and accurate results in the real-world settings. We also design an adaptive 3D convolution layer and a novel discriminative temporal feature learning objective to further improve the performance of our model. Our method is invariant to camera view angles, resistant to clutter and able to handle crowd. The experimental results show that the proposed method is fast, reliable and accurate. Our method paves the way to action recognition in the real-world applications and is ready to be deployed to enable smart homes, smart factories and smart stores.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"26 1","pages":"11849-11858"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72941666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Semi-Supervised Learning With Graph Learning-Convolutional Networks
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01157
Bo Jiang, Ziyan Zhang, Doudou Lin, Jin Tang, B. Luo
Graph Convolutional Neural Networks (graph CNNs) have been widely used for graph data representation and semi-supervised learning tasks. However, existing graph CNNs generally use a fixed graph which may not be optimal for semi-supervised learning tasks. In this paper, we propose a novel Graph Learning-Convolutional Network (GLCN) for graph data representation and semi-supervised learning. The aim of GLCN is to learn an optimal graph structure that best serves graph CNNs for semi-supervised learning by integrating both graph learning and graph convolution in a unified network architecture. The main advantage is that in GLCN both given labels and the estimated labels are incorporated and thus can provide useful ‘weakly’ supervised information to refine (or learn) the graph construction and also to facilitate the graph convolution operation for unknown label estimation. Experimental results on seven benchmarks demonstrate that GLCN significantly outperforms the state-of-the-art traditional fixed structure based graph CNNs.
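A minimal sketch of the graph-learning-plus-convolution step is given below: pairwise scores between node features are turned into a row-normalized adjacency matrix, which then drives a GCN-style aggregation. The similarity function, its sign convention, and the absence of the paper's graph-learning regularization losses are simplifying assumptions; GLCN's exact formulation differs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphLearnConv(nn.Module):
    """One graph-learning + graph-convolution layer (illustrative, not exact GLCN)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.a = nn.Parameter(torch.randn(in_dim) * 0.1)   # learned similarity weights
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        # x: (N, in_dim) features of all labeled and unlabeled nodes
        diff = (x.unsqueeze(1) - x.unsqueeze(0)).abs()      # (N, N, in_dim)
        score = F.relu(diff @ self.a)                       # learned pairwise distances
        adj = F.softmax(-score, dim=1)                      # closer nodes get larger weights
        return F.relu(self.lin(adj @ x)), adj               # convolve on the learned graph
```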
{"title":"Semi-Supervised Learning With Graph Learning-Convolutional Networks","authors":"Bo Jiang, Ziyan Zhang, Doudou Lin, Jin Tang, B. Luo","doi":"10.1109/CVPR.2019.01157","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01157","url":null,"abstract":"Graph Convolutional Neural Networks (graph CNNs) have been widely used for graph data representation and semi-supervised learning tasks. However, existing graph CNNs generally use a fixed graph which may not be optimal for semi-supervised learning tasks. In this paper, we propose a novel Graph Learning-Convolutional Network (GLCN) for graph data representation and semi-supervised learning. The aim of GLCN is to learn an optimal graph structure that best serves graph CNNs for semi-supervised learning by integrating both graph learning and graph convolution in a unified network architecture. The main advantage is that in GLCN both given labels and the estimated labels are incorporated and thus can provide useful ‘weakly’ supervised information to refine (or learn) the graph construction and also to facilitate the graph convolution operation for unknown label estimation. Experimental results on seven benchmarks demonstrate that GLCN significantly outperforms the state-of-the-art traditional fixed structure based graph CNNs.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"4 1","pages":"11305-11312"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75421011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 197
Slim DensePose: Thrifty Learning From Sparse Annotations and Motion Cues
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01117
N. Neverova, James Thewlis, R. Güler, Iasonas Kokkinos, A. Vedaldi
DensePose supersedes traditional landmark detectors by densely mapping image pixels to body surface coordinates. This power, however, comes at a greatly increased annotation cost, as supervising the model requires manually labeling hundreds of points per pose instance. In this work, we thus seek methods to significantly slim down the DensePose annotations, proposing more efficient data collection strategies. In particular, we demonstrate that if annotations are collected in video frames, their efficacy can be multiplied for free by using motion cues. To explore this idea, we introduce DensePose-Track, a dataset of videos where selected frames are annotated in the traditional DensePose manner. Then, building on geometric properties of the DensePose mapping, we use the video dynamics to propagate ground-truth annotations in time as well as to learn from Siamese equivariance constraints. Having performed an exhaustive empirical evaluation of various data annotation and learning strategies, we demonstrate that doing so can deliver significantly improved pose estimation results over strong baselines. However, despite what is suggested by some recent works, we show that merely synthesizing motion patterns by applying geometric transformations to isolated frames is significantly less effective, and that motion cues help much more when they are extracted from videos.
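The motion-cue part can be sketched as warping a dense ground-truth annotation map from an annotated frame onto a neighbouring frame using optical flow. The snippet below shows only that propagation step, under the assumption of a backward flow (target frame to annotated frame) in pixel units; the Siamese equivariance constraints and the full DensePose-Track training pipeline are not shown.

```python
import torch
import torch.nn.functional as F

def propagate_annotations(uv_map, flow):
    """Warp dense annotations from an annotated frame to a neighbouring target frame.

    uv_map: (B, C, H, W) ground-truth surface coordinates on the annotated frame.
    flow:   (B, 2, H, W) backward flow (target -> annotated frame), in pixels, (x, y).
    """
    b, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).to(flow)         # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                     # where to sample in the source
    coords[:, 0] = coords[:, 0] / (w - 1) * 2 - 1         # normalize x to [-1, 1]
    coords[:, 1] = coords[:, 1] / (h - 1) * 2 - 1         # normalize y to [-1, 1]
    grid = coords.permute(0, 2, 3, 1)                     # (B, H, W, 2) for grid_sample
    return F.grid_sample(uv_map, grid, align_corners=True)
```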
{"title":"Slim DensePose: Thrifty Learning From Sparse Annotations and Motion Cues","authors":"N. Neverova, James Thewlis, R. Güler, Iasonas Kokkinos, A. Vedaldi","doi":"10.1109/CVPR.2019.01117","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01117","url":null,"abstract":"DensePose supersedes traditional landmark detectors by densely mapping image pixels to body surface coordinates. This power, however, comes at a greatly increased annotation cost, as supervising the model requires to manually label hundreds of points per pose instance. In this work, we thus seek methods to significantly slim down the DensePose annotations, proposing more efficient data collection strategies. In particular, we demonstrate that if annotations are collected in video frames, their efficacy can be multiplied for free by using motion cues. To explore this idea, we introduce DensePose-Track, a dataset of videos where selected frames are annotated in the traditional DensePose manner. Then, building on geometric properties of the DensePose mapping, we use the video dynamic to propagate ground-truth annotations in time as well as to learn from Siamese equivariance constraints. Having performed exhaustive empirical evaluation of various data annotation and learning strategies, we demonstrate that doing so can deliver significantly improved pose estimation results over strong baselines. However, despite what is suggested by some recent works, we show that merely synthesizing motion patterns by applying geometric transformations to isolated frames is significantly less effective, and that motion cues help much more when they are extracted from videos.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"28 1","pages":"10907-10915"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75493737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
Face Anti-Spoofing: Model Matters, so Does Data
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00362
Xiao Yang, Wenhan Luo, Linchao Bao, Yuan Gao, Dihong Gong, Shibao Zheng, Zhifeng Li, Wei Liu
Face anti-spoofing is an important task in full-stack face applications including face detection, verification, and recognition. Previous approaches build models on datasets which do not simulate real-world data well (e.g., small scale, insignificant variance, etc.). Existing models may rely on auxiliary information, which prevents these anti-spoofing solutions from generalizing well in practice. In this paper, we present a data collection solution along with a data synthesis technique to simulate digital medium-based face spoofing attacks, which can easily help us obtain a large amount of training data that reflects real-world scenarios well. By exploiting a novel Spatio-Temporal Anti-Spoof Network (STASN), we are able to push the performance on public face anti-spoofing datasets over state-of-the-art methods by a large margin. Since the proposed model can automatically attend to discriminative regions, it makes analyzing the behaviors of the network possible. We conduct extensive experiments and show that the proposed model can distinguish spoof faces by extracting features from a variety of regions to seek out subtle evidence such as borders, moiré patterns, reflection artifacts, etc.
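As a rough illustration of a spatio-temporal anti-spoofing model, the toy network below extracts per-frame CNN features, pools them with a spatial attention map so that discriminative regions dominate, and aggregates the sequence with a GRU before a live/spoof decision. It only conveys the overall structure; STASN's actual region selection and architecture are not reproduced here.

```python
import torch
import torch.nn as nn

class ToySpatioTemporalAntiSpoof(nn.Module):
    """Toy live-vs-spoof classifier with spatial attention and temporal aggregation."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = nn.Conv2d(32, 1, 1)        # one attention score per location
        self.gru = nn.GRU(32, 64, batch_first=True)
        self.head = nn.Linear(64, 2)           # live vs. spoof

    def forward(self, clip):
        # clip: (B, T, 3, H, W) face frames
        b, t = clip.shape[:2]
        f = self.cnn(clip.flatten(0, 1))                       # (B*T, 32, h, w)
        a = torch.softmax(self.attn(f).flatten(2), dim=-1)     # (B*T, 1, h*w)
        pooled = (f.flatten(2) * a).sum(-1)                    # attention-weighted pooling
        _, h_n = self.gru(pooled.view(b, t, -1))
        return self.head(h_n[-1])
```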
{"title":"Face Anti-Spoofing: Model Matters, so Does Data","authors":"Xiao Yang, Wenhan Luo, Linchao Bao, Yuan Gao, Dihong Gong, Shibao Zheng, Zhifeng Li, Wei Liu","doi":"10.1109/CVPR.2019.00362","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00362","url":null,"abstract":"Face anti-spoofing is an important task in full-stack face applications including face detection, verification, and recognition. Previous approaches build models on datasets which do not simulate the real-world data well (e.g., small scale, insignificant variance, etc.). Existing models may rely on auxiliary information, which prevents these anti-spoofing solutions from generalizing well in practice. In this paper, we present a data collection solution along with a data synthesis technique to simulate digital medium-based face spoofing attacks, which can easily help us obtain a large amount of training data well reflecting the real-world scenarios. Through exploiting a novel Spatio-Temporal Anti-Spoof Network (STASN), we are able to push the performance on public face anti-spoofing datasets over state-of-the-art methods by a large margin. Since the proposed model can automatically attend to discriminative regions, it makes analyzing the behaviors of the network possible.We conduct extensive experiments and show that the proposed model can distinguish spoof faces by extracting features from a variety of regions to seek out subtle evidences such as borders, moire patterns, reflection artifacts, etc.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"140 1","pages":"3502-3511"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73971009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 158
Where's Wally Now? Deep Generative and Discriminative Embeddings for Novelty Detection
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01177
P. Burlina, Neil J. Joshi, I-J. Wang
We develop a framework for novelty detection (ND) methods relying on deep embeddings, either discriminative or generative, and also propose a novel framework for assessing their performance. While much progress has been made recently with these approaches, it has been accompanied by certain limitations: most methods were tested on relatively simple problems (low-resolution images / small numbers of classes) or involved non-public data; comparative performance has often proven inconclusive because of lacking statistical significance; and evaluation has generally been done on non-canonical problem sets of differing complexity, making apples-to-apples comparative performance evaluation difficult. This has led to a relatively confusing state of affairs. We address these challenges via the following contributions: We propose a novel framework to measure the performance of novelty detection methods using a trade space that plots performance (measured by ROC AUC) as a function of problem complexity. We also make several proposals to formally characterize problem complexity. We conduct experiments with problems of higher complexity (higher image resolution / number of classes). To this end we design several canonical datasets built from CIFAR-10 and ImageNet (IN-125) which we make available to perform future benchmarks for novelty detection as well as other related tasks including semantic zero/adaptive shot and unsupervised learning. Finally, we demonstrate, as one of the methods in our ND framework, a generative novelty detection method whose performance exceeds that of all recent best-in-class generative ND methods.
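One simple discriminative-embedding baseline that fits this evaluation protocol scores each test sample by its distance to the nearest known-class embedding and reports ROC AUC over known-vs-novel labels. The snippet below uses random Gaussian vectors purely as stand-in embeddings; it illustrates the scoring and the ROC AUC measurement, not any particular method from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def novelty_scores(embed_train, embed_test):
    """Distance to the nearest known (training) embedding; larger = more novel."""
    d = np.linalg.norm(embed_test[:, None, :] - embed_train[None, :, :], axis=-1)
    return d.min(axis=1)

rng = np.random.default_rng(0)
known_train = rng.normal(0.0, 1.0, size=(200, 16))      # stand-in "known" embeddings
known_test = rng.normal(0.0, 1.0, size=(100, 16))
novel_test = rng.normal(3.0, 1.0, size=(100, 16))       # shifted cluster acts as novelty

scores = novelty_scores(known_train, np.vstack([known_test, novel_test]))
labels = np.concatenate([np.zeros(100), np.ones(100)])  # 1 = novel
print("ROC AUC:", roc_auc_score(labels, scores))
```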
{"title":"Where's Wally Now? Deep Generative and Discriminative Embeddings for Novelty Detection","authors":"P. Burlina, Neil J. Joshi, I-J. Wang","doi":"10.1109/CVPR.2019.01177","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01177","url":null,"abstract":"We develop a framework for novelty detection (ND) methods relying on deep embeddings, either discriminative or generative, and also propose a novel framework for assessing their performance. While much progress was made recently in these approaches, it has been accompanied by certain limitations: most methods were tested on relatively simple problems (low resolution images / small number of classes) or involved non-public data; comparative performance has often proven inconclusive because of lacking statistical significance; and evaluation has generally been done on non-canonical problem sets of differing complexity, making apples-to-apples comparative performance evaluation difficult. This has led to a relative confusing state of affairs. We address these challenges via the following contributions: We make a proposal for a novel framework to measure the performance of novelty detection methods using a trade-space demonstrating performance (measured by ROCAUC) as a function of problem complexity. We also make several proposals to formally characterize problem complexity. We conduct experiments with problems of higher complexity (higher image resolution / number of classes). To this end we design several canonical datasets built from CIFAR-10 and ImageNet (IN-125) which we make available to perform future benchmarks for novelty detection as well as other related tasks including semantic zero/adaptive shot and unsupervised learning. Finally, we demonstrate, as one of the methods in our ND framework, a generative novelty detection method whose performance exceeds that of all recent best-in-class generative ND methods.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"22 1","pages":"11499-11508"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74902313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 32
Learning to Minify Photometric Stereo
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00775
Junxuan Li, A. Robles-Kelly, Shaodi You, Y. Matsushita
Photometric stereo estimates the surface normal given a set of images acquired under different illumination conditions. To deal with diverse factors involved in the image formation process, recent photometric stereo methods demand a large number of images as input. We propose a method that can dramatically decrease the demands on the number of images by learning the most informative ones under different illumination conditions. To this end, we use a deep learning framework to automatically learn the critical illumination conditions required at input. Furthermore, we present an occlusion layer that can synthesize cast shadows, which effectively improves the estimation accuracy. We assess our method on challenging real-world conditions, where we outperform techniques elsewhere in the literature with a significantly reduced number of light conditions.
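For context, the estimation step that such a method builds on is classical least-squares Lambertian photometric stereo over whichever illumination conditions are retained; the sketch below shows only that step and assumes the learned selection of informative lights (the paper's contribution) has already produced the input subset.

```python
import numpy as np

def lambertian_normals(images, lights):
    """Least-squares Lambertian photometric stereo.

    images: (K, H, W) intensities under K selected illumination directions.
    lights: (K, 3) unit lighting directions.
    Returns unit surface normals of shape (H, W, 3).
    """
    k, h, w = images.shape
    I = images.reshape(k, -1)                        # (K, H*W) stacked measurements
    G, *_ = np.linalg.lstsq(lights, I, rcond=None)   # albedo-scaled normals, (3, H*W)
    n = G / (np.linalg.norm(G, axis=0, keepdims=True) + 1e-8)
    return n.T.reshape(h, w, 3)
```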
{"title":"Learning to Minify Photometric Stereo","authors":"Junxuan Li, A. Robles-Kelly, Shaodi You, Y. Matsushita","doi":"10.1109/CVPR.2019.00775","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00775","url":null,"abstract":"Photometric stereo estimates the surface normal given a set of images acquired under different illumination conditions. To deal with diverse factors involved in the image formation process, recent photometric stereo methods demand a large number of images as input. We propose a method that can dramatically decrease the demands on the number of images by learning the most informative ones under different illumination conditions. To this end, we use a deep learning framework to automatically learn the critical illumination conditions required at input. Furthermore, we present an occlusion layer that can synthesize cast shadows, which effectively improves the estimation accuracy. We assess our method on challenging real-world conditions, where we outperform techniques elsewhere in the literature with a significantly reduced number of light conditions.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"416 1","pages":"7560-7568"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76475852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53
Multispectral Imaging for Fine-Grained Recognition of Powders on Complex Backgrounds
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00890
Tiancheng Zhi, B. Pires, M. Hebert, S. Narasimhan
Hundreds of materials, such as drugs, explosives, makeup, food additives, are in the form of powder. Recognizing such powders is important for security checks, criminal identification, drug control, and quality assessment. However, powder recognition has drawn little attention in the computer vision community. Powders are hard to distinguish: they are amorphous, appear matte, have little color or texture variation and blend with surfaces they are deposited on in complex ways. To address these challenges, we present the first comprehensive dataset and approach for powder recognition using multi-spectral imaging. By using Shortwave Infrared (SWIR) multi-spectral imaging together with visible light (RGB) and Near Infrared (NIR), powders can be discriminated with reasonable accuracy. We present a method to select discriminative spectral bands to significantly reduce acquisition time while improving recognition accuracy. We propose a blending model to synthesize images of powders of various thickness deposited on a wide range of surfaces. Incorporating band selection and image synthesis, we conduct fine-grained recognition of 100 powders on complex backgrounds, and achieve 60%~70% accuracy on recognition with known powder location, and over 40% mean IoU without known location.
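Band selection of this kind is often illustrated with a greedy forward search that repeatedly adds the spectral band giving the best cross-validated classification accuracy; the sketch below shows that generic procedure with a logistic-regression classifier on per-pixel spectra, not the paper's learned selection criterion or blending model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_band_selection(X, y, num_bands):
    """Greedy forward selection of spectral bands by cross-validated accuracy.

    X: (num_pixels, num_total_bands) per-pixel spectra; y: powder class labels.
    """
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(num_bands):
        best_band, best_acc = None, -1.0
        for band in remaining:
            cols = selected + [band]
            acc = cross_val_score(LogisticRegression(max_iter=200),
                                  X[:, cols], y, cv=3).mean()
            if acc > best_acc:
                best_band, best_acc = band, acc
        selected.append(best_band)
        remaining.remove(best_band)
    return selected
```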
{"title":"Multispectral Imaging for Fine-Grained Recognition of Powders on Complex Backgrounds","authors":"Tiancheng Zhi, B. Pires, M. Hebert, S. Narasimhan","doi":"10.1109/CVPR.2019.00890","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00890","url":null,"abstract":"Hundreds of materials, such as drugs, explosives, makeup, food additives, are in the form of powder. Recognizing such powders is important for security checks, criminal identification, drug control, and quality assessment. However, powder recognition has drawn little attention in the computer vision community. Powders are hard to distinguish: they are amorphous, appear matte, have little color or texture variation and blend with surfaces they are deposited on in complex ways. To address these challenges, we present the first comprehensive dataset and approach for powder recognition using multi-spectral imaging. By using Shortwave Infrared (SWIR) multi-spectral imaging together with visible light (RGB) and Near Infrared (NIR), powders can be discriminated with reasonable accuracy. We present a method to select discriminative spectral bands to significantly reduce acquisition time while improving recognition accuracy. We propose a blending model to synthesize images of powders of various thickness deposited on a wide range of surfaces. Incorporating band selection and image synthesis, we conduct fine-grained recognition of 100 powders on complex backgrounds, and achieve 60%~70% accuracy on recognition with known powder location, and over 40% mean IoU without known location.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"82 1","pages":"8691-8700"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80143971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Generalizable Person Re-Identification by Domain-Invariant Mapping Network
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00081
Jifei Song, Yongxin Yang, Yi-Zhe Song, T. Xiang, Timothy M. Hospedales
We aim to learn a domain generalizable person re-identification (ReID) model. When such a model is trained on a set of source domains (ReID datasets collected from different camera networks), it can be directly applied to any new unseen dataset for effective ReID without any model updating. Despite its practical value in real-world deployments, generalizable ReID has seldom been studied. In this work, a novel deep ReID model termed Domain-Invariant Mapping Network (DIMN) is proposed. DIMN is designed to learn a mapping between a person image and its identity classifier, i.e., it produces a classifier using a single shot. To make the model domain-invariant, we follow a meta-learning pipeline and sample a subset of source domain training tasks during each training episode. However, the model is significantly different from conventional meta-learning methods in that: (1) no model updating is required for the target domain, (2) different training tasks share a memory bank for maintaining both scalability and discrimination ability, and (3) it can be used to match an arbitrary number of identities in a target domain. Extensive experiments on a newly proposed large-scale ReID domain generalization benchmark show that our DIMN significantly outperforms alternative domain generalization or meta-learning methods.
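The single-shot classifier-generation idea can be sketched as a network that maps each gallery image to the weight vector of its identity's classifier and scores probes against all generated weights. The memory bank, meta-learning episodes, and exact DIMN architecture are omitted; backbone, weight_gen, and the feature dimension below are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleShotReIDClassifier(nn.Module):
    """Generate an identity classifier from a single gallery image (illustrative)."""
    def __init__(self, backbone, feat_dim=512):
        super().__init__()
        self.backbone = backbone                       # any image encoder -> (B, feat_dim)
        self.weight_gen = nn.Linear(feat_dim, feat_dim)

    def forward(self, gallery_imgs, probe_imgs):
        w = F.normalize(self.weight_gen(self.backbone(gallery_imgs)), dim=1)  # (G, D)
        f = F.normalize(self.backbone(probe_imgs), dim=1)                     # (P, D)
        return f @ w.t()   # (P, G) similarity logits over gallery identities
```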
{"title":"Generalizable Person Re-Identification by Domain-Invariant Mapping Network","authors":"Jifei Song, Yongxin Yang, Yi-Zhe Song, T. Xiang, Timothy M. Hospedales","doi":"10.1109/CVPR.2019.00081","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00081","url":null,"abstract":"We aim to learn a domain generalizable person re-identification (ReID) model. When such a model is trained on a set of source domains (ReID datasets collected from different camera networks), it can be directly applied to any new unseen dataset for effective ReID without any model updating. Despite its practical value in real-world deployments, generalizable ReID has seldom been studied. In this work, a novel deep ReID model termed Domain-Invariant Mapping Network (DIMN) is proposed. DIMN is designed to learn a mapping between a person image and its identity classifier, i.e., it produces a classifier using a single shot. To make the model domain-invariant, we follow a meta-learning pipeline and sample a subset of source domain training tasks during each training episode. However, the model is significantly different from conventional meta-learning methods in that: (1) no model updating is required for the target domain, (2) different training tasks share a memory bank for maintaining both scalability and discrimination ability, and (3) it can be used to match an arbitrary number of identities in a target domain. Extensive experiments on a newly proposed large-scale ReID domain generalization benchmark show that our DIMN significantly outperforms alternative domain generalization or meta-learning methods.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"18 1","pages":"719-728"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81829094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 179