
Latest Publications: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00054
Yongming Rao, Jiwen Lu, Jie Zhou
We present a generic, flexible and 3D rotation-invariant framework based on spherical symmetry for point cloud recognition. By introducing a regular icosahedral lattice and its fractals to approximate and discretize the sphere, convolution can be easily implemented to process 3D points. Based on the fractal structure, a hierarchical feature learning framework together with an adaptive sphere projection module is proposed to learn deep features in an end-to-end manner. Our framework not only inherits the strong representation power and generalization capability of convolutional neural networks for image recognition, but also extends CNNs to learn robust features resistant to rotations and perturbations. The proposed model is effective yet robust. A comprehensive experimental study demonstrates that our approach achieves competitive performance compared to state-of-the-art techniques on both 3D object classification and part segmentation tasks, while outperforming other rotation-invariant models on rotated 3D object classification and retrieval tasks by a large margin.
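The abstract only sketches how the icosahedral discretization and the sphere projection work, so here is a minimal, self-contained illustration in Python/NumPy. It is not the authors' released code: the midpoint ("fractal") subdivision of an icosahedron and the "max radius per vertex" projected signal are simplifying assumptions, and `icosahedral_lattice` / `project_to_lattice` are hypothetical helper names.

```python
# Sketch: build a fractal icosahedral lattice on the unit sphere and project a
# point cloud onto it.  Illustrative only; not the paper's implementation.
import numpy as np
from scipy.spatial import ConvexHull


def icosahedral_lattice(levels=2):
    """Unit-sphere vertices/faces of an icosahedron after `levels` midpoint subdivisions."""
    phi = (1.0 + np.sqrt(5.0)) / 2.0
    base = np.array([[-1, phi, 0], [1, phi, 0], [-1, -phi, 0], [1, -phi, 0],
                     [0, -1, phi], [0, 1, phi], [0, -1, -phi], [0, 1, -phi],
                     [phi, 0, -1], [phi, 0, 1], [-phi, 0, -1], [-phi, 0, 1]],
                    dtype=float)
    base /= np.linalg.norm(base, axis=1, keepdims=True)
    faces = ConvexHull(base).simplices.tolist()        # the 20 triangular faces
    verts = [tuple(v) for v in base]
    index = {v: i for i, v in enumerate(verts)}

    def midpoint(i, j):
        m = np.asarray(verts[i]) + np.asarray(verts[j])
        key = tuple(m / np.linalg.norm(m))             # re-project onto the sphere
        if key not in index:
            index[key] = len(verts)
            verts.append(key)
        return index[key]

    for _ in range(levels):                            # fractal refinement
        refined = []
        for a, b, c in faces:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            refined += [[a, ab, ca], [b, bc, ab], [c, ca, bc], [ab, bc, ca]]
        faces = refined
    return np.asarray(verts), np.asarray(faces)


def project_to_lattice(points, lattice):
    """Keep, per lattice vertex, the largest radius among points in its angular cell."""
    centered = points - points.mean(axis=0)
    radius = np.linalg.norm(centered, axis=1) + 1e-9
    dirs = centered / radius[:, None]
    nearest = np.argmax(dirs @ lattice.T, axis=1)      # nearest vertex by angle
    signal = np.zeros(len(lattice))
    np.maximum.at(signal, nearest, radius)
    return signal


if __name__ == "__main__":
    verts, faces = icosahedral_lattice(levels=2)       # 162 vertices, 320 faces
    cloud = np.random.default_rng(0).normal(size=(1024, 3))
    print(verts.shape, faces.shape, project_to_lattice(cloud, verts).shape)
```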
{"title":"Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition","authors":"Yongming Rao, Jiwen Lu, Jie Zhou","doi":"10.1109/CVPR.2019.00054","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00054","url":null,"abstract":"We present a generic, flexible and 3D rotation invariant framework based on spherical symmetry for point cloud recognition. By introducing regular icosahedral lattice and its fractals to approximate and discretize sphere, convolution can be easily implemented to process 3D points. Based on the fractal structure, a hierarchical feature learning framework together with an adaptive sphere projection module is proposed to learn deep feature in an end-to-end manner. Our framework not only inherits the strong representation power and generalization capability from convolutional neural networks for image recognition, but also extends CNN to learn robust feature resistant to rotations and perturbations. The proposed model is effective yet robust. Comprehensive experimental study demonstrates that our approach can achieve competitive performance compared to state-of-the-art techniques on both 3D object classification and part segmentation tasks, meanwhile, outperform other rotation invariant models on rotated 3D object classification and retrieval tasks by a large margin.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"65 1","pages":"452-460"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84029254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 122
Context-Aware Spatio-Recurrent Curvilinear Structure Segmentation
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01293
Feigege Wang, Yue Gu, Wenxi Liu, Yuanlong Yu, Shengfeng He, Jianxiong Pan
Curvilinear structures are frequently observed in various images in different forms, such as blood vessels or neuronal boundaries in biomedical images. In this paper, we propose a novel curvilinear structure segmentation approach using context-aware spatio-recurrent networks. Instead of directly segmenting the whole image or densely segmenting fixed-size local patches, our method recurrently samples patches of varied scales from the target image with a learned policy and processes them locally; this is similar to the changing retinal fixations of the human visual system and is beneficial for capturing the multi-scale or hierarchical modality of complex curvilinear structures. Specifically, the policy for choosing local patches is learned from the contextual information of the image and the historical sampling experience. In this way, as more patches are sampled and refined, the segmentation of the whole image is progressively improved. To validate our approach, comparison experiments on different types of image data are conducted and the sampling procedures for exemplar images are illustrated. We demonstrate that our method achieves state-of-the-art performance on public datasets.
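As a rough illustration of the recurrent patch-sampling loop described above, the sketch below replaces the learned, context-aware policy with a simple uncertainty heuristic and the local segmentation network with a threshold. Only the overall control flow (choose a patch location and scale, segment it locally, merge it back, repeat) reflects the abstract; all names and constants are assumptions.

```python
# Toy recurrent patch-sampling loop for curvilinear structure segmentation.
import numpy as np


def segment_patch(patch):
    """Placeholder local segmenter: intensity threshold (the paper uses a CNN)."""
    return (patch > patch.mean()).astype(float)


def choose_patch(uncertainty, scales, rng):
    """Stand-in policy: sample a centre where uncertainty is high, random scale."""
    flat = uncertainty.ravel() + 1e-6
    idx = rng.choice(flat.size, p=flat / flat.sum())
    cy, cx = np.unravel_index(idx, uncertainty.shape)
    return cy, cx, rng.choice(scales)


def recurrent_segmentation(image, steps=50, scales=(32, 64, 96), seed=0):
    rng = np.random.default_rng(seed)
    h, w = image.shape
    prediction = np.full((h, w), 0.5)                 # start maximally uncertain
    for _ in range(steps):
        uncertainty = 1.0 - np.abs(prediction - 0.5) * 2.0
        cy, cx, s = choose_patch(uncertainty, scales, rng)
        y0, x0 = max(0, cy - s // 2), max(0, cx - s // 2)
        y1, x1 = min(h, y0 + s), min(w, x0 + s)
        local = segment_patch(image[y0:y1, x0:x1])
        # blend the local refinement into the running global prediction
        prediction[y0:y1, x0:x1] = 0.5 * prediction[y0:y1, x0:x1] + 0.5 * local
    return prediction


if __name__ == "__main__":
    img = np.random.rand(256, 256)
    seg = recurrent_segmentation(img)
    print(seg.shape, float(seg.min()), float(seg.max()))
```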
{"title":"Context-Aware Spatio-Recurrent Curvilinear Structure Segmentation","authors":"Feigege Wang, Yue Gu, Wenxi Liu, Yuanlong Yu, Shengfeng He, Jianxiong Pan","doi":"10.1109/CVPR.2019.01293","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01293","url":null,"abstract":"Curvilinear structures are frequently observed in various images in different forms, such as blood vessels or neuronal boundaries in biomedical images. In this paper, we propose a novel curvilinear structure segmentation approach using context-aware spatio-recurrent networks. Instead of directly segmenting the whole image or densely segmenting fixed-sized local patches, our method recurrently samples patches with varied scales from the target image with learned policy and processes them locally, which is similar to the behavior of changing retinal fixations in the human visual system and it is beneficial for capturing the multi-scale or hierarchical modality of the complex curvilinear structures. In specific, the policy of choosing local patches is attentively learned based on the contextual information of the image and the historical sampling experience. In this way, with more patches sampled and refined, the segmentation of the whole image can be progressively improved. To validate our approach, comparison experiments on different types of image data are conducted and the sampling procedures for exemplar images are illustrated. We demonstrate that our method achieves the state-of-the-art performance in public datasets.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"62 1","pages":"12640-12649"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84292117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Acoustic Non-Line-Of-Sight Imaging
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00694
David B. Lindell, Gordon Wetzstein, V. Koltun
Non-line-of-sight (NLOS) imaging enables unprecedented capabilities in a wide range of applications, including robotic and machine vision, remote sensing, autonomous vehicle navigation, and medical imaging. Recent approaches to solving this challenging problem employ optical time-of-flight imaging systems with highly sensitive time-resolved photodetectors and ultra-fast pulsed lasers. However, despite recent successes in NLOS imaging using these systems, widespread implementation and adoption of the technology remains a challenge because of the requirement for specialized, expensive hardware. We introduce acoustic NLOS imaging, which is orders of magnitude less expensive than most optical systems and captures hidden 3D geometry at longer ranges with shorter acquisition times compared to state-of-the-art optical methods. Inspired by hardware setups used in radar and algorithmic approaches to model and invert wave-based image formation models developed in the seismic imaging community, we demonstrate a new approach to seeing around corners.
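The abstract stays at the system level and does not spell out a reconstruction algorithm, so the following is only a generic 2D delay-and-sum back-projection, the simplest member of the wave-based inversion family it alludes to. The linear sensor array, sampling rate, and single hidden scatterer are all made up; this illustrates the principle, not the authors' method.

```python
# Generic delay-and-sum back-projection of simulated acoustic echoes (2D toy).
import numpy as np

C = 343.0            # speed of sound in air, m/s
FS = 48_000          # sampling rate, Hz
N_SENSORS = 32
DURATION = 0.02      # seconds of recording per sensor

sensors = np.stack([np.linspace(-0.5, 0.5, N_SENSORS),   # a linear array along
                    np.zeros(N_SENSORS)], axis=1)         # the visible wall
hidden = np.array([0.3, 1.2])                             # hidden scatterer (x, y)

# Simulate echoes: an impulse arrives after the round trip sensor -> scatterer -> sensor.
t = np.arange(int(FS * DURATION)) / FS
echoes = np.zeros((N_SENSORS, t.size))
for i, s in enumerate(sensors):
    delay = 2.0 * np.linalg.norm(hidden - s) / C
    echoes[i, int(round(delay * FS))] = 1.0

# Back-projection: each candidate pixel accumulates the recorded amplitude at its
# own predicted round-trip delay; the true scatterer position adds up coherently.
xs = np.linspace(-0.6, 0.6, 121)
ys = np.linspace(0.2, 1.6, 141)
image = np.zeros((ys.size, xs.size))
for i, s in enumerate(sensors):
    dist = np.sqrt((xs[None, :] - s[0]) ** 2 + (ys[:, None] - s[1]) ** 2)
    idx = np.clip(np.round(2.0 * dist / C * FS).astype(int), 0, t.size - 1)
    image += echoes[i, idx]

peak = np.unravel_index(np.argmax(image), image.shape)
print("recovered scatterer near:", xs[peak[1]], ys[peak[0]])
```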
{"title":"Acoustic Non-Line-Of-Sight Imaging","authors":"David B. Lindell, Gordon Wetzstein, V. Koltun","doi":"10.1109/CVPR.2019.00694","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00694","url":null,"abstract":"Non-line-of-sight (NLOS) imaging enables unprecedented capabilities in a wide range of applications, including robotic and machine vision, remote sensing, autonomous vehicle navigation, and medical imaging. Recent approaches to solving this challenging problem employ optical time-of-flight imaging systems with highly sensitive time-resolved photodetectors and ultra-fast pulsed lasers. However, despite recent successes in NLOS imaging using these systems, widespread implementation and adoption of the technology remains a challenge because of the requirement for specialized, expensive hardware. We introduce acoustic NLOS imaging, which is orders of magnitude less expensive than most optical systems and captures hidden 3D geometry at longer ranges with shorter acquisition times compared to state-of-the-art optical methods. Inspired by hardware setups used in radar and algorithmic approaches to model and invert wave-based image formation models developed in the seismic imaging community, we demonstrate a new approach to seeing around corners.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"6 1","pages":"6773-6782"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88338514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 74
Rules of the Road: Predicting Driving Behavior With a Convolutional Model of Semantic Interactions
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00865
Joey Hong, Benjamin Sapp, James Philbin
We focus on the problem of predicting future states of entities in complex, real-world driving scenarios. Previous research has approached this problem via low-level signals to predict short time horizons, and has not addressed how to leverage key assets relied upon heavily by industry self-driving systems: (1) large 3D perception efforts which provide highly accurate 3D states of agents with rich attributes, and (2) detailed and accurate semantic maps of the environment (lanes, traffic lights, crosswalks, etc.). We present a unified representation which encodes such high-level semantic information in a spatial grid, allowing the use of deep convolutional models to fuse complex scene context. This enables learning entity-entity and entity-environment interactions with simple, feed-forward computations in each timestep within an overall temporal model of an agent's behavior. We propose different ways of modelling the future as a distribution over future states using standard supervised learning. We introduce a novel dataset providing industry-grade rich perception and semantic inputs, and empirically show we can effectively learn fundamentals of driving behavior.
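A toy sketch of the "semantic information rasterised into a spatial grid" representation: lane and crosswalk polylines plus agent boxes are written into separate channels of an H x W x C tensor that a convolutional model could then fuse. The grid size, resolution, channel layout, and geometry below are illustrative assumptions, not the paper's configuration.

```python
# Rasterise heterogeneous map and perception inputs into one multi-channel grid.
import numpy as np

GRID = 200          # 200 x 200 cells
RES = 0.5           # metres per cell, ego vehicle at the grid centre
N_CHANNELS = 3      # 0: drivable lanes, 1: crosswalks, 2: agent occupancy


def world_to_cell(xy):
    """Map ego-frame metric coordinates to grid indices."""
    col = int(xy[0] / RES + GRID / 2)
    row = int(xy[1] / RES + GRID / 2)
    return row, col


def rasterise(lanes, crosswalks, agents):
    grid = np.zeros((GRID, GRID, N_CHANNELS), dtype=np.float32)
    for poly, channel in [(lanes, 0), (crosswalks, 1)]:
        for xy in poly:                       # polylines given as metre coordinates
            r, c = world_to_cell(xy)
            if 0 <= r < GRID and 0 <= c < GRID:
                grid[r, c, channel] = 1.0
    for centre, (length, width) in agents:    # axis-aligned boxes for brevity
        r, c = world_to_cell(centre)
        dr, dc = int(width / RES / 2), int(length / RES / 2)
        grid[max(r - dr, 0):r + dr + 1, max(c - dc, 0):c + dc + 1, 2] = 1.0
    return grid


if __name__ == "__main__":
    lane = [(x, 0.0) for x in np.arange(-50, 50, RES)]        # a straight lane
    crosswalk = [(10.0, y) for y in np.arange(-3, 3, RES)]
    agents = [((5.0, 0.0), (4.5, 2.0)), ((-12.0, 0.5), (4.5, 2.0))]
    tensor = rasterise(lane, crosswalk, agents)
    print(tensor.shape, tensor.sum(axis=(0, 1)))   # per-channel occupancy
```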
{"title":"Rules of the Road: Predicting Driving Behavior With a Convolutional Model of Semantic Interactions","authors":"Joey Hong, Benjamin Sapp, James Philbin","doi":"10.1109/CVPR.2019.00865","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00865","url":null,"abstract":"We focus on the problem of predicting future states of entities in complex, real-world driving scenarios. Previous research has approached this problem via low-level signals to predict short time horizons, and has not addressed how to leverage key assets relied upon heavily by industry self-driving systems: (1) large 3D perception efforts which provide highly accurate 3D states of agents with rich attributes, and (2) detailed and accurate semantic maps of the environment (lanes, traffic lights, crosswalks, etc). We present a unified representation which encodes such high-level semantic information in a spatial grid, allowing the use of deep convolutional models to fuse complex scene context. This enables learning entity-entity and entity-environment interactions with simple, feed-forward computations in each timestep within an overall temporal model of an agent's behavior. We propose different ways of modelling the future as a {em distribution} over future states using standard supervised learning. We introduce a novel dataset providing industry-grade rich perception and semantic inputs, and empirically show we can effectively learn fundamentals of driving behavior.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"8446-8454"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86856795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 195
Video Magnification in the Wild Using Fractional Anisotropy in Temporal Distribution
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00171
Shoichiro Takeda, Yasunori Akagi, Kazuki Okami, M. Isogai, H. Kimata
Video magnification methods can magnify and reveal subtle changes invisible to the naked eye. However, among such subtle changes, meaningful ones caused by physical and natural phenomena are mixed with non-meaningful ones caused by photographic noise. Therefore, current methods often produce noisy and misleading magnification outputs due to the non-meaningful subtle changes. Several methods have been proposed for detecting only meaningful subtle changes, but they require human intervention, additional resources, or restrictions on the input video scenes. In this paper, we present a novel method using fractional anisotropy (FA) to detect only meaningful subtle changes without the aforementioned requirements. FA has been used in neuroscience to evaluate anisotropic diffusion of water molecules in the body. Based on our observation that the temporal distribution of meaningful subtle changes indicates anisotropic diffusion more clearly than that of non-meaningful ones, we use FA to design a fractional anisotropic filter that passes only meaningful subtle changes. Using the filter enables our method to obtain better and more impressive magnification results than those obtained with state-of-the-art methods.
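The fractional-anisotropy formula itself is standard (it comes from diffusion-tensor imaging), but the exact tensor the paper builds from the temporal distribution of changes is not given in the abstract. The sketch below uses, per pixel, the time-averaged second-moment tensor of the 3D spatio-temporal gradient as one plausible stand-in, and thresholds the resulting FA map to form a mask that passes only structured (anisotropic) changes while rejecting near-isotropic sensor noise.

```python
# FA-based mask over a video: structured temporal changes score high, noise low.
import numpy as np


def fractional_anisotropy(eigvals):
    """FA = sqrt(3/2) * ||lambda - mean(lambda)|| / ||lambda|| for 3 eigenvalues."""
    lam = np.clip(eigvals, 0.0, None)
    mean = lam.mean(axis=-1, keepdims=True)
    num = np.sqrt(np.sum((lam - mean) ** 2, axis=-1))
    den = np.sqrt(np.sum(lam ** 2, axis=-1)) + 1e-12
    return np.sqrt(1.5) * num / den


def fa_mask(video, threshold=0.6):
    """video: (T, H, W) grayscale float array -> boolean mask (H, W)."""
    gt, gy, gx = np.gradient(video.astype(np.float64))       # spatio-temporal gradients
    g = np.stack([gx, gy, gt], axis=-1)                      # (T, H, W, 3)
    # per-pixel 3x3 second-moment tensor averaged over time
    tensor = np.einsum('thwi,thwj->hwij', g, g) / video.shape[0]
    eigvals = np.linalg.eigvalsh(tensor)                      # (H, W, 3)
    return fractional_anisotropy(eigvals) > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.normal(0, 0.01, size=(60, 64, 64))            # pure sensor noise
    video[:, 30:34, :] += np.sin(np.arange(60) * 0.3)[:, None, None] * 0.05  # subtle motion band
    mask = fa_mask(video)
    print("fraction of pixels kept:", float(mask.mean()))
```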
{"title":"Video Magnification in the Wild Using Fractional Anisotropy in Temporal Distribution","authors":"Shoichiro Takeda, Yasunori Akagi, Kazuki Okami, M. Isogai, H. Kimata","doi":"10.1109/CVPR.2019.00171","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00171","url":null,"abstract":"Video magnification methods can magnify and reveal subtle changes invisible to the naked eye. However, in such subtle changes, meaningful ones caused by physical and natural phenomena are mixed with non-meaningful ones caused by photographic noise. Therefore, current methods often produce noisy and misleading magnification outputs due to the non-meaningful subtle changes. For detecting only meaningful subtle changes, several methods have been proposed but require human manipulations, additional resources, or input video scene limitations. In this paper, we present a novel method using fractional anisotropy (FA) to detect only meaningful subtle changes without the aforementioned requirements. FA has been used in neuroscience to evaluate anisotropic diffusion of water molecules in the body. On the basis of our observation that temporal distribution of meaningful subtle changes more clearly indicates anisotropic diffusion than that of non-meaningful ones, we used FA to design a fractional anisotropic filter that passes only meaningful subtle changes. Using the filter enables our method to obtain better and more impressive magnification results than those obtained with state-of-the-art methods.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"22 1","pages":"1614-1622"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85775420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00083
S. Bai, Peng Tang, Philip H. S. Torr, Longin Jan Latecki
This work studies the unsupervised re-ranking procedure for object retrieval and person re-identification, with a specific focus on an ensemble of multiple metrics (or similarities). While the re-ranking step is carried out by running a diffusion process on the underlying data manifolds, the fusion step can leverage the complementarity of multiple metrics. We give a comprehensive summary of existing fusion with diffusion strategies, and systematically analyze their pros and cons. Based on the analysis, we propose a unified yet robust algorithm which inherits their advantages and discards their disadvantages. Hence, we call it Unified Ensemble Diffusion (UED). More interestingly, we show that the inherited properties stem from a theoretical framework in which the relevant works can be elegantly summarized as special cases of UED by imposing additional constraints on the objective function and varying the solver of similarity propagation. Extensive experiments with 3D shape retrieval, image retrieval and person re-identification demonstrate that the proposed framework outperforms the state of the art, and at the same time suggest that re-ranking via metric fusion is a promising tool to further improve the retrieval performance of existing algorithms.
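For orientation, here is the generic "fusion with diffusion" re-ranking that the paper surveys and generalises, not the proposed UED objective itself: several similarity matrices are fused, restricted to a k-NN graph, and similarities are propagated over the normalised graph before ranking. The fusion rule, k, alpha, and iteration count below are assumptions.

```python
# Generic fusion-with-diffusion re-ranking over multiple similarity metrics.
import numpy as np


def knn_graph(sim, k=10):
    """Keep only each row's k strongest edges (a locally constrained graph)."""
    out = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :k]
    rows = np.arange(sim.shape[0])[:, None]
    out[rows, idx] = sim[rows, idx]
    return out


def diffuse(similarities, alpha=0.85, k=10, iters=30):
    """similarities: list of (n, n) matrices from different metrics."""
    fused = np.clip(np.mean([s / s.max() for s in similarities], axis=0), 0.0, None)
    w = knn_graph(fused, k)
    w = (w + w.T) / 2.0                                        # symmetrise
    d = w.sum(axis=1, keepdims=True) + 1e-12
    s = w / np.sqrt(d) / np.sqrt(d.T)                          # D^{-1/2} W D^{-1/2}
    f = fused.copy()
    for _ in range(iters):                                     # f <- alpha*S*f + (1-alpha)*f0
        f = alpha * s @ f + (1.0 - alpha) * fused
    return f                                                   # diffused similarities


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 64))
    norms = np.linalg.norm(x, axis=1)
    sim_cos = (x @ x.T) / (norms[:, None] * norms[None, :])
    sim_rbf = np.exp(-np.sum((x[:, None] - x[None, :]) ** 2, axis=-1) / 64.0)
    reranked = diffuse([sim_cos, sim_rbf])
    print(reranked.shape)     # rank neighbours of query i by reranked[i]
```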
{"title":"Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification","authors":"S. Bai, Peng Tang, Philip H. S. Torr, Longin Jan Latecki","doi":"10.1109/CVPR.2019.00083","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00083","url":null,"abstract":"This work studies the unsupervised re-ranking procedure for object retrieval and person re-identification with a specific concentration on an ensemble of multiple metrics (or similarities). While the re-ranking step is involved by running a diffusion process on the underlying data manifolds, the fusion step can leverage the complementarity of multiple metrics. We give a comprehensive summary of existing fusion with diffusion strategies, and systematically analyze their pros and cons. Based on the analysis, we propose a unified yet robust algorithm which inherits their advantages and discards their disadvantages. Hence, we call it Unified Ensemble Diffusion (UED). More interestingly, we derive that the inherited properties indeed stem from a theoretical framework, where the relevant works can be elegantly summarized as special cases of UED by imposing additional constraints on the objective function and varying the solver of similarity propagation. Extensive experiments with 3D shape retrieval, image retrieval and person re-identification demonstrate that the proposed framework outperforms the state of the arts, and at the same time suggest that re-ranking via metric fusion is a promising tool to further improve the retrieval performance of existing algorithms.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"73 1","pages":"740-749"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86352676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 80
DrivingStereo: A Large-Scale Dataset for Stereo Matching in Autonomous Driving Scenarios
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00099
Guorun Yang, Xiao Song, Chaoqin Huang, Zhidong Deng, Jianping Shi, Bolei Zhou
Great progress has been made on estimating disparity maps from stereo images. However, with the limited stereo data available in existing datasets and the unstable ranging precision of current stereo methods, industry-level stereo matching in autonomous driving remains challenging. In this paper, we construct a novel large-scale stereo dataset named DrivingStereo. It contains over 180k images covering a diverse set of driving scenarios, which is hundreds of times larger than the KITTI Stereo dataset. High-quality disparity labels are produced by a model-guided filtering strategy from multi-frame LiDAR points. For better evaluation, we present two new metrics for stereo matching in driving scenes, i.e. a distance-aware metric and a semantic-aware metric. Extensive experiments show that, compared with models trained on FlyingThings3D or Cityscapes, the models trained on our DrivingStereo achieve higher generalization accuracy in real-world driving scenes, while the proposed metrics better evaluate stereo methods across all distance ranges and across different classes. Our dataset and code are available at https://drivingstereo-dataset.github.io.
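The abstract names a distance-aware and a semantic-aware metric without defining them, so the snippet below is only one plausible reading: report disparity error separately per ground-truth depth bucket and per semantic class rather than as a single global average. The bucket edges, calibration constant, and class ids are invented for the example and do not come from the paper.

```python
# One possible reading of distance-aware / semantic-aware disparity metrics.
import numpy as np

FOCAL_X_BASELINE = 2000.0 * 0.54       # fx * baseline, a made-up calibration


def depth_from_disparity(disp):
    return FOCAL_X_BASELINE / np.clip(disp, 1e-3, None)


def distance_aware_epe(pred, gt, edges=(0, 20, 40, 60, 80, np.inf)):
    """Mean absolute disparity error inside each ground-truth depth bucket."""
    valid = gt > 0
    depth = depth_from_disparity(gt)
    err = np.abs(pred - gt)
    return {f"{lo}-{hi}m": float(err[valid & (depth >= lo) & (depth < hi)].mean())
            for lo, hi in zip(edges[:-1], edges[1:])}


def semantic_aware_epe(pred, gt, labels, class_names):
    """Mean absolute disparity error per semantic class."""
    valid = gt > 0
    err = np.abs(pred - gt)
    return {name: float(err[valid & (labels == cid)].mean())
            for cid, name in class_names.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(5, 100, size=(256, 512))          # fake ground-truth disparities
    pred = gt + rng.normal(0, 1.0, size=gt.shape)
    labels = rng.integers(0, 2, size=gt.shape)
    print(distance_aware_epe(pred, gt))
    print(semantic_aware_epe(pred, gt, labels, {0: "road", 1: "vehicle"}))
```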
{"title":"DrivingStereo: A Large-Scale Dataset for Stereo Matching in Autonomous Driving Scenarios","authors":"Guorun Yang, Xiao Song, Chaoqin Huang, Zhidong Deng, Jianping Shi, Bolei Zhou","doi":"10.1109/CVPR.2019.00099","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00099","url":null,"abstract":"Great progress has been made on estimating disparity maps from stereo images. However, with the limited stereo data available in the existing datasets and unstable ranging precision of current stereo methods, industry-level stereo matching in autonomous driving remains challenging. In this paper, we construct a novel large-scale stereo dataset named DrivingStereo. It contains over 180k images covering a diverse set of driving scenarios, which is hundreds of times larger than the KITTI Stereo dataset. High-quality labels of disparity are produced by a model-guided filtering strategy from multi-frame LiDAR points. For better evaluations, we present two new metrics for stereo matching in the driving scenes, i.e. a distance-aware metric and a semantic-aware metric. Extensive experiments show that compared with the models trained on FlyingThings3D or Cityscapes, the models trained on our DrivingStereo achieve higher generalization accuracy in real-world driving scenes, while the proposed metrics better evaluate the stereo methods on all-range distances and across different classes. Our dataset and code are available at https://drivingstereo-dataset.github.io.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"34 1","pages":"899-908"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85554546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 112
Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00309
Jian Liang, R. He, Zhenan Sun, T. Tan
Conventional domain adaptation methods usually resort to deep neural networks or subspace learning to find invariant representations across domains. However, most deep learning methods rely heavily on large source domains and are computationally expensive to train, while subspace learning methods have a quadratic time complexity that scales poorly with domain size. This paper provides a simple and efficient solution, which can be regarded as a well-performing baseline for domain adaptation tasks. Our method is built upon the nearest centroid classifier, seeking a subspace where the centroids in the target domain are moderately shifted from those in the source domain. Specifically, we design a unified objective without accessing the source domain data and adopt an alternating minimization scheme to iteratively discover the pseudo target labels, invariant subspace, and target centroids. Besides its privacy-preserving property (distant supervision), the algorithm is provably convergent and has a promising linear time complexity. In addition, the proposed method can be readily extended to multi-source settings and domain generalization, and it remarkably enhances popular deep adaptation methods by borrowing the learned transferable features. Extensive experiments on several benchmarks including object, digit, and face recognition datasets validate that our method yields state-of-the-art results in various domain adaptation tasks.
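A stripped-down sketch of the alternation the abstract outlines: assign pseudo labels to target samples by the nearest centroid, then shift the centroids toward the target data, inside a shared subspace. Plain PCA stands in here for the invariant subspace the paper actually optimises, and the blending weight and iteration count are assumptions, so this illustrates the scheme rather than reproducing the paper's algorithm.

```python
# Alternating pseudo-labelling and centroid shifting in a shared subspace (sketch).
import numpy as np


def pca_basis(x, dim):
    x = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:dim].T                                    # (features, dim)


def centroid_shift(xs, ys, xt, n_classes, dim=32, iters=10, blend=0.5):
    """xs, ys: labelled source features/labels; xt: unlabelled target features."""
    basis = pca_basis(np.vstack([xs, xt]), dim)          # shared subspace (PCA stand-in)
    zs, zt = xs @ basis, xt @ basis
    centroids = np.stack([zs[ys == c].mean(axis=0) for c in range(n_classes)])
    pseudo = np.zeros(len(zt), dtype=int)
    for _ in range(iters):
        # step 1: pseudo-label target samples by the nearest centroid
        dists = np.linalg.norm(zt[:, None, :] - centroids[None, :, :], axis=-1)
        pseudo = dists.argmin(axis=1)
        # step 2: shift each centroid toward the mean of its pseudo-labelled targets
        for c in range(n_classes):
            if np.any(pseudo == c):
                centroids[c] = (1 - blend) * centroids[c] + blend * zt[pseudo == c].mean(axis=0)
    return pseudo, centroids


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.normal(size=(300, 64))
    ys = rng.integers(0, 3, size=300)
    xt = rng.normal(size=(200, 64)) + 0.5                # a shifted target domain
    pseudo, _ = centroid_shift(xs, ys, xt, n_classes=3)
    print(np.bincount(pseudo))
```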
{"title":"Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation","authors":"Jian Liang, R. He, Zhenan Sun, T. Tan","doi":"10.1109/CVPR.2019.00309","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00309","url":null,"abstract":"Conventional domain adaptation methods usually resort to deep neural networks or subspace learning to find invariant representations across domains. However, most deep learning methods highly rely on large-size source domains and are computationally expensive to train, while subspace learning methods always have a quadratic time complexity that suffers from the large domain size. This paper provides a simple and efficient solution, which could be regarded as a well-performing baseline for domain adaptation tasks. Our method is built upon the nearest centroid classifier, seeking a subspace where the centroids in the target domain are moderately shifted from those in the source domain. Specifically, we design a unified objective without accessing the source domain data and adopt an alternating minimization scheme to iteratively discover the pseudo target labels, invariant subspace, and target centroids. Besides its privacy-preserving property (distant supervision), the algorithm is provably convergent and has a promising linear time complexity. In addition, the proposed method can be readily extended to multi-source setting and domain generalization, and it remarkably enhances popular deep adaptation methods by borrowing the learned transferable features. Extensive experiments on several benchmarks including object, digit, and face recognition datasets validate that our methods yield state-of-the-art results in various domain adaptation tasks.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"24 1","pages":"2970-2979"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73073990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 76
Co-Saliency Detection via Mask-Guided Fully Convolutional Networks With Multi-Scale Label Smoothing
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00321
Kaihua Zhang, Tengpeng Li, Bo Liu, Qingshan Liu
In the image co-saliency detection problem, one critical issue is how to model the concurrent pattern of the co-salient parts, which appears both within each image and across all the relevant images. In this paper, we propose a hierarchical image co-saliency detection framework as a coarse-to-fine strategy to capture this pattern. We first propose a mask-guided fully convolutional network structure to generate the initial co-saliency detection result. The mask is used for background removal and is learned from the high-level feature response maps of the pre-trained VGG-net output. We then propose a multi-scale label smoothing model to further refine the detection result. The proposed model jointly optimizes the label smoothness of pixels and superpixels. Experimental results on three popular image co-saliency detection benchmark datasets, including iCoseg, MSRC and Cosal2015, demonstrate remarkable performance compared with state-of-the-art methods.
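The label smoothing step can be pictured with the standard quadratic smoothing model y* = argmin_y ||y - y0||^2 + lam * y^T L y, whose closed-form solution is y* = (I + lam*L)^{-1} y0. The sketch below applies it on a single 4-connected pixel grid with a plain graph Laplacian; this simplifies the paper's joint pixel/superpixel, multi-scale formulation and is only an illustration of the smoothing model.

```python
# Closed-form quadratic label smoothing of a coarse co-saliency map on a grid graph.
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def grid_laplacian(h, w):
    """Combinatorial Laplacian of a 4-connected h x w grid."""
    idx = np.arange(h * w).reshape(h, w)
    rows, cols = [], []
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        rows += [a.ravel(), b.ravel()]
        cols += [b.ravel(), a.ravel()]
    rows, cols = np.concatenate(rows), np.concatenate(cols)
    adj = sparse.coo_matrix((np.ones_like(rows, dtype=float), (rows, cols)),
                            shape=(h * w, h * w)).tocsr()
    deg = sparse.diags(np.asarray(adj.sum(axis=1)).ravel())
    return deg - adj


def smooth_saliency(initial, lam=5.0):
    """initial: (h, w) coarse co-saliency map in [0, 1] -> smoothed map."""
    h, w = initial.shape
    system = sparse.identity(h * w, format="csr") + lam * grid_laplacian(h, w)
    return spsolve(system, initial.ravel()).reshape(h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coarse = np.clip(rng.normal(0.2, 0.1, size=(60, 80)), 0, 1)
    coarse[20:40, 30:55] = 0.9                          # a noisy detection blob
    refined = smooth_saliency(coarse + rng.normal(0, 0.1, size=coarse.shape))
    print(refined.shape, float(refined.min()), float(refined.max()))
```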
{"title":"Co-Saliency Detection via Mask-Guided Fully Convolutional Networks With Multi-Scale Label Smoothing","authors":"Kaihua Zhang, Tengpeng Li, Bo Liu, Qingshan Liu","doi":"10.1109/CVPR.2019.00321","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00321","url":null,"abstract":"In image co-saliency detection problem, one critical issue is how to model the concurrent pattern of the co-salient parts, which appears both within each image and across all the relevant images. In this paper, we propose a hierarchical image co-saliency detection framework as a coarse to fine strategy to capture this pattern. We first propose a mask-guided fully convolutional network structure to generate the initial co-saliency detection result. The mask is used for background removal and it is learned from the high-level feature response maps of the pre-trained VGG-net output. We next propose a multi-scale label smoothing model to further refine the detection result. The proposed model jointly optimizes the label smoothness of pixels and superpixels. Experiment results on three popular image co-saliency detection benchmark datasets including iCoseg, MSRC and Cosal2015 demonstrate the remarkable performance compared with the state-of-the-art methods.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"100 1","pages":"3090-3099"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77713283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 70
Robustness Verification of Classification Deep Neural Networks via Linear Programming
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01168
Wang Lin, Zhengfeng Yang, Xin Chen, Qingye Zhao, Xiangkun Li, Zhiming Liu, Jifeng He
There is a pressing need to verify the robustness of classification deep neural networks (CDNNs), as they are embedded in many safety-critical applications. Existing robustness verification approaches rely on computing an over-approximation of the output set, and can hardly scale up to practical CDNNs as a result of the error accumulation that accompanies approximation. In this paper, we develop a novel method for robustness verification of CDNNs with sigmoid activation functions. It converts the robustness verification problem into an equivalent problem of inspecting the most suspected point in the input region, which constitutes a nonlinear optimization problem. To make this problem tractable, the nonlinear constraints are relaxed into linear inclusions, refining it into a linear programming problem. We conduct comparison experiments on several CDNNs trained for image classification on state-of-the-art benchmarks, showing advantages in precision and scalability that enable effective verification of practical CDNNs.
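To make the LP formulation concrete, the sketch below verifies a purely linear scoring function f(x) = Wx + b over an L-infinity ball using SciPy's linprog: for every rival class it minimises the margin and reports a counterexample if the minimum is non-positive. The sigmoid relaxation into linear inclusions, which is the paper's actual contribution, is omitted here, so this is only the LP skeleton of the "most suspected point" idea, not the paper's full procedure.

```python
# LP-based robustness check for a linear classifier over an L-infinity ball (sketch).
import numpy as np
from scipy.optimize import linprog


def verify_linear(W, b, x0, eps, true_class):
    """Return (robust?, rival class, counterexample) for ||x - x0||_inf <= eps."""
    n = x0.size
    bounds = [(x0[i] - eps, x0[i] + eps) for i in range(n)]
    for rival in range(W.shape[0]):
        if rival == true_class:
            continue
        # minimise margin(x) = (W[true] - W[rival]) @ x + (b[true] - b[rival])
        c = W[true_class] - W[rival]
        res = linprog(c, bounds=bounds, method="highs")
        worst_margin = res.fun + (b[true_class] - b[rival])
        if worst_margin <= 0:            # a misclassifying point exists in the ball
            return False, rival, res.x
    return True, None, None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 10))                   # 3 classes, 10 input features
    b = rng.normal(size=3)
    x0 = rng.normal(size=10)
    y0 = int(np.argmax(W @ x0 + b))
    for eps in (0.01, 0.5):
        ok, rival, _ = verify_linear(W, b, x0, eps, y0)
        status = "verified robust" if ok else "counterexample toward class %d" % rival
        print("eps=%.2f: %s" % (eps, status))
```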
{"title":"Robustness Verification of Classification Deep Neural Networks via Linear Programming","authors":"Wang Lin, Zhengfeng Yang, Xin Chen, Qingye Zhao, Xiangkun Li, Zhiming Liu, Jifeng He","doi":"10.1109/CVPR.2019.01168","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01168","url":null,"abstract":"There is a pressing need to verify robustness of classification deep neural networks (CDNNs) as they are embedded in many safety-critical applications. Existing robustness verification approaches rely on computing the over-approximation of the output set, and can hardly scale up to practical CDNNs, as the result of error accumulation accompanied with approximation. In this paper, we develop a novel method for robustness verification of CDNNs with sigmoid activation functions. It converts the robustness verification problem into an equivalent problem of inspecting the most suspected point in the input region which constitutes a nonlinear optimization problem. To make it amenable, by relaxing the nonlinear constraints into the linear inclusions, it is further refined as a linear programming problem. We conduct comparison experiments on a few CDNNs trained for classifying images in some state-of-the-art benchmarks, showing our advantages of precision and scalability that enable effective verification of practical CDNNs.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"79 1","pages":"11410-11419"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82209015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34