
Latest publications: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Erasing Integrated Learning: A Simple Yet Effective Approach for Weakly Supervised Object Localization
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00879
Jinjie Mai, Meng Yang, Wenfeng Luo
Weakly supervised object localization (WSOL) aims to localize objects with only weak supervision such as image-level labels. However, a long-standing problem for existing techniques based on classification networks is that they tend to highlight only the most discriminative parts rather than the entire extent of the object. Conversely, attempting to cover the integral extent of the object can degrade image-classification performance. To remedy this, we propose a simple yet powerful approach that introduces a novel adversarial erasing technique, erasing integrated learning (EIL). By integrating discriminative region mining and adversarial erasing into a single forward-backward propagation in a vanilla CNN, the proposed EIL explores the high-response class-specific area and the less discriminative region simultaneously, and thus maintains high classification performance while jointly discovering the full extent of the object. Furthermore, we apply multiple EIL (MEIL) modules at different levels of the network in a sequential manner, which for the first time integrates semantic features of multiple levels and multiple scales through adversarial erasing learning. The proposed EIL and the extended MEIL both achieve new state-of-the-art performance on the CUB-200-2011 and ILSVRC 2016 benchmarks, improving localization significantly while maintaining high image-classification accuracy.
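As a rough illustration of the erasing idea described above, the sketch below drops the highest-response locations of a shared feature map and classifies both the intact and the erased features within one forward-backward pass. The `backbone`/`classifier` interfaces, the channel-mean response map, and the threshold are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def eil_step(backbone, classifier, images, labels, erase_thresh=0.7):
    """Minimal sketch of one EIL-style training step (hypothetical API):
    the high-response region found on the intact branch is erased from the
    shared feature map, and both branches are classified in the same
    forward-backward pass."""
    feats = backbone(images)                      # (B, C, H, W) shared features
    logits = classifier(feats)                    # intact branch prediction

    # Class-activation-style response map: channel-wise mean as a cheap proxy.
    response = feats.mean(dim=1, keepdim=True)    # (B, 1, H, W)
    response = (response - response.amin(dim=(2, 3), keepdim=True)) / (
        response.amax(dim=(2, 3), keepdim=True)
        - response.amin(dim=(2, 3), keepdim=True) + 1e-6)

    # Adversarial erasing: zero out the most discriminative locations.
    keep_mask = (response < erase_thresh).float()
    logits_erased = classifier(feats * keep_mask) # erased branch prediction

    loss = F.cross_entropy(logits, labels) + F.cross_entropy(logits_erased, labels)
    return loss
```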
Citations: 88
Convolution in the Cloud: Learning Deformable Kernels in 3D Graph Convolution Networks for Point Cloud Analysis
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00187
Zhi-Hao Lin, S. Huang, Y. Wang
Point clouds are among the most popular geometry representations for 3D vision applications. However, without regular structure like that of 2D images, processing and summarizing information over these unordered data points is very challenging. Although a number of previous works analyze point clouds and achieve promising performance, their performance degrades significantly when data variations such as shift and scale changes are present. In this paper, we propose 3D Graph Convolution Networks (3D-GCN), designed to extract local 3D features from point clouds across scales while introducing shift- and scale-invariance properties. The novelty of our 3D-GCN lies in the definition of learnable kernels with a graph max-pooling mechanism. We show that 3D-GCN can be applied to 3D classification and segmentation tasks, with ablation studies and visualizations verifying the design of 3D-GCN.
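A minimal sketch of a learnable graph kernel with max-pooling over each point's nearest neighbors is given below, assuming relative unit directions as the shift- and scale-tolerant geometry cue; the layer shape and kernel parameterization are assumptions for illustration rather than the paper's exact 3D-GCN operator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphMaxConv(nn.Module):
    """Rough sketch of a deformable graph kernel with max-pooling over a
    point's k nearest neighbors (illustrative, not the paper's exact layer)."""
    def __init__(self, in_ch, out_ch, n_support=4, k=16):
        super().__init__()
        self.k = k
        # Learnable kernel: support directions plus per-support feature weights.
        self.support_dirs = nn.Parameter(torch.randn(n_support, 3))
        self.weights = nn.Parameter(torch.randn(n_support, in_ch, out_ch) * 0.01)

    def forward(self, xyz, feats):
        # xyz: (B, N, 3) point coordinates, feats: (B, N, C_in) point features.
        dists = torch.cdist(xyz, xyz)                                  # (B, N, N)
        idx = dists.topk(self.k + 1, largest=False).indices[..., 1:]  # drop self
        nbr_xyz = torch.gather(xyz.unsqueeze(1).expand(-1, xyz.shape[1], -1, -1),
                               2, idx.unsqueeze(-1).expand(-1, -1, -1, 3))
        nbr_feat = torch.gather(feats.unsqueeze(1).expand(-1, feats.shape[1], -1, -1),
                                2, idx.unsqueeze(-1).expand(-1, -1, -1, feats.shape[-1]))
        # Relative unit directions give shift/scale tolerance.
        rel = F.normalize(nbr_xyz - xyz.unsqueeze(2), dim=-1)          # (B, N, k, 3)
        sim = torch.einsum('bnkd,sd->bnks', rel, F.normalize(self.support_dirs, dim=-1))
        resp = torch.einsum('bnkc,scd->bnksd', nbr_feat, self.weights) # (B, N, k, S, out)
        resp = resp * sim.unsqueeze(-1)
        # Graph max-pooling: keep the best-matching neighbor per kernel support.
        return resp.max(dim=2).values.sum(dim=2)                       # (B, N, out)
```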
Citations: 128
Self-Robust 3D Point Recognition via Gather-Vector Guidance
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.01153
Xiaoyi Dong, Dongdong Chen, Hang Zhou, G. Hua, Weiming Zhang, Nenghai Yu
In this paper, we look into the problem of 3D adversarial attack and propose to leverage the internal properties of point clouds and adversarial examples to design a new self-robust deep neural network (DNN) based 3D recognition system. On one hand, point clouds are highly structured: for each local part of a clean point cloud, it is possible to learn what it is ("part of a bottle") and its relative position ("upper part of a bottle") with respect to the global object center. On the other hand, under the visual-quality constraint, 3D adversarial samples often produce only small local perturbations, so they roughly preserve the original global center but may cause incorrect estimation of local relative positions. Motivated by these two properties, we use relative position (dubbed the "gather-vector") as the adversarial indicator and propose a new robust gather module. Equipped with this module, we further propose a new self-robust 3D point recognition network. Through extensive experiments, we demonstrate that the proposed method significantly improves robustness against targeted attacks under the white-box setting. For the I-FGSM-based attack, our method reduces the attack success rate from 94.37% to 75.69%; for the C&W-based attack, it reduces the attack success rate by more than 40.00%. Moreover, our method is complementary to other types of defense and can be combined with them for better results.
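The toy sketch below illustrates the gather-vector intuition only: each local region votes for the global object center, and the spread of those votes serves as an adversarial indicator. Shapes, offsets, and thresholds are hypothetical.

```python
import numpy as np

def gather_vector_consistency(points, predicted_offsets):
    """Sketch of the gather-vector intuition: each local region predicts an
    offset from itself to the global object center.  On clean data these
    per-point center estimates agree; a large spread is treated as a sign
    of adversarial perturbation.  Shapes and values here are illustrative."""
    centers = points + predicted_offsets          # (N, 3) per-point center votes
    spread = np.linalg.norm(centers - centers.mean(axis=0), axis=1).mean()
    return spread

# Usage: flag a sample whose center votes disagree more than a chosen threshold.
points = np.random.rand(1024, 3)
clean_offsets = points.mean(axis=0) - points + 0.01 * np.random.randn(1024, 3)
print("clean spread:", gather_vector_consistency(points, clean_offsets))
```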
Citations: 46
Temporal-Context Enhanced Detection of Heavily Occluded Pedestrians
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.01344
Jialian Wu, Chunluan Zhou, Ming Yang, Qian Zhang, Yuan Li, Junsong Yuan
State-of-the-art pedestrian detectors perform promisingly on non-occluded pedestrians, yet they are still challenged by heavy occlusions. Although many previous works have attempted to alleviate the pedestrian-occlusion issue, most of them rely on still images. In this paper, we exploit the local temporal context of pedestrians in videos and propose a tube feature aggregation network (TFAN) that strengthens pedestrian detectors against severe occlusions. Specifically, for an occluded pedestrian in the current frame, we iteratively search for its relevant counterparts along the temporal axis to form a tube. Features from the tube are then aggregated with adaptive weights to enhance the feature representation of the occluded pedestrian. Furthermore, we devise a temporally discriminative embedding module (TDEM) and a part-based relation module (PRM), which adapt our approach to better handle tube drifting and heavy occlusions. Extensive experiments on three datasets, Caltech, NightOwls and KAIST, show that the proposed method is highly effective for heavily occluded pedestrian detection. Moreover, we achieve state-of-the-art performance on the Caltech and NightOwls datasets.
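A minimal sketch of the aggregation step is shown below: features linked along a tube are combined with adaptive, similarity-based weights. The weighting scheme is an assumption for illustration, not the exact TFAN module.

```python
import torch
import torch.nn.functional as F

def aggregate_tube_features(current_feat, past_feats):
    """Sketch of tube feature aggregation: features linked across frames are
    combined with adaptive weights derived from their similarity to the
    current detection (illustrative weighting, not the paper's exact TFAN)."""
    # current_feat: (D,) feature of the current-frame box,
    # past_feats: (T, D) features of the boxes linked along the tube.
    tube = torch.cat([current_feat.unsqueeze(0), past_feats], dim=0)   # (T+1, D)
    weights = F.softmax(tube @ current_feat, dim=0)                    # similarity-based
    return (weights.unsqueeze(1) * tube).sum(dim=0)                    # aggregated (D,)
```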
Citations: 49
SharinGAN: Combining Synthetic and Real Data for Unsupervised Geometry Estimation
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.01399
K. Pnvr, Hao Zhou, D. Jacobs
We propose a novel method for combining synthetic and real images when training networks to determine geometric information from a single image. We suggest a method for mapping both image types into a single, shared domain. This is connected to a primary network for end-to-end training. Ideally, this results in images from two domains that present shared information to the primary network. Our experiments demonstrate significant improvements over the state-of-the-art in two important domains, surface normal estimation of human faces and monocular depth estimation for outdoor scenes, both in an unsupervised setting.
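A hedged sketch of the shared-domain training step follows: one generator maps both image types into a common representation, a discriminator pulls the two mapped distributions together, and the primary network is supervised on the mapped synthetic images. Network definitions and the loss weighting are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def shared_domain_step(generator, discriminator, task_net, x_syn, y_syn, x_real):
    """Sketch of the shared-domain idea: map synthetic and real images into a
    common representation, make the mapped domains indistinguishable, and train
    the primary geometry network on the mapped images (illustrative losses)."""
    g_syn, g_real = generator(x_syn), generator(x_real)

    # Adversarial term: the discriminator should not tell the mapped domains apart.
    d_syn, d_real = discriminator(g_syn), discriminator(g_real)
    adv_loss = F.binary_cross_entropy_with_logits(d_syn, torch.ones_like(d_syn)) + \
               F.binary_cross_entropy_with_logits(d_real, torch.zeros_like(d_real))

    # Primary geometry task supervised only where labels exist (synthetic data).
    task_loss = F.l1_loss(task_net(g_syn), y_syn)
    return task_loss + 0.1 * adv_loss
```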
Citations: 37
Optical Flow in the Dark
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00678
Yinqiang Zheng, Mingfang Zhang, Feng Lu
Many successful optical flow estimation methods have been proposed, but they break down when tested in dark scenes, because low-light scenarios are not considered in their design and current optical flow benchmark datasets lack low-light samples. Even if the dark images are first enhanced by preprocessing, which greatly improves visual perception, the resulting optical flow is still poor or even worse, because information such as motion consistency may be destroyed during enhancement. We propose an end-to-end data-driven method that avoids such error accumulation and learns optical flow directly from low-light noisy images. Specifically, we develop a method to synthesize large-scale low-light optical flow datasets by simulating the noise model on dark raw images. We also collect a new optical flow dataset in raw format with a large exposure range to be used as a benchmark. Models trained on our synthetic dataset largely maintain optical flow accuracy as image brightness decreases and greatly outperform existing methods on low-light images.
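The sketch below shows one plausible way to synthesize a low-light raw frame by darkening the signal and adding shot and read noise; the specific noise model and parameters are assumptions, not the dataset-generation pipeline used in the paper.

```python
import numpy as np

def simulate_low_light_raw(raw, darken=0.05, gain=8.0, read_std=2.0):
    """Sketch of synthesizing a low-light raw frame: scale the signal down,
    add shot (Poisson) and read (Gaussian) noise, then re-amplify.  The noise
    model and parameter values here are illustrative assumptions."""
    dark = raw.astype(np.float64) * darken
    shot = np.random.poisson(np.clip(dark, 0, None))          # photon shot noise
    read = np.random.normal(0.0, read_std, size=raw.shape)    # sensor read noise
    noisy = (shot + read) * gain
    return np.clip(noisy, 0, 255).astype(raw.dtype)

# Usage on a toy frame; flow labels computed on the bright pair would be reused
# for the darkened pair, which is the core of the data-synthesis idea.
bright = (np.random.rand(64, 64) * 255).astype(np.uint8)
dark_noisy = simulate_low_light_raw(bright)
```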
Citations: 1
Low-Rank Compression of Neural Nets: Learning the Rank of Each Layer
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00807
Yerlan Idelbayev, M. A. Carreira-Perpiñán
Neural net compression can be achieved by approximating each layer's weight matrix by a low-rank matrix. The real difficulty in doing this is not in training the resulting neural net (made up of one low-rank matrix per layer), but in determining what the optimal rank of each layer is—effectively, an architecture search problem with one hyperparameter per layer. We show that, with a suitable formulation, this problem is amenable to a mixed discrete-continuous optimization jointly over the ranks and over the matrix elements, and give a corresponding algorithm. We show that this indeed can select ranks much better than existing approaches, making low-rank compression much more attractive than previously thought. For example, we can make a VGG network faster than a ResNet and with nearly the same classification error.
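To make the search space concrete, the sketch below truncates each layer's weights by SVD and selects per-layer ranks with a greedy singular-value heuristic under a parameter budget. The paper solves a joint discrete-continuous optimization over ranks and matrix elements instead; the greedy rule here is only illustrative.

```python
import numpy as np

def truncated_svd(W, rank):
    """Best rank-r approximation of a weight matrix in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]

def pick_ranks(weights, budget):
    """Toy rank selection: greedily spend a parameter budget on the singular
    directions with the largest values across layers.  This only shows what
    'learning the rank of each layer' optimizes over; it is not the paper's
    joint discrete-continuous algorithm."""
    candidates = []
    for li, W in enumerate(weights):
        s = np.linalg.svd(W, compute_uv=False)
        cost_per_rank = sum(W.shape)                 # params added by one rank-1 term
        candidates += [(sv, li, cost_per_rank) for sv in s]
    candidates.sort(reverse=True)
    ranks, spent = [0] * len(weights), 0
    for sv, li, cost in candidates:
        if spent + cost <= budget:
            ranks[li] += 1
            spent += cost
    return ranks

# Usage: choose ranks for two random "layers" under a 5000-parameter budget.
layers = [np.random.randn(256, 128), np.random.randn(128, 64)]
print(pick_ranks(layers, budget=5000))
```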
Citations: 91
Height and Uprightness Invariance for 3D Prediction From a Single View
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.00057
Manel Baradad, A. Torralba
Current state-of-the-art methods that predict 3D from single images ignore the fact that the height of objects and their upright orientation are invariant to the camera pose and intrinsic parameters. To account for this, we propose a system that directly regresses 3D world coordinates for each pixel. First, our system predicts the camera position with respect to the ground plane and its intrinsic parameters. It then predicts the 3D position of each pixel along the rays spanned by the camera. The predicted 3D coordinates and normals are invariant to changes in the camera position or its model, and we can directly impose a regression loss on these world coordinates. Our approach yields competitive results for depth and camera-pose estimation (while not being explicitly trained to predict either) and improves cross-dataset generalization over existing state-of-the-art methods.
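A small sketch of the geometry behind per-pixel world-coordinate regression follows: back-project a pixel along its camera ray and move it into a ground-aligned frame. In the paper the depth, pose, and intrinsics are network predictions; here they are plain inputs, and the axis convention is an assumption.

```python
import numpy as np

def pixel_to_world(u, v, depth, K, R_wc, cam_height):
    """Sketch of regressing world coordinates: back-project a pixel along its
    camera ray using intrinsics K and a depth value, then move it into a
    ground-aligned world frame (rotation R_wc, camera height above the ground).
    All inputs are illustrative stand-ins for the network's predictions."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])    # ray direction in camera frame
    p_cam = depth * ray / ray[2]                      # point at the given depth
    p_world = R_wc @ p_cam + np.array([0.0, cam_height, 0.0])
    return p_world                                    # y-axis assumed to point up

# Usage with made-up intrinsics and pose.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
print(pixel_to_world(320, 200, depth=4.0, K=K, R_wc=np.eye(3), cam_height=1.6))
```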
Citations: 4
Visual-Semantic Matching by Exploring High-Order Attention and Distraction
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.01280
Yongzhi Li, Duo Zhang, Yadong Mu
Cross-modality semantic matching is a vital task in computer vision and has attracted increasing attention in recent years. Existing methods mainly explore object-based alignment between image objects and text words. In this work, we address the task from two previously ignored aspects: high-order semantic information (e.g., object-predicate-subject triplets, object-attribute pairs) and visual distraction (i.e., despite high relevance to the textual query, images may also contain many prominent distracting objects or visual relations). Specifically, we build scene graphs for both the visual and textual modalities. Our technical contributions are twofold. First, we formulate visual-semantic matching as an attention-driven cross-modality scene-graph matching problem: graph convolutional networks (GCNs) extract high-order information from the two scene graphs, and a novel cross-graph attention mechanism contextually reweighs graph elements and computes the inter-graph similarity. Second, some top-ranked samples are in fact false matches caused by the co-occurrence of highly relevant and distracting information, so we devise an information-theoretic measure to estimate semantic distraction and re-rank the initial retrieval results. Comprehensive experiments and ablation studies on two large public datasets (MS-COCO and Flickr30K) demonstrate the superiority of the proposed method and the effectiveness of both high-order attention and distraction modeling.
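A minimal sketch of cross-graph attention scoring is given below, with arbitrary node features standing in for the GCN outputs over the two scene graphs; the temperature and pooling choices are assumptions, not the paper's exact mechanism.

```python
import torch
import torch.nn.functional as F

def cross_graph_similarity(visual_nodes, text_nodes):
    """Sketch of cross-graph attention matching: every textual node attends
    over the visual nodes, and the image-sentence score averages the attended
    similarities.  Node features would come from GCNs over the two scene
    graphs; here they are arbitrary tensors for illustration."""
    v = F.normalize(visual_nodes, dim=-1)            # (Nv, D)
    t = F.normalize(text_nodes, dim=-1)              # (Nt, D)
    sim = t @ v.T                                    # (Nt, Nv) node-to-node cosine
    attn = F.softmax(sim / 0.1, dim=1)               # each text node reweighs visual nodes
    attended = attn @ v                              # (Nt, D) attended visual context
    return (attended * t).sum(dim=-1).mean()         # scalar matching score

# Usage with random node features.
print(cross_graph_similarity(torch.randn(12, 64), torch.randn(6, 64)))
```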
Citations: 23
End-to-End Camera Calibration for Broadcast Videos
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.01364
Long Sha, Jennifer Hobbs, Panna Felsen, Xinyu Wei, P. Lucey, Sujoy Ganguly
The increasing number of vision-based tracking systems deployed in production has necessitated fast, robust camera calibration. In sport, most current work focuses on sports where lines and intersections are easy to extract and appearance is relatively consistent across venues. For more challenging sports such as basketball, those techniques are not sufficient. In this paper, we propose an end-to-end approach for single moving-camera calibration across challenging scenarios in sports. Our method contains three key modules: 1) area-based court segmentation, 2) camera pose estimation with embedded templates, and 3) homography prediction via a spatial transform network (STN). All three modules are connected, enabling end-to-end training. We evaluate our method on a new college basketball dataset and demonstrate state-of-the-art performance in variable and dynamic environments. We also validate our method on the World Cup 2014 dataset to show its competitive performance against the state-of-the-art. Lastly, we show that our method is two orders of magnitude faster than the previous state of the art on both datasets.
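As a small illustration of the final calibration step, the sketch below applies a 3x3 homography to map court-template coordinates into image coordinates; the modules that would predict this homography (segmentation, pose estimation, STN) are not shown, and the template and matrix values are made up.

```python
import numpy as np

def apply_homography(H, points):
    """Sketch of the final calibration step: a predicted 3x3 homography maps
    court-template coordinates into the broadcast image (or back).  H and the
    template below are illustrative, not a learned calibration."""
    pts = np.hstack([points, np.ones((len(points), 1))])   # homogeneous coords
    proj = (H @ pts.T).T
    return proj[:, :2] / proj[:, 2:3]

# Usage: map the corners of a 28m x 15m basketball-court template into an image
# with a made-up homography.
court_corners = np.array([[0, 0], [28, 0], [28, 15], [0, 15]], dtype=float)
H = np.array([[35.0, 5.0, 100.0], [2.0, 40.0, 50.0], [0.001, 0.002, 1.0]])
print(apply_homography(H, court_corners))
```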
Citations: 35