
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

Dynamic Traffic Modeling From Overhead Imagery
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.01233
Scott Workman, Nathan Jacobs
Our goal is to use overhead imagery to understand patterns in traffic flow, for instance answering questions such as how fast could you traverse Times Square at 3am on a Sunday. A traditional approach for solving this problem would be to model the speed of each road segment as a function of time. However, this strategy is limited in that a significant amount of data must first be collected before a model can be used and it fails to generalize to new areas. Instead, we propose an automatic approach for generating dynamic maps of traffic speeds using convolutional neural networks. Our method operates on overhead imagery, is conditioned on location and time, and outputs a local motion model that captures likely directions of travel and corresponding travel speeds. To train our model, we take advantage of historical traffic data collected from New York City. Experimental results demonstrate that our method can be applied to generate accurate city-scale traffic models.
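The core idea is a location- and time-conditioned convolutional model over overhead imagery. As an illustration only, the sketch below is not the authors' implementation; all module names, feature sizes, and the choice of conditioning inputs (a normalized hour/latitude/longitude vector) are assumptions about one way such conditioning could look, with per-pixel speeds predicted for a small set of discretized travel directions.

```python
# Minimal sketch (not the authors' code): a CNN over an overhead image tile,
# conditioned on time and location, predicting a per-pixel travel speed for a
# small set of discretized travel directions. All module/feature sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

class TrafficSpeedNet(nn.Module):
    def __init__(self, num_directions=8):
        super().__init__()
        # Image encoder: a few strided convolutions stand in for a real backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        # Context encoder for (hour-of-week, latitude, longitude), broadcast over space.
        self.context = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 64))
        # Decoder upsamples back to the input resolution and predicts one
        # non-negative speed per direction per pixel.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_directions, 1), nn.Softplus(),
        )

    def forward(self, image, time_location):
        feat = self.encoder(image)                       # (B, 64, H/4, W/4)
        ctx = self.context(time_location)                # (B, 64)
        ctx = ctx[:, :, None, None].expand_as(feat)      # broadcast over space
        return self.decoder(torch.cat([feat, ctx], 1))   # (B, D, H, W) speeds

model = TrafficSpeedNet()
img = torch.rand(2, 3, 256, 256)                 # overhead image tiles
meta = torch.rand(2, 3)                          # normalized (hour, lat, lon)
speeds = model(img, meta)
print(speeds.shape)                              # torch.Size([2, 8, 256, 256])
```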
Pages: 12312-12321
Citations: 13
Probabilistic Video Prediction From Noisy Data With a Posterior Confidence
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.01084
Yunbo Wang, Jiajun Wu, Mingsheng Long, J. Tenenbaum
We study a new research problem of probabilistic future frames prediction from a sequence of noisy inputs, which is useful because it is difficult to guarantee the quality of input frames in practical spatiotemporal prediction applications. It is also challenging because it involves two levels of uncertainty: the perceptual uncertainty from noisy observations and the dynamics uncertainty in forward modeling. In this paper, we propose to tackle this problem with an end-to-end trainable model named Bayesian Predictive Network (BP-Net). Unlike previous work in stochastic video prediction that assumes spatiotemporal coherence and therefore fails to deal with perceptual uncertainty, BP-Net models both levels of uncertainty in an integrated framework. Furthermore, unlike previous work that can only provide unsorted estimations of future frames, BP-Net leverages a differentiable sequential importance sampling (SIS) approach to make future predictions based on the inference of underlying physical states, thereby providing sorted prediction candidates in accordance with the SIS importance weights, i.e., the confidences. Our experiment results demonstrate that BP-Net remarkably outperforms existing approaches on predicting future frames from noisy data.
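To make the sequential importance sampling (SIS) step concrete, here is a minimal sketch of one way such a step could look. This is not BP-Net; the Gaussian observation model and the stand-in transition/decoder modules are assumptions. Particles are propagated, reweighted by how well their decoded frames explain the noisy observation, and the prediction candidates are returned sorted by the resulting confidence.

```python
# Minimal sketch (not BP-Net itself) of the sequential-importance-sampling idea the
# abstract describes: maintain a set of latent-state particles, reweight them by how
# well their decoded frames explain the noisy observation, and return prediction
# candidates sorted by the resulting confidence. Transition/decoder are stand-ins.
import torch

def sis_step(particles, log_weights, observation, transition, decoder, noise_std=0.1):
    """One SIS step: propagate particles, update importance weights, sort candidates.

    particles:   (K, D) latent states, log_weights: (K,) unnormalized log weights,
    observation: (C, H, W) noisy frame; transition/decoder: callables (assumed).
    """
    particles = transition(particles)                      # p(z_t | z_{t-1})
    preds = decoder(particles)                             # (K, C, H, W) candidate frames
    # Gaussian observation model: log-likelihood of the noisy frame under each candidate.
    sq_err = ((preds - observation.unsqueeze(0)) ** 2).flatten(1).sum(dim=1)
    log_weights = log_weights - sq_err / (2 * noise_std ** 2)
    weights = torch.softmax(log_weights, dim=0)            # normalized confidences
    order = torch.argsort(weights, descending=True)        # sorted candidates
    return particles[order], log_weights[order], preds[order], weights[order]

# Toy usage with random stand-ins for the learned transition and decoder.
K, D, C, H, W = 16, 32, 3, 64, 64
transition = torch.nn.Linear(D, D)
decoder = torch.nn.Sequential(torch.nn.Linear(D, C * H * W),
                              torch.nn.Unflatten(1, (C, H, W)))
particles = torch.randn(K, D)
log_w = torch.zeros(K)
obs = torch.rand(C, H, W)
particles, log_w, candidates, conf = sis_step(particles, log_w, obs, transition, decoder)
print(candidates.shape, conf[:3])   # candidates sorted by posterior confidence
```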
Pages: 10827-10836
Citations: 12
What Does Plate Glass Reveal About Camera Calibration?
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00309
Qian Zheng, Jinnan Chen, Zhangchi Lu, Boxin Shi, Xudong Jiang, Kim-Hui Yap, Ling-yu Duan, A. Kot
This paper aims to calibrate the orientation of glass and the field of view of the camera from a single reflection-contaminated image. We show how a reflective amplitude coefficient map can be used as a calibration cue. Different from existing methods, the proposed solution is free from image contents. To reduce the impact of a noisy calibration cue estimated from a reflection-contaminated image, we propose two strategies: an optimization-based method that imposes only a partial but reliable set of entries on the map and a learning-based method that fully exploits all entries. We collect a dataset containing 320 samples as well as their camera parameters for evaluation. We demonstrate that our method not only facilitates a general single image camera calibration method that leverages image contents but also contributes to improving the performance of single image reflection removal. Furthermore, we show that our byproduct output helps alleviate the ill-posed problem of estimating the panorama from a single image.
Pages: 3019-3029
Citations: 10
A2dele: Adaptive and Attentive Depth Distiller for Efficient RGB-D Salient Object Detection
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.00908
Yongri Piao, Zhengkun Rong, Miao Zhang, Weisong Ren, Huchuan Lu
Existing state-of-the-art RGB-D salient object detection methods explore RGB-D data relying on a two-stream architecture, in which an independent subnetwork is required to process depth data. This inevitably incurs extra computational costs and memory consumption, and using depth data during testing may hinder the practical applications of RGB-D saliency detection. To tackle these two dilemmas, we propose a depth distiller (A2dele) to explore the way of using network prediction and attention as two bridges to transfer the depth knowledge from the depth stream to the RGB stream. First, by adaptively minimizing the differences between predictions generated from the depth stream and RGB stream, we realize the desired control of pixel-wise depth knowledge transferred to the RGB stream. Second, to transfer the localization knowledge to RGB features, we encourage consistencies between the dilated prediction of the depth stream and the attention map from the RGB stream. As a result, we achieve a lightweight architecture without use of depth data at test time by embedding our A2dele. Our extensive experimental evaluation on five benchmarks demonstrates that our RGB stream achieves state-of-the-art performance, reducing the model size by 76% and running 12 times faster than the best performing method. Furthermore, our A2dele can be applied to existing RGB-D networks to significantly improve their efficiency while maintaining performance (nearly doubling the FPS for DMRA and tripling it for CPFP).
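As a rough illustration of the two transfer terms described above, the sketch below is an assumption-laden reading (not the A2dele release): a prediction-level distillation loss whose per-pixel weight is one possible adaptive scheme, and a consistency loss between a dilated depth-stream prediction and the RGB stream's attention map.

```python
# Minimal sketch (assumptions, not the A2dele release) of the two knowledge-transfer
# terms the abstract describes: (1) an adaptively weighted loss between the depth-stream
# and RGB-stream saliency predictions, and (2) a consistency loss between a dilated
# depth-stream prediction and the RGB stream's attention map.
import torch
import torch.nn.functional as F

def prediction_distillation(rgb_pred, depth_pred, rgb_gt=None):
    """Transfer pixel-wise saliency knowledge from the (teacher) depth stream.

    rgb_pred, depth_pred: (B, 1, H, W) logits. If ground truth is given, weight the
    transfer by how reliable the depth prediction is (an assumed adaptive scheme).
    """
    weight = 1.0
    if rgb_gt is not None:
        # Down-weight distillation where the depth teacher disagrees with the label.
        weight = torch.exp(-F.binary_cross_entropy_with_logits(
            depth_pred, rgb_gt, reduction="none"))
    return (weight * (torch.sigmoid(rgb_pred) - torch.sigmoid(depth_pred)) ** 2).mean()

def attention_consistency(depth_pred, rgb_attention, dilation=7):
    """Encourage the RGB attention map to cover a dilated depth-stream prediction."""
    dilated = F.max_pool2d(torch.sigmoid(depth_pred), kernel_size=dilation,
                           stride=1, padding=dilation // 2)   # morphological dilation
    return F.binary_cross_entropy(rgb_attention.clamp(1e-6, 1 - 1e-6), dilated)

B, H, W = 2, 64, 64
rgb_pred = torch.randn(B, 1, H, W)
depth_pred = torch.randn(B, 1, H, W)
rgb_att = torch.rand(B, 1, H, W)
gt = torch.randint(0, 2, (B, 1, H, W)).float()
loss = prediction_distillation(rgb_pred, depth_pred, gt) + \
       attention_consistency(depth_pred, rgb_att)
print(loss.item())
```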
Pages: 9057-9066
Citations: 155
Reflection Scene Separation From a Single Image
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00247
Renjie Wan, Boxin Shi, Haoliang Li, Ling-yu Duan, A. Kot
For images taken through glass, existing methods focus on the restoration of the background scene by regarding the reflection components as noise. However, the scene reflected by the glass surface also contains important information to be recovered, especially for surveillance or criminal investigations. In this paper, instead of removing reflection components from the mixture image, we aim at recovering reflection scenes from the mixture image. We first propose a strategy to obtain such ground truth and the corresponding input images. Then, we propose a two-stage framework to obtain the visible reflection scene from the mixture image. Specifically, we train the network with a shift-invariant loss, which is robust to misalignment between the input and output images. The experimental results show that our proposed method achieves promising results.
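One plausible form of such a shift-invariant loss (an assumption; the paper's exact formulation may differ) is to take the best L1 match over a small set of integer spatial offsets, so slight misalignment between prediction and ground truth is not penalized:

```python
# Minimal sketch (an assumed form, not the paper's exact loss) of a shift-invariant loss:
# compare the predicted reflection scene against the ground truth under a small set of
# spatial offsets and keep the best match, so slight misalignment is not penalized.
import torch

def shift_invariant_l1(pred, target, max_shift=2):
    """L1 loss minimized over integer shifts in [-max_shift, max_shift]^2.

    pred, target: (B, C, H, W). Borders introduced by shifting are cropped out.
    """
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = torch.roll(target, shifts=(dy, dx), dims=(2, 3))
            diff = (pred - shifted).abs()
            # Crop a border so wrapped-around pixels from torch.roll are ignored.
            m = max_shift
            loss = diff[:, :, m:-m, m:-m].mean()
            best = loss if best is None else torch.minimum(best, loss)
    return best

pred = torch.rand(1, 3, 64, 64)
target = torch.roll(pred, shifts=(1, -1), dims=(2, 3))  # a slightly shifted copy
print(shift_invariant_l1(pred, target).item())          # near zero despite the shift
```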
Pages: 2395-2403
Citations: 17
Learning Video Stabilization Using Optical Flow
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.00818
Ji-yang Yu, R. Ramamoorthi
We propose a novel neural network that infers the per-pixel warp fields for video stabilization from the optical flow fields of the input video. While previous learning based video stabilization methods attempt to implicitly learn frame motions from color videos, our method resorts to optical flow for motion analysis and directly learns the stabilization using the optical flow. We also propose a pipeline that uses optical flow principal components for motion inpainting and warp field smoothing, making our method robust to moving objects, occlusion and optical flow inaccuracy, which is challenging for other video stabilization methods. Our method achieves quantitatively and visually better results than the state-of-the-art optimization based and deep learning based video stabilization methods. Our method also gives a ~3x speed improvement compared to the optimization based methods.
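To illustrate how optical-flow principal components can be used for smoothing, the sketch below (an assumption, not the paper's pipeline) projects each frame's flow field onto a low-dimensional PCA basis fitted over the sequence, which suppresses flow that is inconsistent with the dominant camera motion, such as moving objects or estimation errors:

```python
# Minimal sketch (an assumption, not the paper's pipeline) of using principal components
# of the optical-flow fields: project each frame's flow onto a low-dimensional basis
# fitted over the sequence, which suppresses flow inconsistent with the dominant motion.
import torch

def lowrank_flow(flows, num_components=8):
    """flows: (T, 2, H, W) per-frame optical flow. Returns the low-rank reconstruction."""
    T = flows.shape[0]
    X = flows.reshape(T, -1)                      # one flattened flow field per row
    mean = X.mean(dim=0, keepdim=True)
    U, S, Vh = torch.linalg.svd(X - mean, full_matrices=False)
    k = min(num_components, S.numel())
    X_hat = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k] + mean   # keep top-k components
    return X_hat.reshape_as(flows)

# Toy usage: a smooth global motion plus per-frame noise; the projection keeps mostly the former.
T, H, W = 30, 32, 32
t = torch.linspace(0, 1, T).reshape(T, 1, 1, 1)
global_motion = torch.cat([torch.sin(3 * t).expand(T, 1, H, W),
                           torch.cos(3 * t).expand(T, 1, H, W)], dim=1)
noisy = global_motion + 0.3 * torch.randn(T, 2, H, W)
smoothed = lowrank_flow(noisy, num_components=2)
print((smoothed - global_motion).abs().mean().item(),
      (noisy - global_motion).abs().mean().item())
```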
Pages: 8156-8164
Citations: 46
Attention-Aware Multi-View Stereo
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00166
Keyang Luo, T. Guan, L. Ju, Yuesong Wang, Zhu Chen, Yawei Luo
Multi-view stereo is a crucial task in computer vision that requires accurate and robust photo-consistency among input images for depth estimation. Recent studies have shown that learning-based feature matching and confidence regularization can play a vital role in this task. Nevertheless, how to design good matching confidence volumes as well as effective regularizers for them is still under in-depth study. In this paper, we propose an attention-aware deep neural network “AttMVS” for learning multi-view stereo. In particular, we propose a novel attention-enhanced matching confidence volume that combines the raw pixel-wise matching confidence from the extracted perceptual features with the contextual information of local scenes, to improve the matching robustness. Furthermore, we develop an attention-guided regularization module, which consists of multilevel ray fusion modules, to hierarchically aggregate and regularize the matching confidence volume into a latent depth probability volume. Experimental results show that our approach achieves the best overall performance on the DTU dataset and the intermediate sequences of the Tanks & Temples benchmark over many state-of-the-art MVS algorithms.
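As a purely illustrative sketch of the attention-enhanced matching confidence idea (not the AttMVS architecture; the attention branch, channel sizes, and softmax regularization below are assumptions), one can modulate a raw per-depth matching confidence with a context-derived attention map:

```python
# Minimal sketch (an illustrative assumption, not the AttMVS architecture) of an
# attention-enhanced matching confidence volume: a raw per-pixel, per-depth matching
# confidence is modulated by an attention map derived from contextual features of the
# reference view, so locally ambiguous matches can be suppressed or boosted by context.
import torch
import torch.nn as nn

class AttentionEnhancedConfidence(nn.Module):
    def __init__(self, context_channels=32):
        super().__init__()
        # A small context branch over reference-image features (channel sizes are assumed).
        self.attention = nn.Sequential(
            nn.Conv2d(context_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, raw_confidence, context_features):
        # raw_confidence: (B, D, H, W) matching confidence over D depth hypotheses;
        # context_features: (B, C, H, W) features from the reference image.
        att = self.attention(context_features)           # (B, 1, H, W) spatial attention
        enhanced = raw_confidence * att                   # broadcast over the depth axis
        # Regularized volume -> per-pixel depth probabilities via softmax over depth.
        return torch.softmax(enhanced, dim=1)

B, D, C, H, W = 1, 48, 32, 64, 80
raw = torch.randn(B, D, H, W)          # e.g., correlation scores from plane sweeping
ctx = torch.randn(B, C, H, W)
prob = AttentionEnhancedConfidence(C)(raw, ctx)
print(prob.shape, prob.sum(dim=1).mean())   # (1, 48, 64, 80), sums to 1 over depths
```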
Pages: 1587-1596
Citations: 63
Boundary-Aware 3D Building Reconstruction From a Single Overhead Image
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00052
Jisan Mahmud, True Price, Akash Bapat, Jan-Michael Frahm
We propose a boundary-aware multi-task deep-learning-based framework for fast 3D building modeling from a single overhead image. Unlike most existing techniques which rely on multiple images for 3D scene modeling, we seek to model the buildings in the scene from a single overhead image by jointly learning a modified signed distance function (SDF) from the building boundaries, a dense heightmap of the scene, and scene semantics. To jointly train for these tasks, we leverage pixel-wise semantic segmentation and normalized digital surface maps (nDSM) as supervision, in addition to labeled building outlines. At test time, buildings in the scene are automatically modeled in 3D using only an input overhead image. We demonstrate an increase in building modeling performance using a multi-feature network architecture that improves building outline detection by considering network features learned for the other jointly learned tasks. We also introduce a novel mechanism for robustly refining instance-specific building outlines using the learned modified SDF. We verify the effectiveness of our method on multiple large-scale satellite and aerial imagery datasets, where we obtain state-of-the-art performance in the 3D building reconstruction task.
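The joint supervision can be pictured as three summed per-pixel losses over a shared prediction. The sketch below uses assumed loss choices (L1 for the modified SDF and heightmap, cross-entropy for semantics) and is not the authors' training code:

```python
# Minimal sketch (assumed loss shapes, not the authors' training code) of the joint
# supervision the abstract describes: per-pixel predictions of (a) a modified signed
# distance to the nearest building boundary, (b) a height value (nDSM), and
# (c) a semantic class, trained with three summed losses.
import torch
import torch.nn.functional as F

def multitask_loss(sdf_pred, height_pred, sem_logits,
                   sdf_gt, height_gt, sem_gt,
                   w_sdf=1.0, w_height=1.0, w_sem=1.0):
    """sdf_pred/height_pred: (B, 1, H, W); sem_logits: (B, K, H, W); sem_gt: (B, H, W) ints."""
    loss_sdf = F.l1_loss(sdf_pred, sdf_gt)                 # boundary-aware SDF regression
    loss_height = F.l1_loss(height_pred, height_gt)        # dense heightmap regression
    loss_sem = F.cross_entropy(sem_logits, sem_gt)         # pixel-wise semantics
    return w_sdf * loss_sdf + w_height * loss_height + w_sem * loss_sem

B, K, H, W = 2, 3, 64, 64
loss = multitask_loss(torch.randn(B, 1, H, W), torch.randn(B, 1, H, W),
                      torch.randn(B, K, H, W),
                      torch.randn(B, 1, H, W), torch.rand(B, 1, H, W) * 30,
                      torch.randint(0, K, (B, H, W)))
print(loss.item())
```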
Pages: 438-448
Citations: 36
Interactive Two-Stream Decoder for Accurate and Fast Saliency Detection
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00916
Huajun Zhou, Xiaohua Xie, J. Lai, Zixuan Chen, Lingxiao Yang
Recently, contour information has largely improved the performance of saliency detection. However, the discussion on the correlation between saliency and contour remains scarce. In this paper, we first analyze such correlation and then propose an interactive two-stream decoder to explore multiple cues, including saliency, contour and their correlation. Specifically, our decoder consists of two branches, a saliency branch and a contour branch. Each branch is assigned to learn distinctive features for predicting the corresponding map. Meanwhile, the intermediate connections are forced to learn the correlation by interactively transmitting the features from each branch to the other one. In addition, we develop an adaptive contour loss to automatically discriminate hard examples during the learning process. Extensive experiments on six benchmarks demonstrate that our network achieves competitive performance at a fast speed of around 50 FPS. Moreover, our VGG-based model only contains 17.08 million parameters, which is significantly smaller than other VGG-based approaches. Code has been made available at: https://github.com/moothes/ITSD-pytorch.
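One plausible reading of an adaptive contour loss (an assumed formulation, not the released ITSD code) derives a contour target from the saliency mask and re-weights the per-pixel BCE so that currently misclassified (hard) pixels contribute more:

```python
# Minimal sketch (an assumed formulation, not the release at
# https://github.com/moothes/ITSD-pytorch) of an adaptive contour loss: a contour target
# is derived from the edges of the saliency mask, and the per-pixel BCE is re-weighted so
# that pixels the network currently gets wrong (hard examples) contribute more.
import torch
import torch.nn.functional as F

def contour_from_mask(mask, kernel_size=3):
    """Approximate contours as the difference between a dilated and an eroded mask."""
    pad = kernel_size // 2
    dilated = F.max_pool2d(mask, kernel_size, stride=1, padding=pad)
    eroded = -F.max_pool2d(-mask, kernel_size, stride=1, padding=pad)
    return (dilated - eroded).clamp(0, 1)

def adaptive_contour_loss(contour_logits, saliency_mask, gamma=2.0):
    """contour_logits: (B, 1, H, W); saliency_mask: (B, 1, H, W) binary ground truth."""
    target = contour_from_mask(saliency_mask)
    prob = torch.sigmoid(contour_logits)
    # Hard-example weight: large where the prediction disagrees with the contour target.
    weight = (prob - target).abs().detach() ** gamma
    bce = F.binary_cross_entropy_with_logits(contour_logits, target, reduction="none")
    return (weight * bce).mean()

mask = torch.zeros(1, 1, 64, 64)
mask[:, :, 16:48, 16:48] = 1.0                       # a square salient object
logits = torch.randn(1, 1, 64, 64)
print(adaptive_contour_loss(logits, mask).item())
```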
Pages: 9138-9147
Citations: 224
Context Aware Graph Convolution for Skeleton-Based Action Recognition
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.01434
Xikun Zhang, Chang Xu, D. Tao
Graph convolutional models have gained impressive successes on the skeleton based human action recognition task. As graph convolution is a local operation, it cannot fully investigate non-local joints that could be vital to recognizing the action. For example, actions like typing and clapping require the cooperation of two hands, which are distant from each other in a human skeleton graph. Multiple graph convolutional layers thus tend to be stacked together to increase the receptive field, which brings computational inefficiency and optimization difficulty. But there is still no guarantee that distant joints (e.g. two hands) can be well integrated. In this paper, we propose a context aware graph convolutional network (CA-GCN). Besides the computation of localized graph convolution, CA-GCN considers a context term for each vertex by integrating information of all other vertices. Long range dependencies among joints are thus naturally integrated in context information, which eliminates the need to stack multiple layers to enlarge the receptive field and greatly simplifies the network. Moreover, we further propose an advanced CA-GCN, in which asymmetric relevance measurement and higher level representation are utilized to compute context information for more flexibility and better performance. Besides the joint features, our CA-GCN could also be extended to handle graphs with edge (limb) features. Extensive experiments on two real-world datasets demonstrate the importance of context information and the effectiveness of the proposed CA-GCN in skeleton based action recognition.
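A minimal sketch of the context-aware graph convolution idea (illustrative only; the attention-style context term and the toy chain adjacency below are assumptions, not the paper's exact layer): each vertex receives a standard local aggregation over the skeleton adjacency plus a context term computed from all other vertices, so distant joints can interact in a single layer.

```python
# Minimal sketch (an illustrative assumption, not the paper's exact layer) of a
# context-aware graph convolution for skeleton joints: local aggregation over the
# (normalized) skeleton adjacency matrix, plus a per-vertex context term computed by
# attending over all other vertices.
import torch
import torch.nn as nn

class ContextAwareGraphConv(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.local = nn.Linear(in_channels, out_channels)    # local neighborhood term
        self.query = nn.Linear(in_channels, out_channels)
        self.key = nn.Linear(in_channels, out_channels)
        self.value = nn.Linear(in_channels, out_channels)    # global context term

    def forward(self, x, adj_norm):
        # x: (B, V, C) joint features; adj_norm: (V, V) normalized adjacency (assumed given).
        local = adj_norm @ self.local(x)                      # neighborhood aggregation
        q, k, v = self.query(x), self.key(x), self.value(x)
        att = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)  # (B, V, V)
        context = att @ v                                     # every joint sees all joints
        return torch.relu(local + context)

B, V, C = 4, 25, 64                                           # e.g., 25 skeleton joints
x = torch.randn(B, V, C)
adj = torch.eye(V) + torch.diag(torch.ones(V - 1), 1) + torch.diag(torch.ones(V - 1), -1)
adj_norm = adj / adj.sum(dim=1, keepdim=True)                 # a toy chain skeleton
layer = ContextAwareGraphConv(C, 128)
out = layer(x, adj_norm)
print(out.shape)                                              # torch.Size([4, 25, 128])
```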
Pages: 14321-14330
Citations: 103