
Latest publications from the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Dynamic Traffic Modeling From Overhead Imagery
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.01233
Scott Workman, Nathan Jacobs
Our goal is to use overhead imagery to understand patterns in traffic flow, for instance answering questions such as how fast could you traverse Times Square at 3am on a Sunday. A traditional approach for solving this problem would be to model the speed of each road segment as a function of time. However, this strategy is limited in that a significant amount of data must first be collected before a model can be used and it fails to generalize to new areas. Instead, we propose an automatic approach for generating dynamic maps of traffic speeds using convolutional neural networks. Our method operates on overhead imagery, is conditioned on location and time, and outputs a local motion model that captures likely directions of travel and corresponding travel speeds. To train our model, we take advantage of historical traffic data collected from New York City. Experimental results demonstrate that our method can be applied to generate accurate city-scale traffic models.
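The abstract describes a CNN that is conditioned on location and time and outputs a local motion model (travel directions and speeds). The toy sketch below illustrates only that input/output interface; the layer sizes, the time encoding, and the two output heads are assumptions, not the authors' architecture.

```python
# Illustrative sketch only: a toy CNN, conditioned on a time code, that maps
# an overhead image patch to per-pixel travel speed and direction.
import torch
import torch.nn as nn

class ToyTrafficNet(nn.Module):
    def __init__(self, time_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3 + time_dim, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        # Heads: scalar speed and a 2D unit direction vector per pixel.
        self.speed_head = nn.Conv2d(32, 1, 1)
        self.dir_head = nn.Conv2d(32, 2, 1)

    def forward(self, image, time_code):
        # Broadcast the time encoding over the spatial grid and concatenate.
        b, _, h, w = image.shape
        t = time_code[:, :, None, None].expand(b, -1, h, w)
        feat = self.encoder(torch.cat([image, t], dim=1))
        speed = torch.relu(self.speed_head(feat))  # non-negative speeds
        direction = torch.nn.functional.normalize(self.dir_head(feat), dim=1)
        return speed, direction

# Usage: one 64x64 overhead patch with a hypothetical 8-dim time encoding.
net = ToyTrafficNet()
speed, direction = net(torch.rand(1, 3, 64, 64), torch.rand(1, 8))
```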
{"title":"Dynamic Traffic Modeling From Overhead Imagery","authors":"Scott Workman, Nathan Jacobs","doi":"10.1109/CVPR42600.2020.01233","DOIUrl":"https://doi.org/10.1109/CVPR42600.2020.01233","url":null,"abstract":"Our goal is to use overhead imagery to understand patterns in traffic flow, for instance answering questions such as how fast could you traverse Times Square at 3am on a Sunday. A traditional approach for solving this problem would be to model the speed of each road segment as a function of time. However, this strategy is limited in that a significant amount of data must first be collected before a model can be used and it fails to generalize to new areas. Instead, we propose an automatic approach for generating dynamic maps of traffic speeds using convolutional neural networks. Our method operates on overhead imagery, is conditioned on location and time, and outputs a local motion model that captures likely directions of travel and corresponding travel speeds. To train our model, we take advantage of historical traffic data collected from New York City. Experimental results demonstrate that our method can be applied to generate accurate city-scale traffic models.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"36 1","pages":"12312-12321"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91293776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Probabilistic Video Prediction From Noisy Data With a Posterior Confidence
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.01084
Yunbo Wang, Jiajun Wu, Mingsheng Long, J. Tenenbaum
We study a new research problem of probabilistic future frames prediction from a sequence of noisy inputs, which is useful because it is difficult to guarantee the quality of input frames in practical spatiotemporal prediction applications. It is also challenging because it involves two levels of uncertainty: the perceptual uncertainty from noisy observations and the dynamics uncertainty in forward modeling. In this paper, we propose to tackle this problem with an end-to-end trainable model named Bayesian Predictive Network (BP-Net). Unlike previous work in stochastic video prediction that assumes spatiotemporal coherence and therefore fails to deal with perceptual uncertainty, BP-Net models both levels of uncertainty in an integrated framework. Furthermore, unlike previous work that can only provide unsorted estimations of future frames, BP-Net leverages a differentiable sequential importance sampling (SIS) approach to make future predictions based on the inference of underlying physical states, thereby providing sorted prediction candidates in accordance with the SIS importance weights, i.e., the confidences. Our experiment results demonstrate that BP-Net remarkably outperforms existing approaches on predicting future frames from noisy data.
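A minimal numeric sketch of the sequential importance sampling (SIS) idea the abstract relies on for ranking prediction candidates by confidence. The Gaussian observation likelihood and the scalar stand-ins for frames and physical states are assumptions; this is not BP-Net's implementation.

```python
# Toy SIS-style ranking: weight each sampled prediction by how well it
# explains the noisy observation, then sort candidates by that confidence.
import numpy as np

rng = np.random.default_rng(0)
num_particles = 5
states = rng.normal(size=(num_particles, 4))      # hypothetical latent states
predictions = [s.sum() for s in states]           # stand-in for predicted frames
observation = 0.5                                  # stand-in for the noisy input

# Importance weights: likelihood of the observation under each particle.
log_w = -0.5 * (np.array(predictions) - observation) ** 2
weights = np.exp(log_w - log_w.max())
weights /= weights.sum()                           # normalized confidences

# Sorted prediction candidates, most confident first.
order = np.argsort(-weights)
for rank, i in enumerate(order):
    print(f"rank {rank}: prediction {predictions[i]:.3f}, confidence {weights[i]:.3f}")
```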
{"title":"Probabilistic Video Prediction From Noisy Data With a Posterior Confidence","authors":"Yunbo Wang, Jiajun Wu, Mingsheng Long, J. Tenenbaum","doi":"10.1109/CVPR42600.2020.01084","DOIUrl":"https://doi.org/10.1109/CVPR42600.2020.01084","url":null,"abstract":"We study a new research problem of probabilistic future frames prediction from a sequence of noisy inputs, which is useful because it is difficult to guarantee the quality of input frames in practical spatiotemporal prediction applications. It is also challenging because it involves two levels of uncertainty: the perceptual uncertainty from noisy observations and the dynamics uncertainty in forward modeling. In this paper, we propose to tackle this problem with an end-to-end trainable model named Bayesian Predictive Network (BP-Net). Unlike previous work in stochastic video prediction that assumes spatiotemporal coherence and therefore fails to deal with perceptual uncertainty, BP-Net models both levels of uncertainty in an integrated framework. Furthermore, unlike previous work that can only provide unsorted estimations of future frames, BP-Net leverages a differentiable sequential importance sampling (SIS) approach to make future predictions based on the inference of underlying physical states, thereby providing sorted prediction candidates in accordance with the SIS importance weights, i.e., the confidences. Our experiment results demonstrate that BP-Net remarkably outperforms existing approaches on predicting future frames from noisy data.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"22 1","pages":"10827-10836"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87670384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
What Does Plate Glass Reveal About Camera Calibration?
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00309
Qian Zheng, Jinnan Chen, Zhangchi Lu, Boxin Shi, Xudong Jiang, Kim-Hui Yap, Ling-yu Duan, A. Kot
This paper aims to calibrate the orientation of glass and the field of view of the camera from a single reflection-contaminated image. We show how a reflective amplitude coefficient map can be used as a calibration cue. Different from existing methods, the proposed solution is free from image contents. To reduce the impact of a noisy calibration cue estimated from a reflection-contaminated image, we propose two strategies: an optimization-based method that imposes part of though reliable entries on the map and a learning-based method that fully exploits all entries. We collect a dataset containing 320 samples as well as their camera parameters for evaluation. We demonstrate that our method not only facilitates a general single image camera calibration method that leverages image contents but also contributes to improving the performance of single image reflection removal. Furthermore, we show our byproduct output helps alleviate the ill-posed problem of estimating the panorama from a single image.
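The contrast between the two proposed strategies rests on using only the reliable entries of a noisy calibration-cue map versus exploiting all of them. The toy fit below illustrates the first idea in the simplest possible setting: a synthetic planar map with hypothetical per-entry confidences. It is not the paper's solver.

```python
# Toy illustration: fit a plane z = a*x + b*y + c by least squares using only
# the entries of a noisy map whose (assumed) confidence is high.
import numpy as np

rng = np.random.default_rng(1)
h, w = 32, 32
ys, xs = np.mgrid[0:h, 0:w]
true_map = 0.01 * xs + 0.02 * ys + 0.1
noisy_map = true_map + rng.normal(scale=0.05, size=(h, w))
confidence = rng.uniform(size=(h, w))              # hypothetical reliability scores

mask = confidence > 0.8                            # keep only reliable entries
A = np.stack([xs[mask], ys[mask], np.ones(mask.sum())], axis=1)
coeffs, *_ = np.linalg.lstsq(A, noisy_map[mask], rcond=None)
print("recovered plane coefficients:", coeffs)     # close to (0.01, 0.02, 0.1)
```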
{"title":"What Does Plate Glass Reveal About Camera Calibration?","authors":"Qian Zheng, Jinnan Chen, Zhangchi Lu, Boxin Shi, Xudong Jiang, Kim-Hui Yap, Ling-yu Duan, A. Kot","doi":"10.1109/cvpr42600.2020.00309","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00309","url":null,"abstract":"This paper aims to calibrate the orientation of glass and the field of view of the camera from a single reflection-contaminated image. We show how a reflective amplitude coefficient map can be used as a calibration cue. Different from existing methods, the proposed solution is free from image contents. To reduce the impact of a noisy calibration cue estimated from a reflection-contaminated image, we propose two strategies: an optimization-based method that imposes part of though reliable entries on the map and a learning-based method that fully exploits all entries. We collect a dataset containing 320 samples as well as their camera parameters for evaluation. We demonstrate that our method not only facilitates a general single image camera calibration method that leverages image contents but also contributes to improving the performance of single image reflection removal. Furthermore, we show our byproduct output helps alleviate the ill-posed problem of estimating the panorama from a single image.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"254 1","pages":"3019-3029"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87033267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
A2dele: Adaptive and Attentive Depth Distiller for Efficient RGB-D Salient Object Detection
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.00908
Yongri Piao, Zhengkun Rong, Miao Zhang, Weisong Ren, Huchuan Lu
Existing state-of-the-art RGB-D salient object detection methods explore RGB-D data relying on a two-stream architecture, in which an independent subnetwork is required to process depth data. This inevitably incurs extra computational costs and memory consumption, and using depth data during testing may hinder the practical applications of RGB-D saliency detection. To tackle these two dilemmas, we propose a depth distiller (A2dele) to explore the way of using network prediction and attention as two bridges to transfer the depth knowledge from the depth stream to the RGB stream. First, by adaptively minimizing the differences between predictions generated from the depth stream and RGB stream, we realize the desired control of pixel-wise depth knowledge transferred to the RGB stream. Second, to transfer the localization knowledge to RGB features, we encourage consistencies between the dilated prediction of the depth stream and the attention map from the RGB stream. As a result, we achieve a lightweight architecture without use of depth data at test time by embedding our A2dele. Our extensive experimental evaluation on five benchmarks demonstrates that our RGB stream achieves state-of-the-art performance, which tremendously minimizes the model size by 76% and runs 12 times faster, compared with the best performing method. Furthermore, our A2dele can be applied to existing RGB-D networks to significantly improve their efficiency while maintaining performance (boosts FPS by nearly twice for DMRA and 3 times for CPFP).
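A hedged sketch of the general prediction-distillation idea described above: the RGB stream is supervised by a frozen depth-stream teacher so that depth input can be dropped at test time. The particular loss terms (BCE on predictions, MSE on attention against a thresholded teacher map) are assumptions, not A2dele's exact objective.

```python
# Sketch of prediction- and attention-level knowledge transfer from a frozen
# depth stream (teacher) to an RGB stream (student).
import torch
import torch.nn.functional as F

def distillation_loss(rgb_logits, depth_logits, rgb_attention):
    """rgb_logits, depth_logits, rgb_attention: (B, 1, H, W) tensors."""
    teacher = torch.sigmoid(depth_logits).detach()   # depth stream is frozen
    # Prediction-level transfer: match the RGB prediction to the teacher.
    pred_term = F.binary_cross_entropy_with_logits(rgb_logits, teacher)
    # Localization transfer: keep the RGB attention map consistent with the
    # teacher's thresholded prediction (an assumption for this sketch).
    attn_term = F.mse_loss(torch.sigmoid(rgb_attention), (teacher > 0.5).float())
    return pred_term + attn_term

loss = distillation_loss(torch.randn(2, 1, 64, 64),
                         torch.randn(2, 1, 64, 64),
                         torch.randn(2, 1, 64, 64))
```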
{"title":"A2dele: Adaptive and Attentive Depth Distiller for Efficient RGB-D Salient Object Detection","authors":"Yongri Piao, Zhengkun Rong, Miao Zhang, Weisong Ren, Huchuan Lu","doi":"10.1109/CVPR42600.2020.00908","DOIUrl":"https://doi.org/10.1109/CVPR42600.2020.00908","url":null,"abstract":"Existing state-of-the-art RGB-D salient object detection methods explore RGB-D data relying on a two-stream architecture, in which an independent subnetwork is required to process depth data. This inevitably incurs extra computational costs and memory consumption, and using depth data during testing may hinder the practical applications of RGB-D saliency detection. To tackle these two dilemmas, we propose a depth distiller (A2dele) to explore the way of using network prediction and attention as two bridges to transfer the depth knowledge from the depth stream to the RGB stream. First, by adaptively minimizing the differences between predictions generated from the depth stream and RGB stream, we realize the desired control of pixel-wise depth knowledge transferred to the RGB stream. Second, to transfer the localization knowledge to RGB features, we encourage consistencies between the dilated prediction of the depth stream and the attention map from the RGB stream. As a result, we achieve a lightweight architecture without use of depth data at test time by embedding our A2dele. Our extensive experimental evaluation on five benchmarks demonstrate that our RGB stream achieves state-of-the-art performance, which tremendously minimizes the model size by 76% and runs 12 times faster, compared with the best performing method. Furthermore, our A2dele can be applied to existing RGB-D networks to significantly improve their efficiency while maintaining performance (boosts FPS by nearly twice for DMRA and 3 times for CPFP).","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"64 1","pages":"9057-9066"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90428016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 155
Reflection Scene Separation From a Single Image
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00247
Renjie Wan, Boxin Shi, Haoliang Li, Ling-yu Duan, A. Kot
For images taken through glass, existing methods focus on the restoration of the background scene by regarding the reflection components as noise. However, the scene reflected by the glass surface also contains important information to be recovered, especially for surveillance or criminal investigations. In this paper, instead of removing reflection components from the mixture image, we aim at recovering reflection scenes from the mixture image. We first propose a strategy to obtain such ground truth and its corresponding input images. Then, we propose a two-stage framework to obtain the visible reflection scene from the mixture image. Specifically, we train the network with a shift-invariant loss which is robust to misalignment between the input and output images. The experimental results show that our proposed method achieves promising results.
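A minimal sketch of a shift-invariant reconstruction loss in the spirit of the one described above: the target is compared under several small translations and the lowest error is kept, which tolerates slight misalignment. The search window and the L1 metric are assumptions, not the paper's exact loss.

```python
# Shift-invariant L1 loss: take the minimum error over a small window of
# integer translations of the target.
import torch
import torch.nn.functional as F

def shift_invariant_l1(pred, target, max_shift=2):
    """pred, target: (B, C, H, W); returns the best loss over small shifts."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = torch.roll(target, shifts=(dy, dx), dims=(2, 3))
            loss = F.l1_loss(pred, shifted)
            best = loss if best is None else torch.minimum(best, loss)
    return best

loss = shift_invariant_l1(torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32))
```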
{"title":"Reflection Scene Separation From a Single Image","authors":"Renjie Wan, Boxin Shi, Haoliang Li, Ling-yu Duan, A. Kot","doi":"10.1109/cvpr42600.2020.00247","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00247","url":null,"abstract":"For images taken through glass, existing methods focus on the restoration of the background scene by regarding the reflection components as noise. However, the scene reflected by glass surface also contains important information to be recovered, especially for the surveillance or criminal investigations. In this paper, instead of removing reflection components from the mixture image, we aim at recovering reflection scenes from the mixture image. We first propose a strategy to obtain such ground truth and its corresponding input images. Then, we propose a two-stage framework to obtain the visible reflection scene from the mixture image. Specifically, we train the network with a shift-invariant loss which is robust to misalignment between the input and output images. The experimental results show that our proposed method achieves promising results.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"30 1","pages":"2395-2403"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85549926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00689
Abdallah Benzine, Florian Chabot, B. Luvison, Q. Pham, C. Achard
Recently, several deep learning models have been proposed for 3D human pose estimation. Nevertheless, most of these approaches only focus on the single-person case or estimate 3D pose of a few people at high resolution. Furthermore, many applications such as autonomous driving or crowd analysis require pose estimation of a large number of people possibly at low resolution. In this work, we present PandaNet (Pose estimAtioN and Detection Anchor-based Network), a new single-shot, anchor-based and multi-person 3D pose estimation approach. The proposed model performs bounding box detection and, for each detected person, 2D and 3D pose regression into a single forward pass. It does not need any post-processing to regroup joints since the network predicts a full 3D pose for each bounding box and allows the pose estimation of a possibly large number of people at low resolution. To manage people overlapping, we introduce a Pose-Aware Anchor Selection strategy. Moreover, as imbalance exists between different people sizes in the image, and joints coordinates have different uncertainties depending on these sizes, we propose a method to automatically optimize weights associated to different people scales and joints for efficient training. PandaNet surpasses previous single-shot methods on several challenging datasets: a multi-person urban virtual but very realistic dataset (JTA Dataset), and two real world 3D multi-person datasets (CMU Panoptic and MuPoTS-3D).
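One standard mechanism for learning per-joint loss weights automatically is homoscedastic-uncertainty weighting; the sketch below shows only that generic mechanism, in a simplified form, and is not claimed to be PandaNet's exact formulation.

```python
# Simplified uncertainty-based weighting: one learnable log-variance per
# joint balances its regression error against a regularizing penalty.
import torch
import torch.nn as nn

class WeightedJointLoss(nn.Module):
    def __init__(self, num_joints):
        super().__init__()
        # log(sigma^2) per joint, learned jointly with the network.
        self.log_var = nn.Parameter(torch.zeros(num_joints))

    def forward(self, pred, target):
        # pred, target: (B, num_joints, 3); per-joint mean squared error.
        err = ((pred - target) ** 2).sum(dim=-1).mean(dim=0)   # (num_joints,)
        return (torch.exp(-self.log_var) * err + self.log_var).sum()

criterion = WeightedJointLoss(num_joints=14)
loss = criterion(torch.randn(8, 14, 3), torch.randn(8, 14, 3))
```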
{"title":"PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation","authors":"Abdallah Benzine, Florian Chabot, B. Luvison, Q. Pham, C. Achard","doi":"10.1109/cvpr42600.2020.00689","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00689","url":null,"abstract":"Recently, several deep learning models have been proposed for 3D human pose estimation. Nevertheless, most of these approaches only focus on the single-person case or estimate 3D pose of a few people at high resolution. Furthermore, many applications such as autonomous driving or crowd analysis require pose estimation of a large number of people possibly at low-resolution. In this work, we present PandaNet (Pose estimAtioN and Dectection Anchor-based Network), a new single-shot, anchor-based and multi-person 3D pose estimation approach. The proposed model performs bounding box detection and, for each detected person, 2D and 3D pose regression into a single forward pass. It does not need any post-processing to regroup joints since the network predicts a full 3D pose for each bounding box and allows the pose estimation of a possibly large number of people at low resolution. To manage people overlapping, we introduce a Pose-Aware Anchor Selection strategy. Moreover, as imbalance exists between different people sizes in the image, and joints coordinates have different uncertainties depending on these sizes, we propose a method to automatically optimize weights associated to different people scales and joints for efficient training. PandaNet surpasses previous single-shot methods on several challenging datasets: a multi-person urban virtual but very realistic dataset (JTA Dataset), and two real world 3D multi-person datasets (CMU Panoptic and MuPoTS-3D).","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"128 1","pages":"6855-6864"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74130086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 44
Joint Filtering of Intensity Images and Neuromorphic Events for High-Resolution Noise-Robust Imaging
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00168
Zihao W. Wang, Peiqi Duan, O. Cossairt, A. Katsaggelos, Tiejun Huang, Boxin Shi
We present a novel computational imaging system with high resolution and low noise. Our system consists of a traditional video camera which captures high-resolution intensity images, and an event camera which encodes high-speed motion as a stream of asynchronous binary events. To process the hybrid input, we propose a unifying framework that first bridges the two sensing modalities via a noise-robust motion compensation model, and then performs joint image filtering. The filtered output represents the temporal gradient of the captured space-time volume, which can be viewed as motion-compensated event frames with high resolution and low noise. Therefore, the output can be widely applied to many existing event-based algorithms that are highly dependent on spatial resolution and noise robustness. In experimental results performed on both publicly available datasets as well as our contributing RGB-DAVIS dataset, we show systematic performance improvement in applications such as high frame-rate video synthesis, feature/corner detection and tracking, as well as high dynamic range image reconstruction.
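A toy sketch of motion-compensated event accumulation, the basic operation behind bridging the two sensing modalities: each event is warped to a reference time along an assumed constant flow before being summed into a frame. The constant-flow model is an illustration only, not the paper's noise-robust compensation model.

```python
# Accumulate polarity events into a frame after warping them back to a
# reference time along a single constant optical flow.
import numpy as np

def compensated_event_frame(events, flow, t_ref, shape):
    """events: (N, 4) array of (x, y, t, polarity); flow: (vx, vy) in px/s."""
    frame = np.zeros(shape, dtype=np.float32)
    for x, y, t, p in events:
        xr = int(round(x - flow[0] * (t - t_ref)))
        yr = int(round(y - flow[1] * (t - t_ref)))
        if 0 <= xr < shape[1] and 0 <= yr < shape[0]:
            frame[yr, xr] += p
    return frame

events = np.array([[10.0, 12.0, 0.01, 1.0], [11.0, 12.0, 0.02, -1.0]])
frame = compensated_event_frame(events, flow=(100.0, 0.0), t_ref=0.0, shape=(32, 32))
```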
{"title":"Joint Filtering of Intensity Images and Neuromorphic Events for High-Resolution Noise-Robust Imaging","authors":"Zihao W. Wang, Peiqi Duan, O. Cossairt, A. Katsaggelos, Tiejun Huang, Boxin Shi","doi":"10.1109/cvpr42600.2020.00168","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00168","url":null,"abstract":"We present a novel computational imaging system with high resolution and low noise. Our system consists of a traditional video camera which captures high-resolution intensity images, and an event camera which encodes high-speed motion as a stream of asynchronous binary events. To process the hybrid input, we propose a unifying framework that first bridges the two sensing modalities via a noise-robust motion compensation model, and then performs joint image filtering. The filtered output represents the temporal gradient of the captured space-time volume, which can be viewed as motion-compensated event frames with high resolution and low noise. Therefore, the output can be widely applied to many existing event-based algorithms that are highly dependent on spatial resolution and noise robustness. In experimental results performed on both publicly available datasets as well as our contributing RGB-DAVIS dataset, we show systematic performance improvement in applications such as high frame-rate video synthesis, feature/corner detection and tracking, as well as high dynamic range image reconstruction.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"49 1","pages":"1606-1616"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74947919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 67
PFRL: Pose-Free Reinforcement Learning for 6D Pose Estimation
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.01147
Jianzhun Shao, Yuhang Jiang, Gu Wang, Zhigang Li, Xiangyang Ji
6D pose estimation from a single RGB image is a challenging and vital task in computer vision. The current mainstream deep model methods resort to 2D images annotated with real-world ground-truth 6D object poses, whose collection is fairly cumbersome and expensive, even unavailable in many cases. In this work, to get rid of the burden of 6D annotations, we formulate the 6D pose refinement as a Markov Decision Process and impose on the reinforcement learning approach with only 2D image annotations as weakly-supervised 6D pose information, via a delicate reward definition and a composite reinforced optimization method for efficient and effective policy training. Experiments on LINEMOD and T-LESS datasets demonstrate that our Pose-Free approach is able to achieve state-of-the-art performance compared with the methods without using real-world ground-truth 6D pose labels.
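One way a reward can be derived from 2D-only annotations is to compare a mask rendered under the refined pose with the annotated 2D mask; the IoU-improvement reward below illustrates only that general idea (with stand-in masks) and is not the paper's reward definition.

```python
# Reward a refinement action by the improvement in IoU between a mask
# rendered under the current pose and the annotated 2D mask.
import numpy as np

def mask_iou(rendered_mask, annotated_mask):
    """Both masks: boolean (H, W) arrays; higher IoU means a better pose."""
    inter = np.logical_and(rendered_mask, annotated_mask).sum()
    union = np.logical_or(rendered_mask, annotated_mask).sum()
    return inter / union if union > 0 else 0.0

prev = np.zeros((64, 64), dtype=bool); prev[10:30, 10:30] = True   # before action
curr = np.zeros((64, 64), dtype=bool); curr[12:32, 11:31] = True   # after action
gt   = np.zeros((64, 64), dtype=bool); gt[13:33, 12:32] = True     # 2D annotation
reward = mask_iou(curr, gt) - mask_iou(prev, gt)
```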
{"title":"PFRL: Pose-Free Reinforcement Learning for 6D Pose Estimation","authors":"Jianzhun Shao, Yuhang Jiang, Gu Wang, Zhigang Li, Xiangyang Ji","doi":"10.1109/cvpr42600.2020.01147","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.01147","url":null,"abstract":"6D pose estimation from a single RGB image is a challenging and vital task in computer vision. The current mainstream deep model methods resort to 2D images annotated with real-world ground-truth 6D object poses, whose collection is fairly cumbersome and expensive, even unavailable in many cases. In this work, to get rid of the burden of 6D annotations, we formulate the 6D pose refinement as a Markov Decision Process and impose on the reinforcement learning approach with only 2D image annotations as weakly-supervised 6D pose information, via a delicate reward definition and a composite reinforced optimization method for efficient and effective policy training. Experiments on LINEMOD and T-LESS datasets demonstrate that our Pose-Free approach is able to achieve state-of-the-art performance compared with the methods without using real-world ground-truth 6D pose labels.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"236 1","pages":"11451-11460"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72908853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
Separating Particulate Matter From a Single Microscopic Image
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.00464
Tushar Sandhan, J. Choi
Particulate matter (PM) is the blend of various solid and liquid particles suspended in the atmosphere. These submicron particles are imperceptible for usual hand-held camera photography, but become a great obstacle in microscopic imaging. PM removal from a single microscopic image is highly ill-posed and one of the most challenging image denoising problems. In this work, we thoroughly analyze the physical properties of PM, the microscope and their inevitable interaction; and propose an optimization scheme, which removes the PM from a high-resolution microscopic image within a few seconds. Experiments on real world microscopic images show that the proposed method significantly outperforms other competitive image denoising methods. It preserves the comprehensive microscopic foreground details while clearly separating the PM from a single monochromatic or color image.
{"title":"Separating Particulate Matter From a Single Microscopic Image","authors":"Tushar Sandhan, J. Choi","doi":"10.1109/CVPR42600.2020.00464","DOIUrl":"https://doi.org/10.1109/CVPR42600.2020.00464","url":null,"abstract":"Particulate matter (PM) is the blend of various solid and liquid particles suspended in atmosphere. These submicron particles are imperceptible for usual hand-held camera photography, but become a great obstacle in microscopic imaging. PM removal from a single microscopic image is a highly ill-posed and one of the challenging image denoising problems. In this work, we thoroughly analyze the physical properties of PM, microscope and their inevitable interaction; and propose an optimization scheme, which removes the PM from a high-resolution microscopic image within a few seconds. Experiments on real world microscopic images show that the proposed method significantly outperforms other competitive image denoising methods. It preserves the comprehensive microscopic foreground details while clearly separating the PM from a single monochromatic or color image.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"4583-4592"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77180115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Searching for Actions on the Hyperbole
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00122
Teng Long, P. Mettes, Heng Tao Shen, Cees G. M. Snoek
In this paper, we introduce hierarchical action search. Starting from the observation that hierarchies are mostly ignored in the action literature, we retrieve not only individual actions but also relevant and related actions, given an action name or video example as input. We propose a hyperbolic action network, which is centered around a hyperbolic space shared by action hierarchies and videos. Our discriminative hyperbolic embedding projects actions on the shared space while jointly optimizing hypernym-hyponym relations between action pairs and a large margin separation between all actions. The projected actions serve as hyperbolic prototypes that we match with projected video representations. The result is a learned space where videos are positioned in entailment cones formed by different subtrees. To perform search in this space, we start from a query and increasingly enlarge its entailment cone to retrieve hierarchically relevant action videos. Experiments on three action datasets with new hierarchy annotations show the effectiveness of our approach for hierarchical action search by name and by video example, regardless of whether queried actions have been seen or not during training. Our implementation is available at https://github.com/Tenglon/hyperbolic_action
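A small numeric sketch of the Poincaré-ball geodesic distance that hyperbolic embeddings are built on; the embedding network, entailment cones, and margin losses from the paper are not reproduced here, and the two example points are purely illustrative.

```python
# Geodesic distance between two points inside the unit (Poincare) ball:
# d(u, v) = arccosh(1 + 2*||u-v||^2 / ((1-||u||^2)(1-||v||^2))).
import numpy as np

def poincare_distance(u, v, eps=1e-7):
    uu, vv = np.sum(u * u), np.sum(v * v)
    duv = np.sum((u - v) ** 2)
    arg = 1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv) + eps)
    return np.arccosh(arg)

coarse = np.array([0.05, 0.0])   # e.g. a parent action embedded near the origin
fine = np.array([0.70, 0.1])     # e.g. a specific action embedded near the boundary
print(poincare_distance(coarse, fine))
```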
{"title":"Searching for Actions on the Hyperbole","authors":"Teng Long, P. Mettes, Heng Tao Shen, Cees G. M. Snoek","doi":"10.1109/cvpr42600.2020.00122","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00122","url":null,"abstract":"In this paper, we introduce hierarchical action search. Starting from the observation that hierarchies are mostly ignored in the action literature, we retrieve not only individual actions but also relevant and related actions, given an action name or video example as input. We propose a hyperbolic action network, which is centered around a hyperbolic space shared by action hierarchies and videos. Our discriminative hyperbolic embedding projects actions on the shared space while jointly optimizing hypernym-hyponym relations between action pairs and a large margin separation between all actions. The projected actions serve as hyperbolic prototypes that we match with projected video representations. The result is a learned space where videos are positioned in entailment cones formed by different subtrees. To perform search in this space, we start from a query and increasingly enlarge its entailment cone to retrieve hierarchically relevant action videos. Experiments on three action datasets with new hierarchy annotations show the effectiveness of our approach for hierarchical action search by name and by video example, regardless of whether queried actions have been seen or not during training. Our implementation is available at https://github.com/Tenglon/hyperbolic_action","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"205 1","pages":"1138-1147"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77180913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 32