
Latest publications from the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

PointGrid: A Deep Network for 3D Shape Understanding
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00959
Truc Le, Y. Duan
Volumetric grids are widely used for 3D deep learning due to their regularity. However, the use of relatively low-order local approximation functions, such as a piece-wise constant function (occupancy grid) or a piece-wise linear function (distance field), to approximate 3D shape means that a very high-resolution grid is needed to represent finer geometric details, which can be memory- and computationally inefficient. In this work, we propose PointGrid, a 3D convolutional network that incorporates a constant number of points within each grid cell, thus allowing the network to learn higher-order local approximation functions that better represent local geometric detail. In experiments on popular shape recognition benchmarks, PointGrid demonstrates state-of-the-art performance over existing deep learning methods on both classification and segmentation.
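To make the cell-level encoding concrete, here is a minimal sketch, not the authors' implementation, of the core idea: each cell of an N×N×N grid stores a fixed number K of local point coordinates, producing a dense tensor that an ordinary 3D CNN can consume. The grid size, K, channel widths, and the 40-class head are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

def pointgrid_encode(points, n=16, k=4):
    """points: (P, 3) array with coordinates in [0, 1). Returns a (k*3, n, n, n) tensor."""
    grid = np.zeros((n, n, n, k, 3), dtype=np.float32)
    counts = np.zeros((n, n, n), dtype=np.int64)
    cells = np.minimum((points * n).astype(int), n - 1)
    for p, (i, j, l) in zip(points, cells):
        c = counts[i, j, l]
        if c < k:                                        # keep at most k points per cell
            grid[i, j, l, c] = p * n - np.array([i, j, l])   # local offset inside the cell
            counts[i, j, l] = c + 1
    return torch.from_numpy(grid.reshape(n, n, n, k * 3)).permute(3, 0, 1, 2)

# A toy 3D CNN consuming the encoding (hyper-parameters are illustrative only).
net = nn.Sequential(
    nn.Conv3d(4 * 3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool3d(2),
    nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(64, 40),                                   # e.g. 40 shape classes
)
x = pointgrid_encode(np.random.rand(2048, 3).astype(np.float32)).unsqueeze(0)
logits = net(x)                                          # shape (1, 40)
```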
Citations: 290
DenseASPP for Semantic Segmentation in Street Scenes
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00388
Maoke Yang, Kun Yu, Chi Zhang, Zhiwei Li, Kuiyuan Yang
Semantic image segmentation is a basic street scene understanding task in autonomous driving, where each pixel in a high-resolution image is categorized into a set of semantic labels. Unlike other scenarios, objects in autonomous driving scenes exhibit very large scale changes, which poses great challenges for high-level feature representation in the sense that multi-scale information must be correctly encoded. To remedy this problem, atrous convolution [14] was introduced to generate features with larger receptive fields without sacrificing spatial resolution. Built upon atrous convolution, Atrous Spatial Pyramid Pooling (ASPP) [2] was proposed to concatenate multiple atrous-convolved features computed with different dilation rates into a final feature representation. Although ASPP is able to generate multi-scale features, we argue that the feature resolution along the scale axis is not dense enough for the autonomous driving scenario. To this end, we propose Densely connected Atrous Spatial Pyramid Pooling (DenseASPP), which connects a set of atrous convolutional layers in a dense way, such that it generates multi-scale features that not only cover a larger scale range but also cover that scale range densely, without significantly increasing the model size. We evaluate DenseASPP on the street scene benchmark Cityscapes [4] and achieve state-of-the-art performance.
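As a rough illustration of the dense connectivity, the sketch below chains atrous convolutions so that each layer sees the input plus every earlier output; the channel widths and dilation rates are assumptions, and the 1×1 reduction layers of the full model are omitted.

```python
import torch
import torch.nn as nn

class DenseASPPBlock(nn.Module):
    def __init__(self, in_ch, mid_ch=64, dilations=(3, 6, 12, 18, 24)):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for d in dilations:
            # each atrous layer consumes the input plus every earlier output (dense connectivity)
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, mid_ch, kernel_size=3, padding=d, dilation=d),
                nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True)))
            ch += mid_ch
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)     # multi-scale features covering the scale axis densely

block = DenseASPPBlock(in_ch=256)
y = block(torch.randn(1, 256, 32, 64))     # e.g. a backbone feature map
print(y.shape)                              # (1, 256 + 5*64, 32, 64)
```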
Citations: 1036
Mesoscopic Facial Geometry Inference Using Deep Neural Networks
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00877
Loc Huynh, Weikai Chen, Shunsuke Saito, Jun Xing, Koki Nagano, Andrew Jones, P. Debevec, Hao Li
We present a learning-based approach for synthesizing facial geometry at medium and fine scales from diffusely-lit facial texture maps. When applied to an image sequence, the synthesized detail is temporally coherent. Unlike current state-of-the-art methods [17, 5], which assume "dark is deep", our model is trained with measured facial detail collected using polarized gradient illumination in a Light Stage [20]. This enables us to produce plausible facial detail across the entire face, including where previous approaches may incorrectly interpret dark features as concavities, such as at moles, hair stubble, and occluded pores. Instead of directly inferring 3D geometry, we propose to encode fine details in high-resolution displacement maps, which are learned through a hybrid network adopting the state-of-the-art image-to-image translation network [29] and super-resolution network [43]. To effectively capture geometric detail at both mid- and high frequencies, we factorize the learning into two separate sub-networks, enabling the full range of facial detail to be modeled. Results from our learning-based approach compare favorably with a high-quality active facial scanning technique, and require only a single passive lighting condition without a complex scanning setup.
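The factorization into frequency bands can be pictured with the toy two-branch sketch below: one image-to-image network predicts a mid-frequency displacement map, a second predicts high-frequency detail, and the two are combined. The encoder-decoder layout, channel counts, and the simple summation are assumptions for illustration, not the paper's network.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class DisplacementBranch(nn.Module):
    """Toy encoder-decoder mapping a diffusely lit texture to a 1-channel displacement map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(3, 32), nn.MaxPool2d(2),
            conv_block(32, 64),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
            conv_block(64, 32), nn.Conv2d(32, 1, 1))

    def forward(self, tex):
        return self.net(tex)

mid_branch, high_branch = DisplacementBranch(), DisplacementBranch()
texture = torch.randn(1, 3, 256, 256)                        # diffusely lit texture map
displacement = mid_branch(texture) + high_branch(texture)    # full-band displacement map
print(displacement.shape)                                     # (1, 1, 256, 256)
```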
Citations: 63
Augmenting Crowd-Sourced 3D Reconstructions Using Semantic Detections
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00206
True Price, Johannes L. Schönberger, Zhen Wei, M. Pollefeys, Jan-Michael Frahm
Image-based 3D reconstruction for Internet photo collections has become a robust technology for producing impressive virtual representations of real-world scenes. However, several fundamental challenges remain for Structure-from-Motion (SfM) pipelines, namely: the placement and reconstruction of transient objects only observed in single views, estimating the absolute scale of the scene, and (surprisingly often) recovering ground surfaces in the scene. We propose a method to jointly address these remaining open problems of SfM. In particular, we focus on detecting people in individual images and accurately placing them into an existing 3D model. As part of this placement, our method also estimates the absolute scale of the scene from object semantics, which in this case constitutes the height distribution of the population. Further, we obtain a smooth approximation of the ground surface and recover the gravity vector of the scene directly from the individual person detections. We demonstrate the results of our approach on a number of unordered Internet photo collections, and we quantitatively evaluate the obtained absolute scene scales.
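The scale-from-semantics idea reduces to a back-of-the-envelope computation: if the reconstructed heights of detected people are known in SfM units, matching them to an assumed real-world height statistic yields a metric scale factor. The numbers and the use of a simple median below are illustrative assumptions, not the paper's estimator.

```python
import numpy as np

def estimate_scene_scale(person_heights_model_units, mean_human_height_m=1.7):
    """person_heights_model_units: heights of reconstructed people in SfM units."""
    heights = np.asarray(person_heights_model_units, dtype=float)
    return mean_human_height_m / np.median(heights)   # metres per SfM unit

# e.g. three detected people whose reconstructed heights are in arbitrary SfM units
scale = estimate_scene_scale([0.021, 0.019, 0.023])
print(f"1 SfM unit ~ {scale:.1f} m")                   # multiply model coordinates by this factor
```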
Citations: 7
High Performance Visual Tracking with Siamese Region Proposal Network
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00935
Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, Xiaolin Hu
Visual object tracking has been a fundamental topic in recent years, and many deep learning based trackers have achieved state-of-the-art performance on multiple benchmarks. However, most of these trackers can hardly achieve top performance at real-time speed. In this paper, we propose the Siamese region proposal network (Siamese-RPN), which is trained end-to-end offline with large-scale image pairs. Specifically, it consists of a Siamese subnetwork for feature extraction and a region proposal subnetwork comprising a classification branch and a regression branch. In the inference phase, the proposed framework is formulated as a local one-shot detection task. We can pre-compute the template branch of the Siamese subnetwork and formulate the correlation layers as trivial convolution layers to perform online tracking. Benefiting from the proposal refinement, traditional multi-scale testing and online fine-tuning can be discarded. The Siamese-RPN runs at 160 FPS while achieving leading performance in the VOT2015, VOT2016 and VOT2017 real-time challenges.
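The "correlation as convolution" trick can be sketched in a few lines: the template branch is run once, and its feature map is then used as the convolution kernel over the search-region features. The toy backbone and the single-channel response (the real RPN head predicts per-anchor classification and regression maps) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(              # stand-in for the shared Siamese feature extractor
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())

template_img = torch.randn(1, 3, 64, 64)      # exemplar (target) patch
search_img   = torch.randn(1, 3, 128, 128)    # larger search region around the last position

with torch.no_grad():
    kernel = backbone(template_img)            # (1, 128, 16, 16), computed once offline
    search = backbone(search_img)              # (1, 128, 32, 32), computed per frame

# correlation layer as a "trivial" convolution: the template features act as the filter
response = F.conv2d(search, kernel)            # (1, 1, 17, 17) similarity map
peak = response.flatten().argmax()
print(response.shape, peak)
```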
Citations: 1754
Robust Hough Transform Based 3D Reconstruction from Circular Light Fields
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00765
A. Vianello, J. Ackermann, M. Diebold, B. Jähne
Light-field imaging is based on images taken on a regular grid. Thus, high-quality 3D reconstructions are obtainable by analyzing orientations in epipolar plane images (EPIs). Unfortunately, such data only allow one side of the object to be evaluated. Moreover, most approaches require a constant intensity along each orientation. This paper presents a novel method which reconstructs depth information from data acquired with a circular camera motion, termed circular light fields. With this approach it is possible to determine the full 360° view of target objects. Additionally, circular light fields allow retrieving depth from datasets acquired with telecentric lenses, which is not possible with linear light fields. The proposed method finds trajectories of 3D points in the EPIs by means of a modified Hough transform. For this purpose, binary EPI-edge images are used, which not only provide reliable depth information, but also overcome the limitation of constant intensity along trajectories. Experimental results on synthetic and real datasets demonstrate the quality of the proposed algorithm.
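A stripped-down version of Hough voting on a binary EPI edge image is sketched below: every candidate depth predicts a trajectory across the angular axis, and edge pixels lying on that trajectory vote for it. For simplicity the candidate trajectories here are straight lines parameterized by slope; the trajectories of circular light fields have a different parameterization, so treat this purely as an illustration of the voting scheme.

```python
import numpy as np

def hough_depth_votes(edge_epi, candidates, traj_fn):
    """edge_epi: (A, S) binary image, A camera angles x S spatial positions.
    traj_fn(candidate, angle_idx) -> predicted (possibly fractional) column index."""
    a_dim, s_dim = edge_epi.shape
    votes = np.zeros(len(candidates))
    for c_idx, cand in enumerate(candidates):
        for a in range(a_dim):
            s = int(round(traj_fn(cand, a)))
            if 0 <= s < s_dim and edge_epi[a, s]:
                votes[c_idx] += 1                 # an edge pixel supports this hypothesis
    return votes

# toy example: a linear trajectory whose slope plays the role of inverse depth
A, S = 32, 64
epi = np.zeros((A, S), dtype=bool)
true_slope = 0.5
for a in range(A):
    epi[a, int(round(20 + true_slope * a))] = True   # synthetic edge trajectory
slopes = np.linspace(0.0, 1.0, 21)
votes = hough_depth_votes(epi, slopes, lambda slope, a: 20 + slope * a)
print(slopes[votes.argmax()])                         # ~ 0.5
```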
Citations: 9
RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00410
Despoina Paschalidou, Ali O. Ulusoy, Carolin Schmitt, L. Gool, Andreas Geiger
In this paper, we consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNN) allow learning the entire task from data. However, they do not incorporate the physics of image formation such as perspective geometry and occlusion. Instead, classical approaches based on Markov Random Fields (MRF) with ray-potentials explicitly model these physical processes, but they cannot cope with large surface appearance variations across different viewpoints. In this paper, we propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piece-wise trained baseline, hand-crafted models as well as other learning-based approaches.
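The ray-potential part of the model can be illustrated with a short sketch (this is the generic first-hit formulation along a single ray, not the paper's full MRF inference): given per-voxel occupancy probabilities ordered from the camera outwards, the probability that the surface lies at voxel i is occ_i times the probability that all closer voxels are free, which yields a depth distribution along the ray.

```python
import numpy as np

def ray_depth_distribution(occupancy, depths):
    """occupancy: (N,) occupancy probabilities along the ray, ordered front to back."""
    free_before = np.concatenate(([1.0], np.cumprod(1.0 - occupancy)[:-1]))
    p_hit = occupancy * free_before              # probability of first hit at each voxel
    p_miss = 1.0 - p_hit.sum()                   # ray escapes the volume
    expected_depth = (p_hit * depths).sum() + p_miss * depths[-1]
    return p_hit, expected_depth

occ = np.array([0.05, 0.1, 0.2, 0.9, 0.3])       # toy occupancy predictions along one ray
depths = np.linspace(1.0, 2.0, 5)                # voxel depths along that ray
p_hit, d = ray_depth_distribution(occ, depths)
print(p_hit.round(3), round(d, 3))
```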
Citations: 77
Divide and Conquer for Full-Resolution Light Field Deblurring
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00672
M. Mohan, A. Rajagopalan
The increasing popularity of computational light field (LF) cameras has necessitated tackling motion blur, a ubiquitous phenomenon in hand-held photography. The state-of-the-art method for blind deblurring of LFs of general 3D scenes is limited to handling only downsampled LFs, in both spatial and angular resolution. This is due to the computational overhead involved in processing the data-hungry full-resolution 4D LF altogether. Moreover, the method requires high-end GPUs for optimization and is ineffective for wide-angle settings and irregular camera motion. In this paper, we introduce a new blind motion deblurring strategy for LFs which alleviates these limitations significantly. Our model achieves this by isolating 4D LF motion blur across the 2D subaperture images, thus paving the way for independent deblurring of these subaperture images. Furthermore, our model accommodates a common camera motion parameterization across the subaperture images. Consequently, blind deblurring of any single subaperture image elegantly paves the way for cost-effective non-blind deblurring of the other subaperture images. Our approach is computationally efficient on CPUs and can effectively deblur full-resolution LFs.
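The divide-and-conquer strategy can be mimicked with a toy pipeline: assume a blur kernel has already been estimated (blindly) from one sub-aperture image; because the camera motion is shared, every other sub-aperture view can then be deblurred non-blindly with the same kernel. Wiener deconvolution and the synthetic kernel below are stand-ins, not the paper's method.

```python
import numpy as np

def wiener_deblur(blurred, psf, snr=1e-2):
    """Frequency-domain Wiener filter; blurred and psf are 2D arrays."""
    H = np.fft.fft2(psf, s=blurred.shape)
    B = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + snr)
    return np.real(np.fft.ifft2(W * B))

rng = np.random.default_rng(0)
psf = np.zeros((9, 9)); psf[4, :] = 1.0 / 9.0          # toy horizontal motion-blur kernel
views = rng.random((5, 5, 64, 64))                     # a 5x5 grid of sub-aperture views

# blur every view with the same (camera-motion) kernel, then deblur each one independently
blur = lambda v: np.real(np.fft.ifft2(np.fft.fft2(v) * np.fft.fft2(psf, s=v.shape)))
blurred = np.array([[blur(v) for v in row] for row in views])
restored = np.array([[wiener_deblur(b, psf) for b in row] for row in blurred])
print(restored.shape)                                  # (5, 5, 64, 64)
```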
Citations: 8
Monocular Relative Depth Perception with Web Stereo Data Supervision
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00040
Ke Xian, Chunhua Shen, ZHIGUO CAO, Hao Lu, Yang Xiao, Ruibo Li, Zhenbo Luo
In this paper, we study the problem of monocular relative depth perception in the wild. We introduce a simple yet effective method to automatically generate dense relative depth annotations from web stereo images, and propose a new dataset that consists of diverse images as well as corresponding dense relative depth maps. Further, an improved ranking loss is introduced to deal with imbalanced ordinal relations, forcing the network to focus on a set of hard pairs. Experimental results demonstrate that our proposed approach not only achieves state-of-the-art accuracy of relative depth perception in the wild, but also benefits other dense per-pixel prediction tasks, e.g., metric depth estimation and semantic segmentation.
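The pairwise supervision can be written down compactly. The sketch below uses the standard ranking loss on ordinal point pairs together with a naive "keep only the hardest pairs" rule; the paper's exact handling of imbalanced relations is not reproduced here, so the top-k selection and its ratio are assumptions.

```python
import torch

def ranking_loss(pred_i, pred_j, ordinal, top_ratio=0.25):
    """pred_i, pred_j: predicted depths at sampled point pairs, shape (P,).
    ordinal: +1 if point i is farther, -1 if closer, 0 if roughly equal depth."""
    diff = pred_i - pred_j
    unequal = ordinal != 0
    loss = torch.where(unequal,
                       torch.log1p(torch.exp(-ordinal * diff)),   # ordered pairs
                       diff ** 2)                                  # "equal depth" pairs
    # focus on the hardest pairs (an assumption standing in for the improved loss)
    k = max(1, int(top_ratio * loss.numel()))
    return loss.topk(k).values.mean()

pred_i = torch.randn(128, requires_grad=True)
pred_j = torch.randn(128, requires_grad=True)
ordinal = torch.randint(-1, 2, (128,)).float()
loss = ranking_loss(pred_i, pred_j, ordinal)
loss.backward()
print(loss.item())
```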
Citations: 154
A Revised Underwater Image Formation Model
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00703
D. Akkaynak, T. Treibitz
The current underwater image formation model descends from atmospheric dehazing equations where attenuation is a weak function of wavelength. We recently showed that this model introduces significant errors and dependencies in the estimation of the direct transmission signal because underwater, light attenuates in a wavelength-dependent manner. Here, we show that the backscattered signal derived from the current model also suffers from dependencies that were previously unaccounted for. In doing so, we use oceanographic measurements to derive the physically valid space of backscatter, and further show that the wideband coefficients that govern backscatter are different than those that govern direct transmission, even though the current model treats them to be the same. We propose a revised equation for underwater image formation that takes these differences into account, and validate it through in situ experiments underwater. This revised model might explain frequent instabilities of current underwater color reconstruction models, and calls for the development of new methods.
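The structure described above can be summarized as I_c = J_c * exp(-beta_D_c * z) + B_inf_c * (1 - exp(-beta_B_c * z)), with separate wideband coefficients beta_D and beta_B governing the direct signal and the backscatter. The sketch below simply evaluates this forward model on toy inputs; all coefficient values are made-up assumptions, not measured quantities.

```python
import numpy as np

def underwater_image(J, z, beta_D, beta_B, B_inf):
    """J: (H, W, 3) scene radiance, z: (H, W) range in metres, per-channel coefficients."""
    direct = J * np.exp(-z[..., None] * beta_D)                   # signal decays with beta_D
    backscatter = B_inf * (1.0 - np.exp(-z[..., None] * beta_B))  # veil grows with beta_B
    return direct + backscatter

J = np.random.rand(4, 4, 3)                # toy scene radiance
z = np.full((4, 4), 5.0)                   # 5 m range everywhere
beta_D = np.array([0.40, 0.12, 0.10])      # assumed RGB attenuation of the direct signal
beta_B = np.array([0.25, 0.20, 0.18])      # assumed coefficients governing backscatter
B_inf  = np.array([0.05, 0.25, 0.35])      # assumed veiling-light colour
I = underwater_image(J, z, beta_D, beta_B, B_inf)
print(I.shape)                              # (4, 4, 3)
```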
Citations: 172