
2016 Fourth International Conference on 3D Vision (3DV): Latest Publications

Learning Camera Viewpoint Using CNN to Improve 3D Body Pose Estimation
Pub Date : 2016-09-18 DOI: 10.1109/3DV.2016.75
Mona Fathollahi Ghezelghieh, R. Kasturi, Sudeep Sarkar
The objective of this work is to estimate 3D human pose from a single RGB image. Extracting image representations which incorporate both the spatial relation of body parts and their relative depth plays an essential role in accurate 3D pose reconstruction. In this paper, for the first time, we show that camera viewpoint in combination with 2D joint locations significantly improves 3D pose accuracy without the explicit use of perspective geometry mathematical models. To this end, we train a deep Convolutional Neural Network (CNN) to learn categorical camera viewpoint. To make the network robust against the clothing and body shape of the subject in the image, we utilize 3D computer rendering to synthesize additional training images. We test our framework on the largest 3D pose estimation benchmark, Human3.6M, and achieve up to 20% error reduction on standing-pose activities compared to the state-of-the-art approaches that do not use body part segmentation.
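A minimal PyTorch sketch of the idea described above: a small CNN classifies the camera viewpoint into discrete bins, and the resulting viewpoint distribution is concatenated with the 2D joint locations before regressing the 3D pose. The layer sizes, K_VIEWS, and N_JOINTS are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

K_VIEWS = 8    # assumed number of discrete camera-viewpoint bins
N_JOINTS = 17  # assumed number of body joints

class ViewpointCNN(nn.Module):
    """Classifies the camera viewpoint of an RGB crop into K_VIEWS bins."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, K_VIEWS)

    def forward(self, img):                     # img: (B, 3, H, W)
        return self.classifier(self.features(img).flatten(1))

class PoseRegressor(nn.Module):
    """Regresses 3D joints from 2D joints plus the predicted viewpoint."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(N_JOINTS * 2 + K_VIEWS, 256), nn.ReLU(),
            nn.Linear(256, N_JOINTS * 3),
        )

    def forward(self, joints_2d, view_logits):  # joints_2d: (B, N_JOINTS*2)
        view_prob = view_logits.softmax(dim=1)  # soft categorical viewpoint
        return self.mlp(torch.cat([joints_2d, view_prob], dim=1))
```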
Citations: 46
Dense Wide-Baseline Scene Flow from Two Handheld Video Cameras
Pub Date : 2016-09-16 DOI: 10.1109/3DV.2016.36
Christian Richardt, Hyeongwoo Kim, Levi Valgaerts, C. Theobalt
We propose a new technique for computing dense scene flow from two handheld videos with wide camera baselines and different photometric properties due to different sensors or camera settings like exposure and white balance. Our technique innovates in two ways over existing methods: (1) it supports independently moving cameras, and (2) it computes dense scene flow for wide-baseline scenarios. We achieve this by combining state-of-the-art wide-baseline correspondence finding with a variational scene flow formulation. First, we compute dense, wide-baseline correspondences using DAISY descriptors for matching between cameras and over time. We then detect and replace occluded pixels in the correspondence fields using a novel edge-preserving Laplacian correspondence completion technique. We finally refine the computed correspondence fields in a variational scene flow formulation. We show dense scene flow results computed from challenging datasets captured with independently moving handheld cameras with varying camera settings.
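A rough sketch of the dense wide-baseline correspondence stage, using the DAISY descriptors the paper mentions (via scikit-image) and brute-force nearest-neighbour matching; the occlusion completion and variational refinement steps are omitted, and the grid-coordinate bookkeeping is approximate.

```python
import numpy as np
from skimage.feature import daisy

def dense_daisy_matches(img_a, img_b, step=8, radius=15):
    """Matches dense DAISY descriptors between two grayscale images by
    brute-force nearest neighbour in descriptor space (O(N^2), sketch only)."""
    da = daisy(img_a, step=step, radius=radius)   # (Ha, Wa, D)
    db = daisy(img_b, step=step, radius=radius)   # (Hb, Wb, D)
    ha, wa, d = da.shape
    hb, wb, _ = db.shape
    qa, qb = da.reshape(-1, d), db.reshape(-1, d)
    # Squared Euclidean distances without forming a (Na, Nb, D) tensor.
    d2 = (qa ** 2).sum(1)[:, None] + (qb ** 2).sum(1)[None, :] - 2.0 * qa @ qb.T
    nn = d2.argmin(axis=1)
    # Approximate pixel locations of the descriptor grid points.
    src = radius + step * np.stack(np.unravel_index(np.arange(ha * wa), (ha, wa)), axis=1)
    dst = radius + step * np.stack(np.unravel_index(nn, (hb, wb)), axis=1)
    return src, dst   # matched (row, col) coordinates, one pair per grid point
```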
Citations: 24
SpectroMeter: Amortized Sublinear Spectral Approximation of Distance on Graphs
Pub Date : 2016-09-15 DOI: 10.1109/3DV.2016.60
R. Litman, A. Bronstein
We present a method to approximate pairwise distances on a graph with an amortized sub-linear complexity in its size. The proposed method follows the so-called heat method due to Crane et al. The only additional input is the values of the eigenfunctions of the graph Laplacian at a subset of the vertices. Using these values we estimate a random walk from the source points, and normalize the result into a unit gradient function. The eigenfunctions are then used to synthesize distance values abiding by these constraints at desired locations. We show that this method works in practice on different types of inputs ranging from triangular meshes to general graphs. We also demonstrate that the resulting approximate distance is accurate enough to be used as the input to a recent method for intrinsic shape correspondence computation.
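A toy illustration, assuming the eigenpairs of the graph Laplacian are already available: heat is diffused from a source vertex through the truncated spectral basis (the first step of the heat method the paper builds on), and a crude distance is read off with Varadhan's formula. The paper's normalised-gradient synthesis and precomputation strategy are not reproduced here.

```python
import numpy as np

def spectral_heat_diffusion(evals, evecs, source, t=1e-2):
    """u = exp(-t L) delta_source evaluated in a truncated eigenbasis.
    evals: (k,) eigenvalues, evecs: (n, k) eigenvectors of the Laplacian L."""
    coeff = evecs[source] * np.exp(-t * evals)   # spectral coefficients of u
    return evecs @ coeff                         # heat value at every vertex

def rough_distance(evals, evecs, source, t=1e-2):
    """Illustrative distance estimate via Varadhan's formula d ~ sqrt(-4t log u);
    not the paper's constrained synthesis of distance values."""
    u = np.clip(spectral_heat_diffusion(evals, evecs, source, t), 1e-300, None)
    return np.sqrt(np.maximum(-4.0 * t * np.log(u), 0.0))
```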
Citations: 6
3D Face Reconstruction by Learning from Synthetic Data
Pub Date : 2016-09-14 DOI: 10.1109/3DV.2016.56
Elad Richardson, Matan Sela, R. Kimmel
Fast and robust three-dimensional reconstruction of facial geometric structure from a single image is a challenging task with numerous applications. Here, we introduce a learning-based approach for reconstructing a three-dimensional face from a single image. Recent face recovery methods rely on accurate localization of key characteristic points. In contrast, the proposed approach is based on a Convolutional Neural Network (CNN) which extracts the face geometry directly from its image. Although such deep architectures outperform other models in complex computer vision problems, training them properly requires a large dataset of annotated examples. In the case of three-dimensional faces, there are currently no large-volume datasets, and acquiring such big data is a tedious task. As an alternative, we propose to generate random, yet nearly photo-realistic, facial images for which the geometric form is known. The suggested model successfully recovers facial shapes from real images, even for faces with extreme expressions and under various lighting conditions.
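The supervision scheme can be sketched as an ordinary regression loop whose labels are free because the images are rendered. Here `render_random_face` stands in for a hypothetical 3DMM-style generator that samples shape, expression, pose, and lighting, and returns an image tensor together with its ground-truth geometry parameters; it is not part of the paper's method code.

```python
import torch

def synthetic_batch(batch_size, render_random_face):
    """Draws (image, geometry) training pairs from a synthetic face renderer.
    The renderer is assumed to return (image_tensor, parameter_tensor)."""
    imgs, params = zip(*[render_random_face() for _ in range(batch_size)])
    return torch.stack(imgs), torch.stack(params)

def train_step(model, optimizer, criterion, batch):
    """One supervised step: the CNN regresses geometry directly from pixels."""
    imgs, params = batch
    optimizer.zero_grad()
    loss = criterion(model(imgs), params)   # e.g. an L2 loss on the parameters
    loss.backward()
    optimizer.step()
    return loss.item()
```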
Citations: 283
Single-Image RGB Photometric Stereo with Spatially-Varying Albedo
Pub Date : 2016-09-14 DOI: 10.1109/3DV.2016.34
Ayan Chakrabarti, Kalyan Sunkavalli
We present a single-shot system to recover the surface geometry of objects with spatially-varying albedos from images captured under a calibrated RGB photometric stereo setup, with three light directions multiplexed across different color channels in the observed RGB image. Since the problem is ill-posed point-wise, we assume that the albedo map can be modeled as piece-wise constant with a restricted number of distinct albedo values. We show that under ideal conditions, the shape of a non-degenerate local constant-albedo surface patch can theoretically be recovered exactly. Moreover, we present a practical and efficient algorithm that uses this model to robustly recover shape from real images. Our method first reasons about shape locally in a dense set of patches in the observed image, producing shape distributions for every patch. These local distributions are then combined to produce a single consistent surface normal map. We demonstrate the efficacy of the approach through experiments on both synthetic renderings as well as real captured images.
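For intuition, the per-pixel Lambertian model under colour-multiplexed lighting reduces to a 3x3 linear solve when the albedo is known: each RGB channel observes the surface under its own light direction. The sketch below shows only that baseline inversion; the paper's contribution is handling the unknown, piecewise-constant albedo case.

```python
import numpy as np

def normal_from_rgb(pixel_rgb, light_dirs, albedo):
    """Recovers a unit normal from one RGB observation, assuming a known
    scalar albedo and one calibrated light direction per colour channel:
    i_c = albedo * dot(l_c, n)  =>  solve L n = i / albedo."""
    L = np.asarray(light_dirs, dtype=float)   # (3, 3): rows are light directions
    i = np.asarray(pixel_rgb, dtype=float)    # (3,) observed channel intensities
    n = np.linalg.solve(L, i / albedo)        # unnormalised normal
    return n / np.linalg.norm(n)

# Toy check: three orthogonal lights, a frontal normal, albedo 0.5.
lights = np.eye(3)
obs = 0.5 * lights @ np.array([0.0, 0.0, 1.0])
print(normal_from_rgb(obs, lights, albedo=0.5))   # -> [0. 0. 1.]
```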
Citations: 20
Multi-Body Non-Rigid Structure-from-Motion
Pub Date : 2016-07-15 DOI: 10.1109/3DV.2016.23
Suryansh Kumar, Yuchao Dai, Hongdong Li
In this paper, we present the first multi-body non-rigid structure-from-motion (SfM) method, which simultaneously reconstructs and segments multiple objects that are undergoing non-rigid deformation over time. Under our formulation, the 3D trajectories for each non-rigid object can be well approximated with a sparse affine combination of other 3D trajectories from the same object. The resulting optimization is solved by the alternating direction method of multipliers (ADMM). We demonstrate the efficacy of the proposed method through extensive experiments on both synthetic and real data sequences. Our method outperforms alternative approaches, such as first clustering the 2D feature tracks into groups and then performing non-rigid reconstruction in each group, or first conducting 3D reconstruction using a single-subspace assumption and then clustering the 3D trajectories into groups.
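A simplified stand-in for the sparse self-expression idea, using an off-the-shelf Lasso per trajectory instead of the paper's affine-constrained ADMM solver, followed by spectral clustering of the coefficient magnitudes to segment the deforming objects. X stacks the 3D trajectories as columns (shape 3F x P for F frames and P points); all hyperparameters are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def sparse_self_expression(X, alpha=0.01):
    """Writes each trajectory as a sparse combination of the others (C[:, j]),
    excluding the trivial self-representation."""
    _, P = X.shape
    C = np.zeros((P, P))
    for j in range(P):
        others = np.delete(np.arange(P), j)
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        model.fit(X[:, others], X[:, j])
        C[others, j] = model.coef_
    return C

def segment_trajectories(C, n_objects=2):
    """Clusters the symmetrised coefficient magnitudes into object labels."""
    W = np.abs(C) + np.abs(C).T
    return SpectralClustering(n_clusters=n_objects,
                              affinity="precomputed").fit_predict(W)
```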
Citations: 44
Large Scale SfM with the Distributed Camera Model
Pub Date : 2016-07-13 DOI: 10.1109/3DV.2016.31
Chris Sweeney, Victor Fragoso, Tobias Höllerer, M. Turk
We introduce the distributed camera model, a novel model for Structure-from-Motion (SfM). This model describes image observations in terms of light rays with ray origins and directions rather than pixels. As such, the proposed model is capable of describing a single camera or multiple cameras simultaneously as the collection of all light rays observed. We show how the distributed camera model is a generalization of the standard camera model, and we describe a general formulation and solution to the absolute camera pose problem that works for standard or distributed cameras. The proposed method computes a solution that is up to 8 times more efficient and robust to rotation singularities in comparison with gDLS [21]. Finally, this method is used in a novel large-scale incremental SfM pipeline where distributed cameras are accurately and robustly merged together. This pipeline is a direct generalization of traditional incremental SfM; however, instead of incrementally adding one camera at a time to grow the reconstruction, the reconstruction is grown by adding a distributed camera. Our pipeline produces highly accurate reconstructions efficiently by avoiding the need for many bundle adjustment iterations and is capable of computing a 3D model of Rome from over 15,000 images in just 22 minutes.
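The distributed camera model itself is easy to state in code: every observation is a ray with an origin and a direction, so a pinhole camera (all origins coincident) and a multi-camera rig (origins spread out) are just two instances of the same representation. The sketch below illustrates that view; the generalized pose solver is not included.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Ray:
    origin: np.ndarray      # 3D origin of the observation ray (camera frame)
    direction: np.ndarray   # unit bearing direction

def pinhole_to_rays(pixels, K):
    """A standard pinhole camera in distributed-camera form: one shared origin."""
    K_inv = np.linalg.inv(K)
    rays = []
    for u, v in pixels:
        d = K_inv @ np.array([u, v, 1.0])
        rays.append(Ray(np.zeros(3), d / np.linalg.norm(d)))
    return rays

def rig_to_rays(pixels, K, centers):
    """A calibrated rig in the same form: ray origins differ per observation,
    which is exactly the case the generalized absolute-pose solver handles."""
    K_inv = np.linalg.inv(K)
    rays = []
    for (u, v), c in zip(pixels, centers):
        d = K_inv @ np.array([u, v, 1.0])
        rays.append(Ray(np.asarray(c, dtype=float), d / np.linalg.norm(d)))
    return rays
```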
Citations: 42
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
Pub Date : 2016-06-15 DOI: 10.1109/3DV.2016.79
F. Milletarì, N. Navab, Seyed-Ahmad Ahmadi
Convolutional Neural Networks (CNNs) have recently been employed to solve problems from both the computer vision and medical image analysis fields. Despite their popularity, most approaches are only able to process 2D images, while most medical data used in clinical practice consists of 3D volumes. In this work we propose an approach to 3D image segmentation based on a volumetric, fully convolutional neural network. Our CNN is trained end-to-end on MRI volumes depicting the prostate, and learns to predict segmentation for the whole volume at once. We introduce a novel objective function based on the Dice coefficient, which we optimise during training. In this way we can deal with situations where there is a strong imbalance between the number of foreground and background voxels. To cope with the limited number of annotated volumes available for training, we augment the data by applying random non-linear transformations and histogram matching. We show in our experimental evaluation that our approach achieves good performance on challenging test data while requiring only a fraction of the processing time needed by other previous methods.
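The Dice-based objective the abstract refers to can be written as a soft loss over predicted foreground probabilities; the squared-denominator form below follows a common formulation of the V-Net loss, with the batch and shape handling kept deliberately simple.

```python
import torch

def dice_loss(pred_probs, target, eps=1e-6):
    """Soft Dice loss for binary volumetric segmentation.
    pred_probs, target: (B, 1, D, H, W) tensors with values in [0, 1].
    Being a ratio, it is largely insensitive to foreground/background imbalance."""
    pred = pred_probs.flatten(1)
    gt = target.flatten(1)
    inter = (pred * gt).sum(dim=1)
    dice = (2.0 * inter + eps) / (pred.pow(2).sum(dim=1) + gt.pow(2).sum(dim=1) + eps)
    return 1.0 - dice.mean()
```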
Citations: 6402
Deeper Depth Prediction with Fully Convolutional Residual Networks
Pub Date : 2016-06-01 DOI: 10.1109/3DV.2016.32
Iro Laina, C. Rupprecht, Vasileios Belagiannis, Federico Tombari, Nassir Navab
This paper addresses the problem of estimating the depth map of a scene given a single RGB image. We propose a fully convolutional architecture, encompassing residual learning, to model the ambiguous mapping between monocular images and depth maps. In order to improve the output resolution, we present a novel way to efficiently learn feature map up-sampling within the network. For optimization, we introduce the reverse Huber loss, which is particularly suited for the task at hand and driven by the value distributions commonly present in depth maps. Our model is composed of a single architecture that is trained end-to-end and does not rely on post-processing techniques, such as CRFs or other additional refinement steps. As a result, it runs in real-time on images or videos. In the evaluation, we show that the proposed model contains fewer parameters and requires less training data than the current state of the art, while outperforming all approaches on depth estimation. Code and models are publicly available.
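The reverse Huber (berHu) loss mentioned above behaves like L1 for small residuals and quadratically for large ones; the sketch below sets the threshold from the current batch (one fifth of the maximum absolute residual, a common choice), which may differ in detail from the paper's exact setting.

```python
import torch

def berhu_loss(pred, target):
    """Reverse Huber (berHu) loss for depth regression."""
    diff = (pred - target).abs()
    c = torch.clamp(0.2 * diff.max(), min=1e-6)    # batch-dependent threshold
    quadratic = (diff ** 2 + c ** 2) / (2.0 * c)   # quadratic branch for |diff| > c
    return torch.where(diff <= c, diff, quadratic).mean()
```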
Citations: 1598
Joint Semantic Segmentation and Depth Estimation with Deep Convolutional Networks
Pub Date : 2016-04-25 DOI: 10.1109/3DV.2016.69
Arsalan Mousavian, H. Pirsiavash, J. Kosecka
Multi-scale deep CNNs have been used successfully for problems mapping each pixel to a label, such as depth estimation and semantic segmentation. It has also been shown that such architectures are reusable and can be used for multiple tasks. These networks are typically trained independently for each task by varying the output layer(s) and training objective. In this work we present a new model for simultaneous depth estimation and semantic segmentation from a single RGB image. Our approach demonstrates the feasibility of training parts of the model for each task and then fine-tuning the full, combined model on both tasks simultaneously using a single loss function. Furthermore, we couple the deep CNN with a fully connected CRF, which captures the contextual relationships and interactions between the semantic and depth cues, improving the accuracy of the final results. The proposed model is trained and evaluated on the NYUDepth V2 dataset [23], outperforming the state-of-the-art methods on semantic segmentation and achieving comparable results on the task of depth estimation.
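A minimal sketch of the single-loss-function training idea: a shared backbone with two heads is optimised with one combined objective, pixelwise cross-entropy for the labels plus a depth regression term. The L1 depth term and the weighting factor are illustrative assumptions, not necessarily the paper's exact choices, and the CRF stage is omitted.

```python
import torch.nn.functional as F

def joint_loss(seg_logits, seg_labels, depth_pred, depth_gt, w_depth=1.0):
    """seg_logits: (B, C, H, W), seg_labels: (B, H, W) long,
    depth_pred / depth_gt: (B, 1, H, W). Returns one scalar for backprop."""
    seg_term = F.cross_entropy(seg_logits, seg_labels)
    depth_term = F.l1_loss(depth_pred, depth_gt)
    return seg_term + w_depth * depth_term
```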
Citations: 126