
2020 International Conference on 3D Vision (3DV): Latest Publications

LM-Reloc: Levenberg-Marquardt Based Direct Visual Relocalization
Pub Date : 2020-10-13 DOI: 10.1109/3DV50981.2020.00107
L. Stumberg, Patrick Wenzel, Nan Yang, D. Cremers
We present LM-Reloc–a novel approach for visual relocalization based on direct image alignment. In contrast to prior works that tackle the problem with a feature-based formulation, the proposed method does not rely on feature matching and RANSAC. Hence, the method can utilize not only corners but any region of the image with gradients. In particular, we propose a loss formulation inspired by the classical Levenberg-Marquardt algorithm to train LM-Net. The learned features significantly improve the robustness of direct image alignment, especially for relocalization across different conditions. To further improve the robustness of LM-Net against large image baselines, we propose a pose estimation network, CorrPoseNet, which regresses the relative pose to bootstrap the direct image alignment. Evaluations on the CARLA and Oxford RobotCar relocalization tracking benchmark show that our approach delivers more accurate results than previous state-of-the-art methods while being comparable in terms of robustness.
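The optimizer in the title is worth unpacking: direct image alignment minimizes a sum of squared residuals over pose parameters, and Levenberg-Marquardt does so with a damped Gauss-Newton step whose damping factor interpolates between gradient descent and Gauss-Newton. Below is a minimal NumPy sketch of that step; `residual_fn` and `jacobian_fn` are hypothetical stand-ins for the photometric (or, in LM-Reloc, learned-feature) residuals and their pose Jacobians, not the paper's implementation.

```python
import numpy as np

def lm_step(residual_fn, jacobian_fn, x, lam):
    """One damped Gauss-Newton (Levenberg-Marquardt) update.

    residual_fn(x) -> r, shape (m,):   stacked residuals (e.g. per-pixel errors)
    jacobian_fn(x) -> J, shape (m, n): Jacobian of the residuals w.r.t. the n parameters
    x is the current estimate (e.g. a 6-DoF pose), lam the damping factor.
    """
    r = residual_fn(x)
    J = jacobian_fn(x)
    H = J.T @ J + lam * np.eye(x.size)      # damped normal equations
    return x + np.linalg.solve(H, -J.T @ r)

def levenberg_marquardt(residual_fn, jacobian_fn, x0, lam=1e-2, iters=50):
    x = x0.astype(float)
    cost = np.sum(residual_fn(x) ** 2)
    for _ in range(iters):
        x_new = lm_step(residual_fn, jacobian_fn, x, lam)
        new_cost = np.sum(residual_fn(x_new) ** 2)
        if new_cost < cost:      # accept the step; reduce damping toward Gauss-Newton
            x, cost, lam = x_new, new_cost, lam * 0.5
        else:                    # reject the step; increase damping toward gradient descent
            lam *= 2.0
    return x
```

LM-Reloc applies this kind of update to residuals built from learned feature maps rather than raw intensities, which is what lets it exploit any image region with gradients instead of only corners.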
Citations: 23
A Progressive Conditional Generative Adversarial Network for Generating Dense and Colored 3D Point Clouds
Pub Date : 2020-10-12 DOI: 10.1109/3DV50981.2020.00081
Mohammad Arshad, William J. Beksi
In this paper, we introduce a novel conditional generative adversarial network that creates dense 3D point clouds, with color, for assorted classes of objects in an unsupervised manner. To overcome the difficulty of capturing intricate details at high resolutions, we propose a point transformer that progressively grows the network through the use of graph convolutions. The network is composed of a leaf output layer and an initial set of branches. Every training iteration evolves a point vector into a point cloud of increasing resolution. After a fixed number of iterations, the number of branches is increased by replicating the last branch. Experimental results show that our network is capable of learning and mimicking a 3D data distribution, and produces colored point clouds with fine details at multiple resolutions.
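As a rough illustration of the progressive-growth mechanics described above, the sketch below assumes each branch is a small MLP that splits every point feature into k children, and that growing the network simply deep-copies the last branch. The paper's actual generator uses graph convolutions and a point transformer, so this toy module only mirrors the resolution-doubling behavior, not the architecture.

```python
import copy
import torch
import torch.nn as nn

class ProgressivePointGenerator(nn.Module):
    """Toy progressive generator: each branch splits every point into `k` children."""

    def __init__(self, feat_dim=128, k=2):
        super().__init__()
        self.k = k
        self.branches = nn.ModuleList([self._make_branch(feat_dim)])
        self.leaf = nn.Linear(feat_dim, 6)   # leaf output layer: xyz + rgb per point

    def _make_branch(self, feat_dim):
        # maps one parent feature to k child features
        return nn.Sequential(nn.Linear(feat_dim, feat_dim * self.k), nn.ReLU())

    def grow(self):
        # progressive step: replicate the last branch to multiply the resolution by k
        self.branches.append(copy.deepcopy(self.branches[-1]))

    def forward(self, z):                    # z: (batch, feat_dim) latent "point vector"
        x = z.unsqueeze(1)                   # (batch, 1, feat_dim)
        for branch in self.branches:
            b, n, d = x.shape
            x = branch(x).view(b, n * self.k, d)   # each point spawns k children
        return self.leaf(x)                  # (batch, n_points, 6) colored point cloud

gen = ProgressivePointGenerator()
pts = gen(torch.randn(4, 128))               # (4, 2, 6)
gen.grow()
pts = gen(torch.randn(4, 128))               # (4, 4, 6): resolution doubled after growth
```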
Citations: 17
Nighttime Stereo Depth Estimation using Joint Translation-Stereo Learning: Light Effects and Uninformative Regions
Pub Date : 2020-10-11 DOI: 10.1109/3DV50981.2020.00012
Aashish Sharma, Lionel Heng, L. Cheong, R. Tan
Nighttime stereo depth estimation is still challenging, as assumptions associated with daytime lighting conditions do not hold any longer. Nighttime is not only about low light and dense noise, but also about glow/glare, flares, non-uniform distribution of light, etc. One of the possible solutions is to train a network on night stereo images in a fully supervised manner. However, to obtain proper disparity ground-truths that are dense, independent from glare/glow, and have sufficiently far depth ranges is extremely intractable. To address the problem, we introduce a network joining day/night translation and stereo. In training the network, our method does not require ground-truth disparities of the night images, or paired day/night images. We utilize a translation network that can render realistic night stereo images from day stereo images. We then train a stereo network on the rendered night stereo images using the available disparity supervision from the corresponding day stereo images, and simultaneously also train the day/night translation network. We handle the fake depth problem, which occurs due to the unsupervised/unpaired translation, for light effects (e.g., glow/glare) and uninformative regions (e.g., low-light and saturated regions), by adding structure-preservation and weighted-smoothness constraints. Our experiments show that our method outperforms the baseline methods on night images.
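The weighted-smoothness constraint mentioned at the end is, in most depth-estimation work, an edge-aware penalty on disparity gradients; a common variant is sketched below in PyTorch. The exact weighting and the companion structure-preservation term used in the paper may differ, so treat this as a representative formulation rather than the authors' loss.

```python
import torch

def weighted_smoothness_loss(disp, img, alpha=10.0):
    """Edge-aware ("weighted") smoothness prior on a disparity map.

    disp: (B, 1, H, W) predicted disparity
    img : (B, 3, H, W) reference image
    Penalizes disparity gradients, but down-weights the penalty where the image
    itself has strong gradients (likely true object boundaries).
    """
    ddx = torch.abs(disp[:, :, :, 1:] - disp[:, :, :, :-1])
    ddy = torch.abs(disp[:, :, 1:, :] - disp[:, :, :-1, :])

    idx = torch.mean(torch.abs(img[:, :, :, 1:] - img[:, :, :, :-1]), dim=1, keepdim=True)
    idy = torch.mean(torch.abs(img[:, :, 1:, :] - img[:, :, :-1, :]), dim=1, keepdim=True)

    wx = torch.exp(-alpha * idx)   # small weight at image edges
    wy = torch.exp(-alpha * idy)
    return (ddx * wx).mean() + (ddy * wy).mean()
```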
Citations: 13
Torch-Points3D: A Modular Multi-Task Framework for Reproducible Deep Learning on 3D Point Clouds
Pub Date : 2020-10-09 DOI: 10.1109/3DV50981.2020.00029
Thomas Chaton, N. Chaulet, Sofiane Horache, Loïc Landrieu
We introduce Torch-Points3D, an open-source framework designed to facilitate the use of deep networks on 3D data. Its modular design, efficient implementation, and user-friendly interfaces make it a relevant tool for research and productization alike. Beyond multiple quality-of-life features, our goal is to standardize a higher level of transparency and reproducibility in 3D deep learning research, and to lower its barrier to entry. In this paper, we present the design principles of Torch-Points3D, as well as extensive benchmarks of multiple state-of-the-art algorithms and inference schemes across several datasets and tasks. The modularity of Torch-Points3D allows us to design fair and rigorous experimental protocols in which all methods are evaluated in the same conditions. The Torch-Points3D repository: https://github.com/nicolas-chaulet/torch-points3d.
Citations: 35
Deep SVBRDF Estimation on Real Materials
Pub Date : 2020-10-08 DOI: 10.1109/3DV50981.2020.00126
L. Asselin, D. Laurendeau, Jean-François Lalonde
Recent work has demonstrated that deep learning approaches can successfully be used to recover accurate estimates of the spatially-varying BRDF (SVBRDF) of a surface from as little as a single image. Closer inspection reveals, however, that most approaches in the literature are trained purely on synthetic data, which, while diverse and realistic, is often not representative of the richness of the real world. In this paper, we show that training such networks exclusively on synthetic data is insufficient to achieve adequate results when tested on real data. Our analysis leverages a new dataset of real materials obtained with a novel portable multi-light capture apparatus. Through an extensive series of experiments and with the use of a novel deep learning architecture, we explore two strategies for improving results on real data: finetuning, and a per-material optimization procedure. We show that adapting network weights to real data is of critical importance, resulting in an approach which significantly outperforms previous methods for SVBRDF estimation on real materials. Dataset and code are available at https://lvsn.github.io/real-svbrdf.
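Of the two strategies named above, the per-material optimization procedure is the easier one to outline: continue optimizing the (already finetuned) network on the captures of a single material until its predicted SVBRDF maps re-render to match the observations. The loop below is a hypothetical sketch under that reading; `render_fn` stands in for a differentiable renderer, and the paper's actual losses and schedule are not reproduced here.

```python
import torch

def per_material_refinement(net, images, lights, render_fn, steps=200, lr=1e-4):
    """Hypothetical per-material optimization loop (sketch only).

    net       : SVBRDF estimation network, already pretrained/finetuned
    images    : observed captures of one material under known lighting
    lights    : lighting parameters of those captures
    render_fn : stand-in for a differentiable renderer producing images from SVBRDF maps
    """
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        maps = net(images)                    # predicted SVBRDF maps (albedo, normals, ...)
        rerendered = render_fn(maps, lights)  # re-render under the capture lighting
        loss = (rerendered - images).abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net
```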
Citations: 19
RANP: Resource Aware Neuron Pruning at Initialization for 3D CNNs
Pub Date : 2020-10-06 DOI: 10.1109/3DV50981.2020.00028
Zhiwei Xu, Thalaiyasingam Ajanthan, Vibhav Vineet, R. Hartley
Although 3D Convolutional Neural Networks (CNNs) are essential for most learning based applications involving dense 3D data, their applicability is limited due to excessive memory and computational requirements. Compressing such networks by pruning therefore becomes highly desirable. However, pruning 3D CNNs is largely unexplored possibly because of the complex nature of typical pruning algorithms that embeds pruning into an iterative optimization paradigm. In this work, we introduce a Resource Aware Neuron Pruning (RANP) algorithm that prunes 3D CNNs at initialization to high sparsity levels. Specifically, the core idea is to obtain an importance score for each neuron based on their sensitivity to the loss function. This neuron importance is then reweighted according to the neuron resource consumption related to FLOPs or memory. We demonstrate the effectiveness of our pruning method on 3D semantic segmentation with widely used 3D-UNets on ShapeNet and BraTS’18 as well as on video classification with MobileNetV2 and I3D on UCF101 dataset. In these experiments, our RANP leads to roughly 50%-95% reduction in FLOPs and 35%-80% reduction in memory with negligible loss in accuracy compared to the unpruned networks. This significantly reduces the computational resources required to train 3D CNNs. The pruned network obtained by our algorithm can also be easily scaled up and transferred to another dataset for training.
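The scoring step can be approximated with a SNIP-style first-order saliency: the sensitivity of a neuron (here, an output channel) is the summed magnitude of weight times gradient over its filter, computed from one backward pass at initialization, and the score is then rebalanced by the neuron's resource cost. This is a sketch of the idea, not RANP's exact formulation; `flops_per_neuron` is a hypothetical per-layer cost table.

```python
import torch
import torch.nn as nn

def neuron_saliency(model, loss_fn, batch):
    """Per-neuron (output-channel) sensitivity scores from one backward pass."""
    inputs, targets = batch
    loss = loss_fn(model(inputs), targets)
    model.zero_grad()
    loss.backward()

    scores = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv3d, nn.Linear)) and module.weight.grad is not None:
            w, g = module.weight, module.weight.grad
            # sum |w * dL/dw| over everything except the output-channel dimension
            scores[name] = (w * g).abs().flatten(1).sum(dim=1)
    return scores

def resource_aware_scores(scores, flops_per_neuron):
    # divide sensitivity by per-neuron cost (one possible reweighting;
    # the paper's scheme for FLOPs/memory balancing may differ)
    return {name: s / flops_per_neuron[name] for name, s in scores.items()}
```

Pruning then keeps the globally top-scoring neurons up to the target sparsity before any training takes place.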
Citations: 1
MonoClothCap: Towards Temporally Coherent Clothing Capture from Monocular RGB Video
Pub Date : 2020-09-22 DOI: 10.1109/3DV50981.2020.00042
Donglai Xiang, F. Prada, Chenglei Wu, J. Hodgins
We present a method to capture temporally coherent dynamic clothing deformation from a monocular RGB video input. In contrast to the existing literature, our method does not require a pre-scanned personalized mesh template, and thus can be applied to in-the-wild videos. To constrain the output to a valid deformation space, we build statistical deformation models for three types of clothing: T-shirt, short pants and long pants. A differentiable renderer is utilized to align our captured shapes to the input frames by minimizing the differences in silhouette, segmentation, and texture. We develop a UV texture growing method which expands the visible texture region of the clothing sequentially in order to minimize drift in deformation tracking. We also extract fine-grained wrinkle detail from the input videos by fitting the clothed surface to the normal maps estimated by a convolutional neural network. Our method produces temporally coherent reconstruction of body and clothing from monocular video. We demonstrate successful clothing capture results from a variety of challenging videos. Extensive quantitative experiments demonstrate the effectiveness of our method on metrics including body pose error and surface reconstruction error of the clothing.
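Among the alignment cues listed above, the silhouette term is the simplest to write down: with a differentiable renderer producing a soft clothing silhouette, a soft-IoU loss against the observed mask yields gradients for the deformation parameters. The version below is a generic formulation, not necessarily the exact term used in the paper.

```python
import torch

def silhouette_loss(rendered_sil, target_sil, eps=1e-6):
    """Soft-IoU silhouette term.

    rendered_sil, target_sil : (B, 1, H, W) soft masks in [0, 1].
    Because the renderer is differentiable, this loss back-propagates to the
    clothing deformation parameters that produced `rendered_sil`.
    """
    inter = (rendered_sil * target_sil).sum(dim=(1, 2, 3))
    union = (rendered_sil + target_sil - rendered_sil * target_sil).sum(dim=(1, 2, 3))
    return (1.0 - inter / (union + eps)).mean()
```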
Citations: 49
Self-Supervised Learning of Non-Rigid Residual Flow and Ego-Motion
Pub Date : 2020-09-22 DOI: 10.1109/3DV50981.2020.00025
Ivan Tishchenko, Sandro Lombardi, Martin R. Oswald, M. Pollefeys
Most of the current scene flow methods choose to model scene flow as a per point translation vector without differentiating between static and dynamic components of 3D motion. In this work we present an alternative method for end-to-end scene flow learning by joint estimation of non-rigid residual flow and ego-motion flow for dynamic 3D scenes. We propose to learn the relative rigid transformation from a pair of point clouds followed by an iterative refinement. We then learn the non-rigid flow from transformed inputs with the deducted rigid part of the flow. Furthermore, we extend the supervised framework with self-supervisory signals based on the temporal consistency property of a point cloud sequence. Our solution allows both training in a supervised mode complemented by self-supervisory loss terms as well as training in a fully self-supervised mode. We demonstrate that decomposition of scene flow into non-rigid flow and ego-motion flow along with an introduction of the self-supervisory signals allowed us to outperform the current state-of-the-art supervised methods.
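The decomposition itself is simple to state: the total flow of a point is the rigid (ego-motion) flow plus the learned non-rigid residual. A minimal sketch follows, with the rigid transform and the residual assumed to be given; in the paper both are predicted by learned modules and refined iteratively.

```python
import torch

def compose_scene_flow(points, R, t, residual_flow):
    """Compose total scene flow from an ego-motion estimate and a non-rigid residual.

    points        : (N, 3) source point cloud
    R, t          : (3, 3), (3,) estimated rigid transform (ego-motion)
    residual_flow : (N, 3) per-point non-rigid residual
    """
    rigid_flow = points @ R.T + t - points   # flow explained by ego-motion alone
    return rigid_flow + residual_flow

pts = torch.randn(1024, 3)
R, t = torch.eye(3), torch.tensor([0.1, 0.0, 0.0])
flow = compose_scene_flow(pts, R, t, torch.zeros(1024, 3))  # pure ego-motion: constant flow
```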
Citations: 42
Self-supervised Depth Denoising Using Lower- and Higher-quality RGB-D sensors
Pub Date : 2020-09-10 DOI: 10.1109/3DV50981.2020.00084
Akhmedkhan Shabanov, Ilya Krotov, N. Chinaev, Vsevolod Poletaev, Sergei Kozlukov, I. Pasechnik, B. Yakupov, A. Sanakoyeu, V. Lebedev, Dmitry Ulyanov
Consumer-level depth cameras and depth sensors embedded in mobile devices enable numerous applications, such as AR games and face identification. However, the quality of the captured depth is sometimes insufficient for 3D reconstruction, tracking and other computer vision tasks. In this paper, we propose a self-supervised depth denoising approach to denoise and refine depth coming from a low quality sensor. We record simultaneous RGB-D sequences with unsynchronized lower- and higher-quality cameras and solve a challenging problem of aligning sequences both temporally and spatially. We then learn a deep neural network to denoise the lower-quality depth using the matched higher-quality data as a source of supervision signal. We experimentally validate our method against state-of-the-art filtering-based and deep denoising techniques and show its application for 3D object reconstruction tasks where our approach leads to more detailed fused surfaces and better tracking.
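Once the lower- and higher-quality streams have been aligned in time and space, the supervision reduces to comparing the denoised low-quality depth against the warped high-quality depth wherever the latter is valid. The masked L1 term below is a minimal sketch of such a loss, assuming the alignment has already produced `aligned_hq` and a validity mask; it is not the paper's full objective.

```python
import torch

def depth_supervision_loss(denoised_lq, aligned_hq, valid_mask):
    """Masked L1 supervision from the aligned higher-quality sensor.

    denoised_lq : (B, 1, H, W) network output on the lower-quality depth
    aligned_hq  : (B, 1, H, W) higher-quality depth warped into the same view/time
    valid_mask  : (B, 1, H, W) 1 where the higher-quality sensor returned a depth
    """
    diff = (denoised_lq - aligned_hq).abs() * valid_mask
    return diff.sum() / valid_mask.sum().clamp(min=1)
```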
Citations: 3
Improved Modeling of 3D Shapes with Multi-view Depth Maps
Pub Date : 2020-09-07 DOI: 10.1109/3DV50981.2020.00017
Kamal Gupta, S. Jabbireddy, Ketul Shah, Abhinav Shrivastava, Matthias Zwicker
We present a simple yet effective general-purpose framework for modeling 3D shapes by leveraging recent advances in 2D image generation using CNNs. Using just a single depth image of the object, we can output a dense multi-view depth map representation of 3D objects. Our simple encoder-decoder framework, comprised of a novel identity encoder and class-conditional viewpoint generator, generates 3D consistent depth maps. Our experimental results demonstrate the two-fold advantage of our approach. First, we can directly borrow architectures that work well in the 2D image domain to 3D. Second, we can effectively generate high-resolution 3D shapes with low computational memory. Our quantitative evaluations show that our method is superior to existing depth map methods for reconstructing and synthesizing 3D objects and is competitive with other representations, such as point clouds, voxel grids, and implicit functions. Code and other material will be made available at http://multiview-shapes.umiacs.io.
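To make the encoder-decoder split concrete, here is a toy identity encoder paired with a class- and viewpoint-conditioned decoder operating on 64x64 depth maps. The layer sizes, embedding scheme, and conditioning are placeholders chosen for brevity; they are not the architecture from the paper.

```python
import torch
import torch.nn as nn

class MultiViewDepthGenerator(nn.Module):
    """Toy identity encoder + class/viewpoint-conditioned depth decoder (sketch only)."""

    def __init__(self, n_classes=13, n_views=20, code_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(                        # identity code from one depth image
            nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),            # 64 -> 32
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),           # 32 -> 16
            nn.Flatten(), nn.Linear(64 * 16 * 16, code_dim),
        )
        self.class_emb = nn.Embedding(n_classes, 64)
        self.view_emb = nn.Embedding(n_views, 64)
        self.decoder = nn.Sequential(                        # decode a depth map at the requested view
            nn.Linear(code_dim + 128, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(32, 1, 4, 2, 1),              # 32 -> 64
        )

    def forward(self, depth, cls, view):
        code = self.encoder(depth)
        cond = torch.cat([code, self.class_emb(cls), self.view_emb(view)], dim=1)
        return self.decoder(cond)

model = MultiViewDepthGenerator()
out = model(torch.randn(2, 1, 64, 64), torch.tensor([0, 3]), torch.tensor([5, 7]))  # (2, 1, 64, 64)
```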
Citations: 5