
Latest publications from the 2020 International Conference on 3D Vision (3DV)

Distributed Photometric Bundle Adjustment
Pub Date: 2020-11-01 DOI: 10.1109/3DV50981.2020.00024
Nikolaus Demmel, Maolin Gao, Emanuel Laude, Tao Wu, D. Cremers
In this paper we demonstrate that global photometric bundle adjustment (PBA) over all past keyframes can significantly improve the global accuracy of a monocular SLAM map compared to geometric techniques such as pose-graph optimization or traditional (geometric) bundle adjustment. However, PBA is computationally expensive at runtime, and its memory usage can be prohibitively high. To address this scalability issue, we formulate PBA as an approximate consensus program. Due to its decomposable structure, the problem can be solved with block coordinate descent in parallel across multiple independent workers, each with lower memory and compute requirements. For improved accuracy and convergence, we propose a novel gauge-aware consensus update. Our experiments on real-world data show an average error reduction of 62% compared to odometry and 33% compared to intermediate pose-graph optimization, and that, compared to central optimization on a single machine, our distributed PBA achieves competitive pose accuracy and cost.
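To make the consensus idea concrete, here is a minimal sketch of a distributed least-squares solve in the consensus-ADMM style: each worker minimizes its own local cost plus a proximity term to a shared consensus variable, and all worker updates can run in parallel. The quadratic local costs, the coupling weight `rho`, and the plain averaging consensus step are illustrative assumptions; the paper's actual method uses photometric costs, block coordinate descent, and a gauge-aware consensus update.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 4, 6                              # workers, shared parameter dimension
A = [rng.normal(size=(20, d)) for _ in range(K)]
b = [rng.normal(size=20) for _ in range(K)]

rho = 1.0                                # coupling weight (illustrative choice)
z = np.zeros(d)                          # consensus variable
x = [np.zeros(d) for _ in range(K)]      # per-worker local copies
u = [np.zeros(d) for _ in range(K)]      # per-worker dual variables

for _ in range(200):
    for k in range(K):                   # embarrassingly parallel across workers
        # argmin_x ||A_k x - b_k||^2 + (rho/2)||x - z + u_k||^2 (closed form)
        H = 2 * A[k].T @ A[k] + rho * np.eye(d)
        g = 2 * A[k].T @ b[k] + rho * (z - u[k])
        x[k] = np.linalg.solve(H, g)
    z = np.mean([x[k] + u[k] for k in range(K)], axis=0)   # consensus step
    for k in range(K):
        u[k] += x[k] - z                 # dual ascent keeps copies consistent

x_central = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
print(np.linalg.norm(z - x_central))     # small: workers agree with the central solve
```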
Citations: 8
PeeledHuman: Robust Shape Representation for Textured 3D Human Body Reconstruction
Pub Date: 2020-11-01 DOI: 10.1109/3DV50981.2020.00098
Sai Sagar Jinka, R. Chacko, Avinash Sharma, P J Narayanan
We introduce PeeledHuman, a novel shape representation of the human body that is robust to self-occlusions. PeeledHuman encodes the human body as a set of Peeled Depth and RGB maps in 2D, obtained by performing ray tracing on the 3D body model and extending each ray beyond its first intersection. This formulation allows us to handle self-occlusions efficiently compared to other representations. Given a monocular RGB image, we learn these Peeled maps in an end-to-end generative adversarial fashion using our novel framework, PeelGAN. We train PeelGAN using a 3D Chamfer loss and other 2D losses to generate multiple depth values per pixel and a corresponding RGB field per vertex in a dual-branch setup. In our simple non-parametric solution, the generated Peeled Depth maps are back-projected to 3D space to obtain a complete textured 3D shape. The corresponding RGB maps provide vertex-level texture details. We compare our method with current parametric and non-parametric methods in 3D reconstruction and find that we achieve state-of-the-art results. We demonstrate the effectiveness of our representation on the publicly available BUFF and MonoPerfCap datasets as well as on loose clothing data collected with our calibrated multi-Kinect setup.
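The peeling operation itself is easy to prototype. Below is a hedged sketch, using trimesh's multi-hit ray casting, of how peeled depth layers can be extracted from a mesh: each camera ray keeps all of its intersections, sorted near-to-far, rather than only the first. The function name, the fixed layer count, and the use of trimesh are our assumptions, not the paper's pipeline.

```python
import numpy as np
import trimesh

def peeled_depth_maps(mesh, origins, directions, num_layers=4):
    """Collect up to `num_layers` ray-mesh intersection depths per camera ray.

    origins/directions: (R, 3) rays; returns (num_layers, R) depths with
    np.inf wherever a ray has fewer intersections than layers.
    """
    locations, index_ray, _ = mesh.ray.intersects_location(
        origins, directions, multiple_hits=True)   # all hits, not just the first
    depth = np.full((num_layers, len(origins)), np.inf)
    for ray in np.unique(index_ray):
        hits = locations[index_ray == ray]
        d = np.sort(np.linalg.norm(hits - origins[ray], axis=1))  # near-to-far
        depth[:min(num_layers, len(d)), ray] = d[:num_layers]
    return depth

# toy usage: a ray through a unit sphere yields two peel layers (front and back)
sphere = trimesh.creation.icosphere()
origins = np.array([[0.0, 0.0, -2.0]])
directions = np.array([[0.0, 0.0, 1.0]])
print(peeled_depth_maps(sphere, origins, directions))  # ~[1, 3, inf, inf]
```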
Citations: 11
Message from the 3DV 2020 General Chair
Pub Date: 2020-11-01 DOI: 10.1109/3dv50981.2020.00005
{"title":"Message from the 3DV 2020 General Chair","authors":"","doi":"10.1109/3dv50981.2020.00005","DOIUrl":"https://doi.org/10.1109/3dv50981.2020.00005","url":null,"abstract":"","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129425305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Scene Flow from Point Clouds with or without Learning
Pub Date: 2020-10-31 DOI: 10.1109/3DV50981.2020.00036
J. K. Pontes, James Hays, S. Lucey
Scene flow is the three-dimensional (3D) motion field of a scene. It provides information about the spatial arrangement and rate of change of objects in dynamic environments. Current learning-based approaches seek to estimate the scene flow directly from point clouds and have achieved state-of-the-art performance. However, supervised learning methods are inherently domain specific and require a large amount of labeled data. Annotating scene flow on real-world point clouds is expensive and challenging, and the lack of such datasets has recently sparked interest in self-supervised learning methods. How to accurately and robustly learn scene flow representations without labeled real-world data is still an open problem. Here we present a simple and interpretable objective function to recover the scene flow from point clouds. We use the graph Laplacian of a point cloud to regularize the scene flow to be “as-rigid-as-possible”. Our proposed objective function can be used with or without learning: as a self-supervisory signal to learn scene flow representations, or as a non-learning-based method in which the scene flow is optimized at runtime. Our approach outperforms related works on many datasets. We also demonstrate two immediate applications of our method: motion segmentation and point cloud densification.
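The “as-rigid-as-possible” regularizer can be written down in a few lines. The sketch below is our own minimal rendition, not the paper's exact objective: it optimizes a per-point flow under a Chamfer data term plus a graph-Laplacian smoothness term tr(FᵀLF), which equals the sum of squared flow differences across k-nearest-neighbor edges. The neighbor count and weighting are assumed.

```python
import torch
from sklearn.neighbors import kneighbors_graph

def graph_laplacian(points, k=8):
    # symmetric kNN adjacency -> combinatorial Laplacian L = D - W
    W = kneighbors_graph(points, k, mode="connectivity")
    W = torch.tensor(((W + W.T) > 0).toarray(), dtype=torch.float32)
    return torch.diag(W.sum(1)) - W

def chamfer(a, b):
    d = torch.cdist(a, b)                       # pairwise point distances
    return d.min(1).values.mean() + d.min(0).values.mean()

src = torch.rand(256, 3)
tgt = src + torch.tensor([0.1, 0.0, 0.0])       # toy rigid translation
L = graph_laplacian(src.numpy())
flow = torch.zeros_like(src, requires_grad=True)
opt = torch.optim.Adam([flow], lr=0.01)

for _ in range(300):
    opt.zero_grad()
    data = chamfer(src + flow, tgt)             # flowed source should cover target
    smooth = torch.trace(flow.T @ L @ flow)     # penalizes flow varying across edges
    (data + 0.1 * smooth).backward()
    opt.step()
print(flow.mean(0))                             # approx (0.1, 0, 0): near-rigid flow
```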
Citations: 38
Correspondence Matrices are Underrated
Pub Date: 2020-10-30 DOI: 10.1109/3DV50981.2020.00070
T. Zodage, Rahul Chakwate, Vinit Sarode, Rangaprasad Arun Srivatsan, H. Choset
Point-cloud registration (PCR) is an important task in various applications such as robotic manipulation, augmented and virtual reality, and SLAM. PCR is an optimization problem involving minimization over two different types of interdependent variables: transformation parameters and point-to-point correspondences. Recent developments in deep learning have produced computationally fast approaches for PCR. The loss functions optimized in these networks are based on the error in the transformation parameters. We hypothesize that these methods would perform significantly better if they calculated their loss function using correspondence error instead of only the error in the transformation parameters. We define correspondence error as a metric based on incorrectly matched point pairs. We provide a fundamental explanation for why this is the case and test our hypothesis by modifying existing methods to use correspondence-based loss instead of transformation-based loss. These experiments show that the modified networks converge faster and register more accurately, even at larger misalignments, when compared to the original networks.
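A hedged sketch of the distinction the abstract draws: a transformation-based loss penalizes only the recovered pose, while a correspondence-based loss supervises the predicted matching directly. The tensor shapes and the row-stochastic soft correspondence matrix are our assumptions about a typical PCR network head, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def transformation_loss(R_pred, t_pred, R_gt, t_gt):
    # supervises only the final rigid transform
    return torch.norm(R_pred - R_gt) + torch.norm(t_pred - t_gt)

def correspondence_loss(C_pred, gt_match):
    # C_pred: (N, N) row-stochastic soft correspondences from the network;
    # gt_match: (N,) index of the true target point for each source point.
    return F.nll_loss(torch.log(C_pred + 1e-9), gt_match)

def correspondence_error(C_pred, gt_match):
    # the metric from the abstract: fraction of incorrectly matched point pairs
    return (C_pred.argmax(dim=1) != gt_match).float().mean()

# toy check: a near-identity soft matching gives low loss and zero error
N = 5
C = torch.full((N, N), 0.01) + 0.95 * torch.eye(N)
C = C / C.sum(dim=1, keepdim=True)
gt = torch.arange(N)
print(correspondence_loss(C, gt), correspondence_error(C, gt))
```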
Citations: 6
SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion
Pub Date: 2020-10-26 DOI: 10.1109/3DV50981.2020.00090
Shun-cheng Wu, Keisuke Tateno, N. Navab, Federico Tombari
Real-time scene reconstruction from depth data inevitably suffers from occlusion, thus leading to incomplete 3D models. Partial reconstructions, in turn, limit the performance of algorithms that leverage them for applications in the context of, e.g., augmented reality, robotic navigation, and 3D mapping. Most methods address this issue by predicting the missing geometry as an offline optimization, thus being incompatible with real-time applications. We propose a framework that ameliorates this issue by performing scene reconstruction and semantic scene completion jointly in an incremental and real-time manner, based on an input sequence of depth maps. Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model. We evaluate the proposed approach quantitatively and qualitatively, demonstrating that our method can obtain accurate 3D semantic scene completion in real-time.
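As a rough illustration of fusing completed semantics into a global voxel model without letting hallucinated geometry overwrite measured geometry, consider the sketch below. The three-state set, the grid size, and the rule that completion only fills space still marked unknown are our assumptions; the paper's actual voxel states, fusion rule, and network are more involved.

```python
import numpy as np

UNKNOWN, FREE, OCCUPIED = 0, 1, 2
NUM_CLASSES, GRID = 12, (32, 32, 32)           # class 0 is treated as "empty"

state = np.full(GRID, UNKNOWN, dtype=np.uint8)
label_evidence = np.zeros(GRID + (NUM_CLASSES,), dtype=np.float32)

def fuse_frame(measured_occ, measured_free, predicted_logits):
    """Incrementally fuse one frame into the global model.

    measured_occ / measured_free: boolean GRID masks from the depth map.
    predicted_logits: GRID + (NUM_CLASSES,) semantic-completion output.
    """
    state[measured_free] = FREE                # measured geometry always wins
    state[measured_occ] = OCCUPIED
    completed = (predicted_logits.argmax(-1) > 0) & (state == UNKNOWN)
    state[completed] = OCCUPIED                # completion fills unknown space only
    occ = state == OCCUPIED
    label_evidence[occ] += predicted_logits[occ]   # accumulate semantic evidence

def semantic_map():
    # per-voxel class from accumulated evidence; 0 wherever nothing is occupied
    return np.where(state == OCCUPIED, label_evidence.argmax(-1), 0)

# toy frame: one measured voxel, completion predicting class 3 everywhere
logits = np.zeros(GRID + (NUM_CLASSES,), dtype=np.float32); logits[..., 3] = 1.0
occ = np.zeros(GRID, bool); occ[16, 16, 16] = True
fuse_frame(occ, np.zeros(GRID, bool), logits)
print(semantic_map()[16, 16, 16])              # -> 3
```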
Citations: 18
Unsupervised Dense Shape Correspondence using Heat Kernels
Pub Date: 2020-10-23 DOI: 10.1109/3DV50981.2020.00067
Mehmet Aygün, Zorah Lähner, D. Cremers
In this work, we propose an unsupervised method for learning dense correspondences between shapes using a recent deep functional map framework. Instead of depending on ground-truth correspondences or computationally expensive geodesic distances, we use heat kernels, which can be computed quickly during training and serve as the supervisory signal. Moreover, we propose a curriculum learning strategy using different heat diffusion times, which provide different levels of difficulty during optimization, without any sampling mechanism or hard example mining. We present the results of our method on different benchmarks with various challenges such as partiality, topological noise, and differing connectivity.
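For reference, the heat kernel of a shape with Laplacian eigenpairs (λᵢ, φᵢ) is k_t(x, y) = Σᵢ exp(−λᵢ t) φᵢ(x) φᵢ(y), so supervision targets at many diffusion times t cost only one eigendecomposition. The sketch below uses a ring graph as a stand-in for a mesh Laplace-Beltrami operator and a fixed truncation of 50 modes; both are our simplifications.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import laplacian

def heat_kernel(L, t, k=50):
    # K_t(x, y) = sum_i exp(-lambda_i t) phi_i(x) phi_i(y), truncated to k modes
    lam, phi = np.linalg.eigh(L)          # dense eigendecomposition (toy sizes only)
    lam, phi = lam[:k], phi[:, :k]        # keep the k lowest-frequency modes
    return (phi * np.exp(-lam * t)) @ phi.T

n = 200                                   # "shape": a ring graph of 200 vertices
W = sp.diags([np.ones(n - 1), np.ones(n - 1)], [1, -1]).tolil()
W[0, n - 1] = W[n - 1, 0] = 1             # close the loop
L = laplacian(W.tocsr()).toarray()

# curriculum over diffusion time: large t gives smooth, easy-to-match targets,
# small t gives sharp, near-delta targets that are harder
for t in (10.0, 1.0, 0.1):
    K = heat_kernel(L, t)
    print(t, K[0].argmax())               # heat stays centered at the source vertex
```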
Citations: 13
Error Bounds of Projection Models in Weakly Supervised 3D Human Pose Estimation
Pub Date: 2020-10-23 DOI: 10.1109/3DV50981.2020.00100
Nikolas Klug, Moritz Einfalt, Stephan Brehm, R. Lienhart
The current state of the art in monocular 3D human pose estimation is heavily influenced by weakly supervised methods. These allow 2D labels to be used to learn effective 3D human pose recovery, either directly from images or via 2D-to-3D pose uplifting. In this paper we present a detailed analysis of the most commonly used simplified projection models, which relate the estimated 3D pose representation to 2D labels: the normalized perspective and weak perspective projections. Specifically, we derive theoretical lower-bound errors for these projection models under the commonly used mean per-joint position error (MPJPE). Additionally, we show how the normalized perspective projection can be replaced to avoid this guaranteed minimal error. We evaluate the derived lower bounds on the most commonly used 3D human pose estimation benchmark datasets. Our results show that both projection models lead to an inherent minimal error between 19.3 mm and 54.7 mm, even after alignment in position and scale. This is a considerable fraction of the error reported by recent state-of-the-art methods. Our paper thus establishes a theoretical baseline that shows the importance of suitable projection models in weakly supervised 3D human pose estimation.
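The two projection models under comparison are easy to state: full perspective divides each joint by its own depth, while weak perspective divides all joints by a single reference depth. The toy sketch below, with an assumed focal length and joint count, only illustrates the mismatch between the two models; the paper's contribution is the analytic lower bound on the resulting 3D MPJPE.

```python
import numpy as np

def perspective(X, f=1000.0):
    # full perspective: each joint divided by its own depth
    return f * X[:, :2] / X[:, 2:3]

def weak_perspective(X, f=1000.0):
    # weak perspective: one shared reference depth (here the mean) for all joints
    return f * X[:, :2] / X[:, 2].mean()

def mpjpe(P, Q):
    # mean per-joint position error
    return np.linalg.norm(P - Q, axis=1).mean()

# toy pose: 17 joints with ~170 mm spread, placed 3 m in front of the camera
rng = np.random.default_rng(0)
X = rng.normal(scale=170.0, size=(17, 3)) + np.array([0.0, 0.0, 3000.0])
print(mpjpe(perspective(X), weak_perspective(X)))  # pixel residual the model ignores
```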
Citations: 1
BP-MVSNet: Belief-Propagation-Layers for Multi-View-Stereo
Pub Date: 2020-10-23 DOI: 10.1109/3DV50981.2020.00049
Christian Sormann, Patrick Knöbelreiter, Andreas Kuhn, Mattia Rossi, T. Pock, F. Fraundorfer
In this work, we propose BP-MVSNet, a convolutional neural network (CNN)-based multi-view stereo (MVS) method that uses a differentiable Conditional Random Field (CRF) layer for regularization. To this end, we propose to extend the BP layer [16] and add what is necessary to use it successfully in the MVS setting. We show how to calculate a normalization based on the expected 3D error, which we then use to normalize the label jumps in the CRF; this is required to make the BP layer invariant to different scales in the MVS setting. To also enable fractional label jumps, we propose a differentiable interpolation step, which we embed into the computation of the pairwise term. These extensions allow us to integrate the BP layer into a multi-scale MVS network, where we continuously improve a rough initial estimate until we obtain high-quality depth maps. We evaluate the proposed BP-MVSNet in an ablation study and conduct extensive experiments on the DTU, Tanks and Temples, and ETH3D datasets. The experiments show that we significantly outperform the baseline and achieve state-of-the-art results.
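Two of the ingredients can be sketched in isolation. Below, `metric_step` is our reading of the scale normalization (a one-label jump is weighted by the metric size of one depth step, so the pairwise cost is comparable across scenes), and `interp_cost` shows a differentiable linear interpolation that evaluates a discrete cost volume at fractional labels. Both are illustrative stand-ins, not the paper's exact formulation.

```python
import torch

def metric_step(depth_min, depth_max, num_labels):
    # metric 3D size of a one-label depth jump; multiplying label differences by
    # this makes the CRF pairwise cost comparable across scenes of any depth range
    return (depth_max - depth_min) / (num_labels - 1)

def interp_cost(cost_volume, labels):
    """Differentiable lookup of an (L, N) cost volume at fractional labels (N,)."""
    lo = labels.floor().long().clamp(max=cost_volume.shape[0] - 2)
    w = labels - lo.float()                   # interpolation weight in [0, 1]
    idx = torch.arange(cost_volume.shape[1])
    return (1 - w) * cost_volume[lo, idx] + w * cost_volume[lo + 1, idx]

cv = torch.rand(8, 5, requires_grad=True)     # toy cost volume: 8 labels, 5 pixels
costs = interp_cost(cv, torch.tensor([0.5, 2.25, 6.9, 3.0, 7.0]))
costs.sum().backward()                        # gradients flow through the lookup
print(costs, metric_step(0.5, 10.0, 64))
```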
Citations: 19
Convolutional Autoencoders for Human Motion Infilling
Pub Date: 2020-10-22 DOI: 10.1109/3DV50981.2020.00102
Manuel Kaufmann, Emre Aksan, Jie Song, Fabrizio Pece, R. Ziegler, Otmar Hilliges
In this paper we propose a convolutional autoencoder to address the problem of motion infilling for 3D human motion data. Given a start and end sequence, motion infilling aims to complete the missing gap in between, such that the filled-in poses plausibly continue the start sequence and naturally transition into the end sequence. To this end, we propose a single, end-to-end trainable convolutional autoencoder. We show that a single model can be used to create natural transitions between different types of activities. Furthermore, our method is not only able to fill in entire missing frames, but can also be used to complete gaps where partial poses are available (e.g. from end effectors), or to clean up other forms of noise (e.g. Gaussian). The model can fill in an arbitrary number of gaps of potentially varying length, and no further post-processing of the model's outputs, such as smoothing or closing discontinuities at the end of a gap, is necessary. At the heart of our approach lies the idea of casting motion infilling as an inpainting problem and training a convolutional denoising autoencoder on image-like representations of motion sequences. At training time, blocks of columns are removed from such images and the model is asked to fill in the gaps. We demonstrate the versatility of the approach on a number of complex motion sequences and report thorough evaluations performed to better understand the capabilities and limitations of the proposed approach.
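The training scheme in the last sentences translates almost directly into code. The sketch below treats a motion clip as a (3·J, T) image, zeroes out a random block of frame columns, and trains a small 1D convolutional autoencoder to reconstruct the full clip; the tiny architecture, random data, and gap sizes are placeholders for the paper's real model and mocap data.

```python
import torch
import torch.nn as nn

J, T = 22, 120                          # joints and frames; a clip is a (3*J, T) image
net = nn.Sequential(                    # deliberately tiny stand-in autoencoder
    nn.Conv1d(3 * J, 128, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
    nn.Conv1d(128, 3 * J, kernel_size=5, padding=2),
)

def mask_gap(motion, max_gap=30):
    # zero out a random block of columns (frames), mimicking a missing gap
    out = motion.clone()
    start = int(torch.randint(0, T - max_gap, (1,)))
    length = int(torch.randint(5, max_gap + 1, (1,)))
    out[:, :, start:start + length] = 0.0
    return out

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
motion = torch.randn(8, 3 * J, T)       # random stand-in batch; real data is mocap
for _ in range(10):                     # training-loop skeleton
    opt.zero_grad()
    recon = net(mask_gap(motion))
    loss = nn.functional.mse_loss(recon, motion)   # reconstruct the full sequence
    loss.backward()
    opt.step()
print(loss.item())
```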
Citations: 63