
Latest publications from the 2020 International Conference on 3D Vision (3DV)

Smart Time-Multiplexing of Quads Solves the Multicamera Interference Problem
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00091
T. Pribanić, T. Petković, David Bojanić, Kristijan Bartol
Time-of-flight (ToF) cameras are becoming increasingly popular for 3D imaging. Their optimal usage has been studied from several aspects. One of the open research problems is the possibility of a multicamera interference problem when two or more ToF cameras are operating simultaneously. In this work we present an efficient method to synchronize multiple operating ToF cameras. Our method is based on time-division multiplexing, but unlike traditional time multiplexing, it does not decrease the effective camera frame rate. Additionally, for unsynchronized cameras, we provide a robust method to extract, from their corresponding video streams, frames which are not subject to the multicamera interference problem. We demonstrate our approach through a series of experiments and with different levels of support available for triggering, ranging from hardware triggering to purely random software triggering.
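To make the time-division multiplexing idea concrete, here is a minimal sketch (not the authors' scheme) that assigns the four phase sub-frames ("quads") of several ToF cameras to non-overlapping time slots, so that no two cameras illuminate the scene at the same instant; as long as all slots fit inside one frame period, each camera still completes a full depth frame per cycle.

```python
# Illustrative round-robin quad scheduling; the paper's method additionally
# exploits each camera's idle time between quads, which is not modeled here.
QUADS_PER_FRAME = 4          # a ToF depth frame is built from 4 phase images

def quad_schedule(num_cameras, num_frames):
    """Return (slot, camera, quad) triples for a simple time-division plan."""
    schedule = []
    slot = 0
    for frame in range(num_frames):
        for quad in range(QUADS_PER_FRAME):
            for cam in range(num_cameras):
                schedule.append((slot, cam, quad))
                slot += 1    # only one camera's illumination is active per slot
    return schedule

# Example: 3 cameras, 1 depth frame -> 12 interference-free exposure slots.
for slot, cam, quad in quad_schedule(3, 1):
    print(f"slot {slot:2d}: camera {cam} captures quad {quad}")
```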
Citations: 2
SMPLy Benchmarking 3D Human Pose Estimation in the Wild
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00040
Vincent Leroy, Philippe Weinzaepfel, Romain Br'egier, Hadrien Combaluzier, Grégory Rogez
Predicting 3D human pose from images has seen great recent improvements. Novel approaches that can even predict both pose and shape from a single input image have been introduced, often relying on a parametric model of the human body such as SMPL. While qualitative results for such methods are often shown for images captured in-the-wild, a proper benchmark in such conditions is still missing, as it is cumbersome to obtain ground-truth 3D poses outside of a motion capture room. This paper presents a pipeline to easily produce and validate such a dataset with accurate ground truth, with which we benchmark recent 3D human pose estimation methods in-the-wild. We make use of the recently introduced Mannequin Challenge dataset, which contains in-the-wild videos of people frozen in action like statues, and leverage the fact that the people are static and the camera is moving to accurately fit the SMPL model to the sequences. A total of 24,428 frames with registered body models are then selected from 567 scenes at almost no cost, using only online RGB videos. We benchmark state-of-the-art SMPL-based human pose estimation methods on this dataset. Our results highlight that challenges remain, in particular for difficult poses or for scenes where the persons are partially truncated or occluded.
Citations: 15
Scalable Point Cloud-based Reconstruction with Local Implicit Functions
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00110
Sandro Lombardi, Martin R. Oswald, M. Pollefeys
Surface reconstruction from point clouds has been a well-studied research topic with applications in computer vision and computer graphics. Recently, several learning-based methods were proposed for 3D shape representation through implicit functions, which among other uses can be applied to point cloud-based reconstruction. Although delivering compelling results for synthetic object datasets of manageable size, they fail to represent larger scenes accurately, presumably due to the use of only one global latent code for encoding an entire scene or object. We propose to encode only parts of objects with features attached to unstructured point clouds. To this end we use a hierarchical feature map in 3D space, extracted from the input point clouds, with which local latent shape encodings can be queried at arbitrary positions. We use a permutohedral lattice to process the hierarchical feature maps sparsely and efficiently. This enables accurate and detailed point cloud-based reconstructions for large numbers of points in a time-efficient manner, showing good generalization capabilities across different datasets. Experiments on synthetic and real-world datasets demonstrate the reconstruction capability of our method and compare favorably to state-of-the-art methods.
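The core mechanism, querying local latent codes at arbitrary coordinates and decoding them with a small network, can be sketched as below. This is a minimal illustration with assumed names: the paper uses a hierarchical feature map on a permutohedral lattice, whereas a plain dense feature grid stands in here.

```python
# Sketch only: local implicit decoding from a dense feature grid.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalImplicitDecoder(nn.Module):
    def __init__(self, feat_dim=32, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),            # occupancy logit per query point
        )

    def forward(self, feat_grid, points):
        # feat_grid: (1, C, D, H, W) features extracted from the input cloud
        # points:    (M, 3) query coordinates, normalized to [-1, 1]
        grid = points.view(1, 1, 1, -1, 3)                       # (1,1,1,M,3)
        feats = F.grid_sample(feat_grid, grid, mode='bilinear',
                              align_corners=True)                # (1,C,1,1,M)
        feats = feats.view(feat_grid.shape[1], -1).t()           # (M, C)
        return self.mlp(torch.cat([feats, points], dim=-1))      # (M, 1)

decoder = LocalImplicitDecoder()
occ = decoder(torch.randn(1, 32, 16, 16, 16), torch.rand(100, 3) * 2 - 1)
```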
Citations: 7
Rotation-Invariant Point Convolution With Multiple Equivariant Alignments.
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00060
Hugues Thomas
Recent attempts at introducing rotation invariance or equivariance in 3D deep learning approaches have shown promising results, but these methods still struggle to reach the performance of standard 3D neural networks. In this work we study the relation between equivariance and invariance in 3D point convolutions. We show that using rotation-equivariant alignments, it is possible to make any convolutional layer rotation-invariant. Furthermore, we improve this simple alignment procedure by using the alignments themselves as features in the convolution, and by combining multiple alignments together. With this core layer, we design rotation-invariant architectures which improve state-of-the-art results in both object classification and semantic segmentation and reduce the gap between rotation-invariant and standard 3D deep learning approaches.
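The alignment idea can be illustrated with a hedged sketch: if each local neighborhood is rotated into a canonical frame before the convolution, the layer output no longer depends on the global orientation of the input. A PCA-based frame is used below purely as a generic stand-in; the paper instead uses learned rotation-equivariant alignments and combines several of them.

```python
# Toy rotation-invariant "point convolution" via per-neighborhood alignment.
import numpy as np

def canonical_frame(neighborhood):
    """Rotation aligning a (K, 3) neighborhood with its principal axes
    (sign ambiguities ignored for brevity)."""
    centered = neighborhood - neighborhood.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt.T                                    # columns = principal axes

def aligned_features(points, neighborhoods, weights):
    """Apply a shared linear map to offsets expressed in the aligned frame."""
    out = []
    for center, nbrs in zip(points, neighborhoods):
        R = canonical_frame(nbrs)
        offsets = (nbrs - center) @ R              # rotation-invariant coords
        out.append(np.tanh(offsets @ weights).sum(axis=0))
    return np.stack(out)

pts = np.random.rand(10, 3)
nbrs = [p + 0.05 * np.random.randn(16, 3) for p in pts]
w = np.random.randn(3, 8)
feats = aligned_features(pts, nbrs, w)   # (10, 8), orientation-independent
```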
Citations: 9
Using Image Sequences for Long-Term Visual Localization
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00104
Erik Stenborg, Torsten Sattler, Lars Hammarstrand
Estimating the pose of a camera in a known scene, i.e., visual localization, is a core task for applications such as self-driving cars. In many scenarios, image sequences are available, and existing work on combining single-image localization with odometry offers the potential to improve localization performance. Still, most of the literature focuses on single-image localization and ignores the availability of sequence data. The goal of this paper is to demonstrate the potential of image sequences in challenging scenarios, e.g., under day-night or seasonal changes. Combining ideas from the literature, we describe a sequence-based localization pipeline that combines odometry with both a coarse and a fine localization module. Experiments on long-term localization datasets show that combining single-image global localization against a prebuilt map with a visual odometry / SLAM pipeline improves performance to a level where the extended CMU Seasons dataset can be considered solved. We show that SIFT features can perform on par with modern state-of-the-art features in our framework, despite being much weaker and an order of magnitude faster to compute. Our code is publicly available at github.com/rulllars.
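As a rough illustration of combining odometry with map-based localization (not the paper's pipeline), the sketch below chains relative visual-odometry motions between frames and blends in an absolute pose whenever a single-image global localization against a prebuilt map is available. The fixed blending weight and the SE(2) state are assumptions; the paper uses separate coarse and fine localization modules rather than this naive filter.

```python
# Minimal odometry + global-fix fusion sketch in SE(2); angle wrap-around
# and uncertainty handling are ignored for brevity.
import numpy as np

def compose(pose, motion):
    """pose, motion: (x, y, theta); returns pose composed with motion."""
    x, y, th = pose
    dx, dy, dth = motion
    return (x + np.cos(th) * dx - np.sin(th) * dy,
            y + np.sin(th) * dx + np.cos(th) * dy,
            th + dth)

def fuse_sequence(init_pose, odometry, global_fixes, alpha=0.3):
    """odometry: list of relative motions; global_fixes: dict frame -> pose."""
    poses = [init_pose]
    for t, motion in enumerate(odometry, start=1):
        pred = compose(poses[-1], motion)           # dead-reckoned estimate
        if t in global_fixes:                       # absolute measurement
            gx, gy, gth = global_fixes[t]
            px, py, pth = pred
            pred = ((1 - alpha) * px + alpha * gx,
                    (1 - alpha) * py + alpha * gy,
                    (1 - alpha) * pth + alpha * gth)
        poses.append(pred)
    return poses

traj = fuse_sequence((0.0, 0.0, 0.0),
                     [(1.0, 0.0, 0.05)] * 10,
                     {5: (4.8, 0.3, 0.2)})
```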
Citations: 15
Recalibration of Neural Networks for Point Cloud Analysis
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00054
Ignacio Sarasua, Sebastian Pölsterl, C. Wachinger
Spatial and channel re-calibration have become powerful concepts in computer vision. Their ability to capture long-range dependencies is especially useful for those networks that extract local features, such as CNNs. While re-calibration has been widely studied for image analysis, it has not yet been used on shape representations. In this work, we introduce re-calibration modules on deep neural networks for 3D point clouds. We propose a set of re-calibration blocks that extend Squeeze and Excitation blocks [11] and that can be added to any network for 3D point cloud analysis that builds a global descriptor by hierarchically combining features from multiple local neighborhoods. We run two sets of experiments to validate our approach. First, we demonstrate the benefit and versatility of our proposed modules by incorporating them into three state-of-the-art networks for 3D point cloud analysis: PointNet++ [22], DGCNN [29], and RSCNN [18]. We evaluate each network on two tasks: object classification on ModelNet40, and object part segmentation on ShapeNet. Our results show an improvement of up to 1% in accuracy for ModelNet40 compared to the baseline method. In the second set of experiments, we investigate the benefits of re-calibration blocks on Alzheimer’s Disease (AD) diagnosis. Our results demonstrate that our proposed methods yield a 2% increase in accuracy for diagnosing AD and a 2.3% increase in concordance index for predicting AD onset with time-to-event analysis. In conclusion, re-calibration improves the accuracy of point cloud architectures while only minimally increasing the number of parameters.
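For readers unfamiliar with channel re-calibration, the sketch below shows a Squeeze-and-Excitation-style block adapted to per-point features, in the spirit of what the abstract describes; the layer sizes and pooling choice are illustrative assumptions, not the paper's exact blocks.

```python
# Channel re-calibration ("squeeze and excitation") for point cloud features.
import torch
import torch.nn as nn

class ChannelRecalibration(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (B, N, C) features for N points
        squeeze = x.mean(dim=1)            # "squeeze": global average over points
        scale = self.fc(squeeze)           # "excitation": per-channel gates in (0, 1)
        return x * scale.unsqueeze(1)      # re-weight every point's channels

feats = torch.randn(8, 1024, 64)
recal = ChannelRecalibration(64)
out = recal(feats)                          # same shape, channels re-scaled
```

The block adds only two small fully connected layers per insertion point, which is why the parameter overhead stays minimal.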
Citations: 1
Semantic Implicit Neural Scene Representations With Semi-Supervised Training
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00052
Amit Kohli, V. Sitzmann, Gordon Wetzstein
The recent success of implicit neural scene representations has presented a viable new method for how we capture and store 3D scenes. Unlike conventional 3D representations, such as point clouds, which explicitly store scene properties in discrete, localized units, these implicit representations encode a scene in the weights of a neural network which can be queried at any coordinate to produce these same scene properties. Thus far, implicit representations have primarily been optimized to estimate only the appearance and/or 3D geometry information in a scene. We take the next step and demonstrate that an existing implicit representation (SRNs) [67] is actually multi-modal; it can be further leveraged to perform per-point semantic segmentation while retaining its ability to represent appearance and geometry. To achieve this multi-modal behavior, we utilize a semi-supervised learning strategy atop the existing pre-trained scene representation. Our method is simple, general, and only requires a few tens of labeled 2D segmentation masks in order to achieve dense 3D semantic segmentation. We explore two novel applications for this semantically aware implicit neural scene representation: 3D novel view and semantic label synthesis given only a single input RGB image or 2D label mask, as well as 3D interpolation of appearance and semantics.
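A minimal sketch of the multi-modal query idea follows; the architecture and names are assumptions, not the SRN implementation. One network maps a 3D coordinate plus a scene latent to features, and a small linear head on those features predicts per-point semantic logits. Training only such a head on a handful of labelled 2D masks mirrors the semi-supervised strategy described in the abstract.

```python
# Sketch: an implicit field queried at any coordinate for appearance and semantics.
import torch
import torch.nn as nn

class SemanticImplicitField(nn.Module):
    def __init__(self, latent_dim=256, hidden=256, num_classes=13):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.rgb_head = nn.Linear(hidden, 3)             # appearance
        self.sem_head = nn.Linear(hidden, num_classes)   # semantic logits

    def forward(self, coords, latent):
        # coords: (M, 3) query points; latent: (latent_dim,) scene code
        z = latent.expand(coords.shape[0], -1)
        h = self.backbone(torch.cat([coords, z], dim=-1))
        return self.rgb_head(h), self.sem_head(h)

field = SemanticImplicitField()
rgb, logits = field(torch.rand(500, 3), torch.randn(256))
```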
Citations: 38
Depthwise Separable Temporal Convolutional Network for Action Segmentation
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00073
Basavaraj Hampiholi, Christian Jarvers, W. Mader, H. Neumann
Fine-grained temporal action segmentation in long, untrimmed RGB videos is a key topic in visual human-machine interaction. Recent temporal convolution based approaches either use an encoder-decoder (ED) architecture or dilations with a doubling factor in consecutive convolution layers to segment actions in videos. However, ED networks operate at low temporal resolution, and the dilations in successive layers cause the gridding artifact problem. We propose the depthwise separable temporal convolution network (DS-TCN), which operates at full temporal resolution and with reduced gridding effects. The basic component of DS-TCN is the residual depthwise dilated block (RDDB). We explore the trade-off between large kernels and small dilation rates using the RDDB. We show that our DS-TCN is capable of capturing long-term dependencies as well as local temporal cues efficiently. Our evaluation on three benchmark datasets, GTEA, 50Salads, and Breakfast, demonstrates that DS-TCN outperforms the existing ED-TCN and dilation-based TCN baselines even with comparatively fewer parameters.
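A hedged sketch of a residual depthwise-separable dilated temporal block in the spirit of the RDDB follows; kernel size and dilation are illustrative choices, not the paper's exact hyper-parameters.

```python
# Residual block: depthwise dilated temporal conv followed by a pointwise conv.
import torch
import torch.nn as nn

class ResidualDepthwiseDilatedBlock(nn.Module):
    def __init__(self, channels, kernel_size=9, dilation=2):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation       # keeps T unchanged
        # depthwise temporal convolution: one filter per channel
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=pad, dilation=dilation,
                                   groups=channels)
        # pointwise convolution mixes channels
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        # x: (B, C, T) frame-wise features at full temporal resolution
        return x + self.pointwise(self.act(self.depthwise(x)))

block = ResidualDepthwiseDilatedBlock(64)
video_feats = torch.randn(2, 64, 3000)        # long untrimmed sequence
out = block(video_feats)                       # (2, 64, 3000), T preserved
```

Splitting the temporal convolution into depthwise and pointwise parts is what keeps the parameter count low even with large kernels.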
Citations: 2
Simulated Annealing for 3D Shape Correspondence
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00035
Benjamin Holzschuh, Zorah Lähner, D. Cremers
We propose to use Simulated Annealing to solve the correspondence problem between near-isometric 3D shapes. Our method gains efficiency by quickly upsampling a sparse correspondence, minimizing the embedding error of new samples on the surfaces and applying simulated annealing to refine the result. The algorithm alternates between sampling additional points on the surface and swapping points within the current solution according to Simulated Annealing theory. Simulated Annealing is a probabilistic method and less prone to getting stuck in local extrema, which allows us to obtain good results on the NP-hard quadratic assignment problem (QAP). Our method can be used as a stand-alone correspondence pipeline through an initial seed generator as well as to densify a set of sparse input matches. Furthermore, the use of locality-sensitive hashing to approximate geodesic distances reduces the computational complexity and memory consumption significantly. This allows our algorithm to run on meshes with over 100k points, an accomplishment that few approaches tackling the QAP directly achieve. We show convincing results on datasets like TOSCA and SHREC’19 Connectivity.
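The refinement step can be illustrated with a hedged sketch: propose a swap of two assignments in the current correspondence and accept it with the usual Metropolis probability, so worse moves are occasionally taken and local minima can be escaped. The energy function below is a placeholder; the paper minimizes a quadratic-assignment objective over surface embeddings.

```python
# Generic simulated-annealing swap loop over a correspondence (sketch only).
import math
import random

def anneal(correspondence, energy, steps=10000, t_start=1.0, t_end=1e-3):
    """correspondence: list mapping source index -> target index."""
    current = list(correspondence)
    e_cur = energy(current)
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)    # cooling schedule
        i, j = random.sample(range(len(current)), 2)
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]   # swap move
        e_new = energy(candidate)
        # accept improvements always, worse moves with Metropolis probability
        if e_new < e_cur or random.random() < math.exp((e_cur - e_new) / t):
            current, e_cur = candidate, e_new
    return current

# Toy usage with a trivial placeholder energy (distance to the identity map).
result = anneal(list(range(8))[::-1],
                lambda c: sum(abs(i - v) for i, v in enumerate(c)))
```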
Citations: 10
Adversarial Self-Supervised Scene Flow Estimation
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00115
Victor Zuanazzi, Joris van Vugt, O. Booij, P. Mettes
This work proposes a metric learning approach for self-supervised scene flow estimation. Scene flow estimation is the task of estimating 3D flow vectors for consecutive 3D point clouds. Such flow vectors are useful, e.g., for recognizing actions or avoiding collisions. Training a neural network via supervised learning for scene flow is impractical, as this requires manual annotations for each 3D point at each new timestamp for each scene. To that end, we seek a self-supervised approach, where a network learns a latent metric to distinguish between points translated by flow estimations and the target point cloud. Our adversarial metric learning includes a multi-scale triplet loss on sequences of two point clouds as well as a cycle consistency loss. Furthermore, we outline a benchmark for self-supervised scene flow estimation: the Scene Flow Sandbox. The benchmark consists of five datasets designed to study individual aspects of flow estimation in progressive order of complexity, from a moving object to real-world scenes. Experimental evaluation on the benchmark shows that our approach obtains state-of-the-art self-supervised scene flow results, outperforming recent neighbor-based approaches. We use our proposed benchmark to expose shortcomings and draw insights on various training setups. We find that our setup captures motion coherence and preserves local geometries. Dealing with occlusions, on the other hand, is still an open challenge.
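As a hedged, simplified sketch of the triplet idea (assumptions throughout, not the paper's loss): embed points with a small shared MLP, warp the source cloud by the predicted flow, and pull each warped point towards its nearest target point while pushing it away from a randomly drawn one. The paper uses a multi-scale version of this idea together with a cycle-consistency term.

```python
# Self-supervised triplet loss on warped source vs. target points (sketch).
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64))
triplet = nn.TripletMarginLoss(margin=0.5)

def self_supervised_flow_loss(source, flow, target):
    # source, flow: (N, 3); target: (M, 3); no flow annotations are used
    warped = source + flow                            # predicted positions
    dists = torch.cdist(warped, target)               # (N, M)
    pos = target[dists.argmin(dim=1)]                 # nearest target point
    neg = target[torch.randint(target.shape[0], (source.shape[0],))]
    return triplet(embed(warped), embed(pos), embed(neg))

loss = self_supervised_flow_loss(torch.rand(2048, 3),
                                 torch.randn(2048, 3) * 0.05,
                                 torch.rand(2048, 3))
```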
Citations: 8