
2020 International Conference on 3D Vision (3DV): latest publications

Differential Photometric Consistency
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00023
Hongyi Fan, B. Kunsberg, B. Kimia
A key bottleneck in the use of Multiview Stereo (MVS) to produce high-quality reconstructions is the gaps arising from textureless, shaded areas and lack of fine-scale detail. Shape-from-Shading (SfS) has been used in conjunction with MVS to obtain fine-scale detail and veridical reconstruction in the gap areas. The similarity metric that gauges candidate correspondences is critical to this process, typically a combination of photometric consistency and brightness gradient constancy. Two observations motivate this paper. First, brightness gradient constancy can be erroneous due to foreshortening. Second, the standard ZSSD/NCC patchwise photometric consistency measure, when applied to shaded areas, is, to a first-order approximation, a calculation of brightness gradient differences, which can be subject to foreshortening. The paper proposes a novel trinocular differential photometric consistency that constrains the brightness gradients in three views so that the image gradient in one view is completely determined by the image gradients at corresponding points in the other two views. The theoretical developments here advocate the integration of this new measure, whose viability in practice has been demonstrated in a set of illustrative numerical experiments.
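To make the gradient constraint concrete, here is a minimal sketch (ours, not the authors' implementation): under a hypothetical local affine warp between two views, brightness constancy implies that image gradients transform by the chain rule, so the gradient observed in one view can be predicted from the gradient at the corresponding point in another view and the residual used as a differential consistency score.

```python
def transform_gradient(grad, A):
    """Map an image gradient through a local affine warp x1 = A @ x2 + b.

    Brightness constancy I2(x2) = I1(A x2 + b) gives, by the chain rule,
    grad I2 = A^T @ grad I1.
    """
    gx, gy = grad
    (a11, a12), (a21, a22) = A
    # A^T @ grad
    return (a11 * gx + a21 * gy, a12 * gx + a22 * gy)

def gradient_residual(grad_obs, grad_src, A):
    """Euclidean residual between an observed gradient and the one
    predicted from a corresponding point in another view."""
    px, py = transform_gradient(grad_src, A)
    ox, oy = grad_obs
    return ((ox - px) ** 2 + (oy - py) ** 2) ** 0.5
```

A small residual indicates the two views agree differentially; a trinocular version would chain two such warps so the third view's gradient is fully determined by the other two.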
Citations: 0
RidgeSfM: Structure from Motion via Robust Pairwise Matching Under Depth Uncertainty
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00075
Benjamin Graham, David Novotný
We consider the problem of simultaneously estimating a dense depth map and camera pose for a large set of images of an indoor scene. While classical SfM pipelines rely on a two-step approach, where cameras are first estimated using a bundle adjustment in order to ground the ensuing multi-view stereo stage, both our poses and dense reconstructions are a direct output of an altered bundle adjuster. To this end, we parametrize each depth map with a linear combination of a limited number of basis “depth-planes” predicted in a monocular fashion by a deep net. Using a set of high-quality sparse keypoint matches, we optimize over the per-frame linear combinations of depth planes and camera poses to form a geometrically consistent cloud of keypoints. Although our bundle adjustment only considers sparse keypoints, the inferred linear coefficients of the basis planes immediately give us dense depth maps. RidgeSfM is able to collectively align hundreds of frames, which is its main advantage over recent memory-heavy deep alternatives that are typically capable of aligning no more than 10 frames. Quantitative comparisons reveal performance superior to a state-of-the-art large-scale SfM pipeline.
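The depth-plane parametrisation can be illustrated with a toy sketch (the two-plane setup and names are our assumptions, not the paper's code): fit the per-frame coefficients of two basis planes to sparse keypoint depths by ridge regression, then read off dense depth anywhere as the same linear combination.

```python
def fit_ridge_coeffs(basis_at_kps, depths, lam=1e-3):
    """Solve argmin_c ||B c - d||^2 + lam ||c||^2 for two basis planes.

    basis_at_kps: list of (B_0(x), B_1(x)) basis values at each keypoint.
    depths:       observed sparse depths d at those keypoints.
    Uses the closed-form 2x2 normal equations (B^T B + lam I) c = B^T d.
    """
    s00 = sum(b[0] * b[0] for b in basis_at_kps) + lam
    s01 = sum(b[0] * b[1] for b in basis_at_kps)
    s11 = sum(b[1] * b[1] for b in basis_at_kps) + lam
    r0 = sum(b[0] * d for b, d in zip(basis_at_kps, depths))
    r1 = sum(b[1] * d for b, d in zip(basis_at_kps, depths))
    det = s00 * s11 - s01 * s01
    return ((s11 * r0 - s01 * r1) / det, (s00 * r1 - s01 * r0) / det)

def dense_depth(c, basis_values):
    """Depth at any pixel = linear combination of the basis planes there."""
    return c[0] * basis_values[0] + c[1] * basis_values[1]
```

This is why sparse keypoint optimisation suffices: once the few coefficients are known, the dense map follows for free.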
Citations: 3
FC-vSLAM: Integrating Feature Credibility in Visual SLAM
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00106
Shuai Xie, Wei Ma, Qiuyuan Wang, Ruchang Xu, H. Zha
Feature-based visual SLAM (vSLAM) systems compute camera poses and scene maps by detecting and matching 2D features, mostly points and line segments, from image sequences. These systems often suffer from unreliable detections. In this paper, we define feature credibility (FC) for both points and line segments, incorporate it into the vSLAM formulation, and develop an FC-vSLAM system based on the widely used ORB-SLAM framework. Compared with existing credibility definitions, the proposed one is more comprehensive, as it considers both temporal observation stability and perspective triangulation reliability. We use the credibility in our SLAM system to suppress the influence of unreliable features on pose and map optimization. We also present a way to improve line end observations via their multi-view correspondences, improving the integrity of the 3D maps. Experiments on both the TUM and 7-Scenes datasets demonstrate that our feature credibility and the multi-view line optimization are effective; the developed FC-vSLAM system outperforms existing popular feature-based systems in both localization and mapping.
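A hedged sketch of what a credibility score of this kind could look like (the functional form and constants are our illustration, not the paper's definition): combine a temporal-stability term that grows with re-observations and a geometric term that grows with triangulation parallax, then weight reprojection residuals by the score.

```python
import math

def credibility(times_observed, triangulation_angle_deg,
                obs_scale=10.0, angle_scale=5.0):
    """Hypothetical credibility score in [0, 1): increases with how often a
    feature has been re-observed (temporal stability) and with its
    triangulation parallax angle (geometric reliability)."""
    temporal = 1.0 - math.exp(-times_observed / obs_scale)
    geometric = 1.0 - math.exp(-triangulation_angle_deg / angle_scale)
    return temporal * geometric

def weighted_cost(residuals, credibilities):
    """Credibility-weighted sum of squared reprojection residuals:
    unreliable features contribute little to pose/map optimization."""
    return sum(w * r * r for r, w in zip(residuals, credibilities))
```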
Citations: 0
Deep LiDAR localization using optical flow sensor-map correspondences
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00094
Anders Sunegård, L. Svensson, Torsten Sattler
In this paper we propose a method for accurate localization of a multi-layer LiDAR sensor in a pre-recorded map, given a coarse initialization pose. The foundation of the algorithm is the usage of neural network optical flow predictions. We train a network to encode representations of the sensor measurement and the map, and then regress flow vectors at each spatial position in the sensor feature map. The flow regression network is straightforward to train, and the resulting flow field can be used with standard techniques for computing sensor pose from sensor-to-map correspondences. Additionally, the network can regress flow at different spatial scales, which means that it is able to handle both position recovery and high-accuracy localization. We demonstrate average localization accuracy of $< 0.04\,\mathrm{m}$ in position and $< 0.1^{\circ}$ in heading angle for a vehicle driving application with simulated LiDAR measurements, which is similar to point-to-point iterative closest point (ICP). The algorithm typically manages to recover position with a prior error of more than 20 m and is significantly more robust to scenes with non-salient or repetitive structure than the baselines used for comparison.
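Once the flow field supplies sensor-to-map correspondences, one such standard technique for a planar vehicle pose is the closed-form 2D rigid fit used inside ICP-style pipelines (a generic sketch, not the paper's code):

```python
import math

def fit_rigid_2d(src, dst):
    """Closed-form 2D rigid transform (theta, tx, ty) minimising
    sum ||R(theta) p + t - q||^2 over correspondences (p, q)."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n; cy_s = sum(p[1] for p in src) / n
    cx_d = sum(q[0] for q in dst) / n; cy_d = sum(q[1] for q in dst) / n
    # accumulate the 2x2 cross-covariance of the centred point sets
    sxx = sxy = syx = syy = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - cx_s, py - cy_s
        bx, by = qx - cx_d, qy - cy_d
        sxx += ax * bx; sxy += ax * by
        syx += ay * bx; syy += ay * by
    # optimal rotation from the cross/dot products of centred correspondences
    theta = math.atan2(sxy - syx, sxx + syy)
    c, s = math.cos(theta), math.sin(theta)
    # translation maps the rotated source centroid onto the target centroid
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, tx, ty
```

With noise-free correspondences the fit is exact; in practice robustified variants (e.g. with per-correspondence weights) are used.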
Citations: 0
Learning Wasserstein Isometric Embedding for Point Clouds
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00057
Keisuke Kawano, Satoshi Koide, Takuro Kutsuna
The Wasserstein distance has been employed for measuring the distance between point clouds, as it handles variable numbers of points and is invariant to point order. However, the high computational cost associated with the Wasserstein distance hinders its practical application to large-scale datasets. We propose a new embedding method for point clouds, which aims to embed point clouds into a Euclidean space that is isometric to the Wasserstein space defined on the point clouds. In numerical experiments, we demonstrate that the point clouds decoded from Euclidean averages and interpolations in the embedding space accurately mimic the Wasserstein barycenters and interpolations of the point clouds. Furthermore, we show that the embedding vectors can be utilized as inputs for machine learning models (e.g., principal component analysis and neural networks).
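For intuition about the metric the embedding is trained to preserve: in 1D with uniform weights, the 2-Wasserstein distance between equal-size point sets has a closed form obtained by matching sorted samples (a textbook special case, shown only to illustrate the target geometry; general point clouds require solving an optimal transport problem, which is the cost the paper's embedding avoids at query time).

```python
def wasserstein_1d(xs, ys):
    """Exact 2-Wasserstein distance between two equal-size 1D point clouds
    with uniform weights: the optimal transport plan matches sorted samples."""
    assert len(xs) == len(ys)
    xs_s, ys_s = sorted(xs), sorted(ys)
    return (sum((a - b) ** 2 for a, b in zip(xs_s, ys_s)) / len(xs)) ** 0.5
```

Note that in this 1D case the sorted coordinate vector is itself a Euclidean embedding that is isometric (up to scale) to the Wasserstein space — the higher-dimensional analogue of what the paper learns.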
Citations: 3
Localising In Complex Scenes Using Balanced Adversarial Adaptation
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00116
Gil Avraham, Yan Zuo, T. Drummond
Domain adaptation and generative modelling have collectively mitigated the expensive nature of data collection and labelling by leveraging the rich abundance of accurate, labelled data in simulation environments. In this work, we study the performance gap that exists between representations optimised for localisation on simulation environments and the application of such representations in a real-world setting. Our method exploits the shared geometric similarities between simulation and real-world environments whilst maintaining invariance towards visual discrepancies. This is achieved by optimising a representation extractor to project both simulated and real representations into a shared representation space. Our method uses a symmetrical adversarial approach which encourages the representation extractor to conceal the domain that features are extracted from and simultaneously preserves robust attributes between source and target domains that are beneficial for localisation. We evaluate our method by adapting representations optimised for indoor Habitat simulated environments (Matterport3D and Replica) to a real-world indoor environment (Active Vision Dataset), showing that it compares favourably against fully-supervised approaches.
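A minimal numeric sketch of a symmetric adversarial objective of this flavour (our illustration; the paper's actual networks and losses differ): a discriminator is trained to tell simulated from real features, while the representation extractor is rewarded when the discriminator is pushed to chance, i.e. when the features conceal their domain.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one predicted probability p and target y."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))

def discriminator_loss(p_sim, p_real):
    """The discriminator labels simulated features 1 and real features 0."""
    return bce(p_sim, 1.0) + bce(p_real, 0.0)

def confusion_loss(p_sim, p_real):
    """The extractor is rewarded when the discriminator sits at chance
    (p = 0.5) on both domains -- a symmetric adversarial objective."""
    return bce(p_sim, 0.5) + bce(p_real, 0.5)
```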
Citations: 0
Deep Learning Based Single-Photon 3D Imaging with Multiple Returns
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00130
Hao Tan, Jiayong Peng, Zhiwei Xiong, Dong Liu, Xin Huang, Zheng-Ping Li, Yu Hong, Feihu Xu
Single-photon avalanche diodes (SPADs) have been widely used in active 3D imaging due to their extremely high photon sensitivity and picosecond time resolution. However, long-range active 3D imaging is still a great challenge, since only a few signal photons mixed with strong background noise can return from the multiple reflectors of the scene, due to the divergence of the light beam and the receiver’s field of view (FoV); this brings considerable distortion and blur to the recovered depth map. In this paper, we propose a deep learning based depth reconstruction method for long-range single-photon 3D imaging where this “multiple-returns” issue exists. Specifically, we model the problem as a deblurring task and design a multi-scale convolutional neural network combined with elaborate loss functions, which promote the reconstruction of an accurate depth map with fine details and clear object boundaries. The proposed method achieves superior performance over several different sizes of receiver FoV on a synthetic dataset compared with existing state-of-the-art methods, and the model trained under a specific FoV generalizes well across different FoV sizes, which is essential for practical applications. Moreover, we conduct outdoor experiments and demonstrate the effectiveness of our method in a real-world long-range imaging system.
Citations: 3
Benchmarking Image Retrieval for Visual Localization
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00058
No'e Pion, M. Humenberger, G. Csurka, Yohann Cabon, Torsten Sattler
Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two tasks: (1) provide an approximate pose estimate or (2) determine which parts of the scene are potentially visible in a given query image. It is common practice to use state-of-the-art image retrieval algorithms for these tasks. These algorithms are often trained for the goal of retrieving the same landmark under a large range of viewpoint changes. However, robustness to viewpoint changes is not necessarily desirable in the context of visual localization. This paper focuses on understanding the role of image retrieval for multiple visual localization tasks. We introduce a benchmark setup and compare state-of-the-art retrieval representations on multiple datasets. We show that retrieval performance on classical landmark retrieval/recognition tasks correlates with localization performance only for some, but not all, tasks. This indicates a need for retrieval approaches specifically designed for localization tasks. Our benchmark and evaluation protocols are available at https://github.com/naver/kapture-localization.
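Task (1), using retrieval to obtain an approximate pose, can be sketched as follows (a generic illustration, not the benchmark's code): rank database images by global-descriptor similarity and average the poses of the top-k results as a coarse pose prior.

```python
def dot(a, b):
    """Similarity between two (assumed L2-normalised) global descriptors."""
    return sum(x * y for x, y in zip(a, b))

def retrieve_pose_prior(query_desc, db, k=3):
    """Rank (descriptor, pose) database entries by descriptor similarity to
    the query and return the mean of the top-k poses as a coarse estimate."""
    ranked = sorted(db, key=lambda e: -dot(query_desc, e[0]))[:k]
    dim = len(ranked[0][1])
    return tuple(sum(pose[i] for _, pose in ranked) / len(ranked)
                 for i in range(dim))
```

A descriptor trained for maximal viewpoint invariance would rank very different viewpoints equally high here, which is exactly why the paper argues landmark-retrieval performance does not always transfer to localization.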
Citations: 44
Fast Simultaneous Gravitational Alignment of Multiple Point Sets
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00019
Vladislav Golyanik, Soshi Shimada, C. Theobalt
The problem of simultaneous rigid alignment of multiple unordered point sets which is unbiased towards any of the inputs has recently attracted increasing interest, and several reliable methods have been newly proposed. While being remarkably robust towards noise and clustered outliers, current approaches require sophisticated initialisation schemes and do not scale well to large point sets. This paper proposes a new resilient technique for simultaneous registration of multiple point sets by interpreting the latter as particle swarms rigidly moving in the mutually induced force fields. Thanks to the improved simulation with altered physical laws and acceleration of globally multiply-linked point interactions with a 2D-tree (D is the space dimensionality), our Multi-Body Gravitational Approach (MBGA) is robust to noise and missing data while supporting more massive point sets than previous methods (with 105 points and more). In various experimental settings, MBGA is shown to outperform several baseline point set alignment approaches in terms of accuracy and runtime. We make our source code available for the community to facilitate the reproducibility of the results1.1http://gvv.mpi-inf.mpg.de/projects/MBGA/
对任意输入无偏的多个无序点集的同时刚性对准问题近年来引起了人们越来越多的关注,并提出了一些可靠的方法。虽然对噪声和聚类异常值具有显著的鲁棒性,但目前的方法需要复杂的初始化方案,并且不能很好地扩展到大型点集。将多点集解释为在相互感应力场中刚性运动的粒子群,提出了一种新的多点集同时配准的弹性技术。由于改进了与改变物理定律的模拟和与2d树(D是空间维度)的全局多重链接点相互作用的加速,我们的多体引力方法(MBGA)对噪声和缺失数据具有鲁棒性,同时支持比以前的方法更大的点集(有105个点或更多)。在各种实验设置中,MBGA在准确性和运行时间方面优于几种基线点集对齐方法。我们将源代码提供给社区,以方便结果的再现。http://gvv.mppi -inf.mpg.de/projects/MBGA/
Cited by: 1
A Transformer-Based Network for Dynamic Hand Gesture Recognition
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00072
Andrea D'Eusanio, A. Simoni, S. Pini, G. Borghi, R. Vezzani, R. Cucchiara
Transformer-based neural networks represent a successful self-attention mechanism that achieves state-of-the-art results in language understanding and sequence modeling. However, their application to visual data and, in particular, to the dynamic hand gesture recognition task has not yet been deeply investigated. In this paper, we propose a transformer-based architecture for the dynamic hand gesture recognition task. We show that the employment of a single active depth sensor, specifically the usage of depth maps and the surface normals estimated from them, achieves state-of-the-art results, overcoming all the methods available in the literature on two automotive datasets, namely NVidia Dynamic Hand Gesture and Briareo. Moreover, we test the method with other data types available with common RGB-D devices, such as infrared and color data. We also assess the performance in terms of inference time and number of parameters, showing that the proposed framework is suitable for an online in-car infotainment system.
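The depth-plus-surface-normals input described in the abstract can be reproduced in a few lines. The sketch below assumes an orthographic camera model and plain finite differences (the paper's exact estimation procedure is not specified here): the surface z = depth(x, y) has normal proportional to (-∂z/∂x, -∂z/∂y, 1).

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate per-pixel unit surface normals from a depth map
    (orthographic model, central finite differences)."""
    depth = depth.astype(np.float64)
    dzdy, dzdx = np.gradient(depth)  # gradients along rows (y), then cols (x)
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# A planar ramp z = 2x has constant normal proportional to (-2, 0, 1)
xs = np.arange(8, dtype=np.float64)
ramp = np.tile(2.0 * xs, (8, 1))
n = normals_from_depth(ramp)
```

The resulting (H, W, 3) normal map can be stacked with the depth map as extra input channels, which is how depth-derived normals are typically fed to a network.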
Cited by: 20