
Latest Publications: 2020 International Conference on 3D Vision (3DV)

RotPredictor: Unsupervised Canonical Viewpoint Learning for Point Cloud Classification
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00109
Jin Fang, Dingfu Zhou, Xibin Song, Sheng Jin, Ruigang Yang, Liangjun Zhang
Recently, significant progress has been achieved in analyzing 3D point clouds with deep learning techniques. However, existing networks suffer from poor generalization and robustness to arbitrary rotations applied to the input point cloud. Different from traditional strategies that improve rotation robustness with data augmentation, specifically designed spherical representations, or harmonics-based kernels, we propose to rotate the point cloud into a canonical viewpoint to boost the downstream target task, e.g., object classification and part segmentation. Specifically, the canonical viewpoint is predicted by the network RotPredictor in an unsupervised way, and the loss function is built only on the target task. Our RotPredictor approximately satisfies the rotation equivariance property in SO(3), and the prediction output has a linear relationship with the applied rotation transformation. In addition, the RotPredictor is an independent plug-and-play module, which can be employed by any point-based deep learning framework without extra burden. Experimental results on the public model classification dataset ModelNet40 show that the performance of all baselines can be boosted by integrating the proposed module. In addition, by adding our proposed module, we achieve state-of-the-art classification accuracy of 90.2% on the rotation-augmented ModelNet40 benchmark.
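A minimal sketch of how such a plug-and-play canonical-pose module could sit in front of an arbitrary point-cloud classifier; the layer sizes, the quaternion parameterization, and all names below are illustrative assumptions, not the authors' architecture.

```python
# Sketch only: a small permutation-invariant network predicts a quaternion, the
# input cloud is rotated by it, and only the downstream classification loss
# supervises the rotation (no pose labels).
import torch
import torch.nn as nn
import torch.nn.functional as F

def quat_to_rotmat(q):
    """Convert quaternions (B, 4) to rotation matrices (B, 3, 3)."""
    q = F.normalize(q, dim=-1)
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y),
        2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x),
        2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y),
    ], dim=-1).reshape(-1, 3, 3)

class CanonicalPoseModule(nn.Module):
    """Predicts a rotation that maps the input cloud to a canonical viewpoint."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Linear(128, 4)            # quaternion output

    def forward(self, pts):                      # pts: (B, N, 3)
        feat = self.mlp(pts).max(dim=1).values   # permutation-invariant pooling
        R = quat_to_rotmat(self.head(feat))
        return pts @ R.transpose(1, 2), R        # canonicalized cloud + rotation

# Usage: prepend to any point-based classifier; only the task loss is applied.
# canonical_pts, R = CanonicalPoseModule()(point_cloud)
# loss = F.cross_entropy(classifier(canonical_pts), labels)
```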
Citations: 14
Time Shifted IMU Preintegration for Temporal Calibration in Incremental Visual-Inertial Initialization
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00027
Bruno Petit, Richard Guillemard, V. Gay-Bellile
Tightly coupled Visual-Inertial SLAM (VISLAM) algorithms are now state-of-the-art approaches for indoor localization. There are many implementations of VISLAM, such as filter-based and non-linear optimization based algorithms. They all require an accurate temporal alignment between the sensors' clocks and an initial IMU state (gyroscope and accelerometer bias values, gravity direction, and initial velocity) for precise localization. In this paper we propose an initialization procedure for VISLAM that simultaneously estimates the IMU-camera temporal calibration and the initial IMU state. To this end, the concept of Time Shifted IMU Preintegration (TSIP) measurements is introduced: an interpolation of IMU preintegration that takes into account the effect of sensor clock misalignment. These TSIP measurements are included along with visual odometry measurements in a graph that is incrementally optimized. This results in a real-time, accurate, and robust initialization for VISLAM, as demonstrated in experiments on real data.
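A minimal sketch of the time-shift idea, under the assumption of simple Euler integration and linear interpolation of IMU samples; the function names and the offset handling below are illustrative, not the paper's exact TSIP formulation.

```python
# Sketch only: IMU samples are interpolated at timestamps shifted by the unknown
# camera-IMU clock offset td, so the preintegrated terms vary smoothly with td
# and an optimizer can refine it alongside the initial IMU state.
import numpy as np

def expm_so3(phi):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    k = phi / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def time_shifted_preintegration(t, gyro, accel, t0, t1, td, dt=0.005):
    """Integrate IMU between camera times [t0, t1], shifted by clock offset td."""
    ts = np.arange(t0 + td, t1 + td, dt)
    w = np.stack([np.interp(ts, t, gyro[:, k]) for k in range(3)], axis=1)
    a = np.stack([np.interp(ts, t, accel[:, k]) for k in range(3)], axis=1)

    R = np.eye(3)            # relative rotation
    dv = np.zeros(3)         # preintegrated velocity
    dp = np.zeros(3)         # preintegrated position
    for wi, ai in zip(w, a):
        dp += dv * dt + 0.5 * (R @ ai) * dt**2
        dv += (R @ ai) * dt
        R = R @ expm_so3(wi * dt)
    return R, dv, dp
```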
Citations: 0
KeystoneDepth: History in 3D
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00056
Xuan Luo, Yanmeng Kong, Jason Lawrence, Ricardo Martin-Brualla, S. Seitz
This paper introduces KeystoneDepth, the largest and most diverse collection of rectified historical stereo image pairs to date, consisting of tens of thousands of stereographs of people, events, objects, and scenes recorded between 1864 and 1966. Leveraging the Keystone-Mast Collection of stereographs from the California Museum of Photography, we apply multiple processing steps to produce clean stereo image pairs, complete with calibration data, rectification transforms, and disparity maps. We introduce a novel stereo rectification technique based on the unique properties of antique stereo cameras. To better visualize the results on 2D displays, we also introduce a self-supervised deep view synthesis technique trained on historical imagery. Our dataset is available at http://keystonedepth.cs.washington.edu/.
Citations: 5
Screen-space Regularization on Differentiable Rasterization
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00032
Kunyao Chen, Cheolhong An, Truong Q. Nguyen
Rasterization bridges the 3D meshes of a scene and their 2D visual appearance from different viewpoints. It plays a vital role in the vision and graphics areas. Much research focuses on designing a differentiable rasterization and making it compatible with current learning-based frameworks. Although some global-gradient methods achieve promising results, they still ignore a substantial issue present in most situations: the series of 2D silhouettes may not precisely represent the underlying 3D object. To directly tackle this problem, we propose a screen-space regularization method. Unlike common geometric regularization, our method targets the unbalanced deformation caused by limited viewpoints. By applying the regularization to both multi-view deformation and single-view reconstruction tasks, the proposed method can significantly enhance the visual appearance of the results of a local-gradient differentiable rasterizer, i.e., reduce the visual hull redundancy. Compared to the state-of-the-art global-gradient method, the proposed method achieves better numerical results with much lower complexity.
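A minimal sketch of one plausible screen-space penalty against visual hull redundancy: rendered silhouette mass falling outside the reference silhouette is penalized per view. This is an assumption for illustration, not the paper's exact regularizer.

```python
# Sketch only: penalize pixels the differentiable rasterizer covers where the
# reference silhouette says the object is absent, across all views.
import torch

def screen_space_redundancy_loss(rendered_sil, target_sil):
    """rendered_sil, target_sil: (V, H, W) soft silhouettes in [0, 1] for V views."""
    excess = torch.relu(rendered_sil - target_sil)  # coverage outside the target
    return excess.mean()
```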
Citations: 0
PanoNet3D: Combining Semantic and Geometric Understanding for LiDAR Point Cloud Detection
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00085
Xia Chen, Jianren Wang, David Held, M. Hebert
Visual data in autonomous driving perception, such as camera images and LiDAR point clouds, can be interpreted as a mixture of two aspects: semantic features and geometric structure. Semantics come from the appearance and context of objects as seen by the sensor, while geometric structure is the actual 3D shape of the point cloud. Most detectors for LiDAR point clouds focus only on analyzing the geometric structure of objects in real 3D space. Unlike previous works, we propose to learn both semantic features and geometric structure via a unified multi-view framework. Our method exploits the nature of LiDAR scans as 2D range images and applies well-studied 2D convolutions to extract semantic features. By fusing semantic and geometric features, our method outperforms state-of-the-art approaches in all categories by a large margin. The methodology of combining semantic and geometric features provides a unique perspective on the problems in real-world 3D point cloud detection.
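A minimal sketch of the spherical projection that turns a LiDAR sweep into a 2D range image on which ordinary 2D convolutions can extract semantic features; the image resolution and vertical field of view below are illustrative assumptions, not the paper's settings.

```python
# Sketch only: standard spherical (range-image) projection of a LiDAR sweep.
import numpy as np

def point_cloud_to_range_image(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """points: (N, 3) xyz in the sensor frame -> (H, W) range image in meters."""
    fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / np.maximum(r, 1e-6))

    u = 0.5 * (1.0 - yaw / np.pi) * W                         # azimuth -> column
    v = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H  # elevation -> row
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int64)

    image = np.zeros((H, W), dtype=np.float32)
    order = np.argsort(-r)                 # write nearest points last so they win
    image[v[order], u[order]] = r[order]
    return image
```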
Citations: 8
Restoration of Motion Blur in Time-of-Flight Depth Image Using Data Alignment
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00092
Zhuo Chen, Peilin Liu, Fei Wen, Jun Wang, R. Ying
Time-of-flight (ToF) sensors are vulnerable to motion blur in the presence of moving objects. This follows from the operating principle of a ToF camera: it estimates depth from the phase shift between emitted and received modulated signals, and the phase shift is measured from four sequential phase-shifted images that are assumed to be consistent within one integration time. However, object motion gives rise to disparity among the four phase-shifted images, leading to unreliable depth measurements. In this paper, we propose a novel method that aligns the four phase-shifted images by investigating the electronic value of each pixel in the phase images. It consists of two steps, motion detection and deblurring. Furthermore, a refinement utilizing an additional group of phase-shifted images is adopted to further improve the accuracy of depth measurement. Experimental results on a newly elaborated dataset with ground truth demonstrate that the proposed method compares favorably with existing methods in both accuracy and runtime. In particular, the new method achieves the best accuracy while being computationally efficient enough to support real-time operation.
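For context, a minimal sketch of the standard four-phase depth computation the abstract refers to; the sign convention and modulation frequency are assumptions, and the paper's alignment and deblurring steps are not reproduced here.

```python
# Sketch only: per-pixel depth from four sequential phase-shifted ToF
# measurements Q0, Q90, Q180, Q270 (taken at 0/90/180/270 degree offsets).
import numpy as np

def four_phase_depth(Q0, Q90, Q180, Q270, f_mod=20e6, c=299792458.0):
    """Per-pixel depth (meters) from four phase-shifted ToF measurements."""
    phase = np.arctan2(Q270 - Q90, Q0 - Q180)   # recovered modulation phase
    phase = np.mod(phase, 2.0 * np.pi)          # wrap to [0, 2*pi)
    return c * phase / (4.0 * np.pi * f_mod)    # half the round-trip distance
```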
Citations: 2
Deep Depth Estimation on 360° Images with a Double Quaternion Loss
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00062
Brandon Yushan Feng, Wangjue Yao, Zhe-Yu Liu, A. Varshney
While 360° images are becoming ubiquitous due to the popularity of panoramic content, they cannot directly work with most of the existing depth estimation techniques developed for perspective images. In this paper, we present a deep-learning-based framework for estimating depth from 360° images. We present an adaptive depth refinement procedure that refines depth estimates using normal estimates and pixel-wise uncertainty scores. We introduce a double quaternion approximation to combine the losses of the joint estimation of depth and surface normals. Furthermore, we use the double quaternion formulation to also measure stereo consistency between horizontally displaced depth maps, leading to a new loss function for training a depth estimation CNN. Results show that the new double-quaternion-based loss and the adaptive depth refinement procedure lead to better network performance. Our proposed method can be used with monocular as well as stereo images. When evaluated on several datasets, our method surpasses state-of-the-art methods on most metrics.
Citations: 11
3D-Aware Ellipse Prediction for Object-Based Camera Pose Estimation
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00038
Matthieu Zins, Gilles Simon, M. Berger
In this paper, we propose a method for coarse camera pose computation which is robust to viewing conditions and does not require a detailed model of the scene. This method meets the growing need for easy deployment of robotics or augmented reality applications in any environment, especially those for which no accurate 3D model nor large amount of ground-truth data is available. It exploits the ability of deep learning techniques to reliably detect objects regardless of viewing conditions. Previous works have also shown that abstracting the geometry of a scene of objects by an ellipsoid cloud makes it possible to compute the camera pose accurately enough for various application needs. Though promising, these approaches use the ellipses fitted to the detection bounding boxes as an approximation of the imaged objects. In this paper, we go one step further and propose a learning-based method which detects improved elliptic approximations of objects that are coherent with the 3D ellipsoids under perspective projection. Experiments prove that the accuracy of the computed pose significantly increases thanks to our method and is more robust to the variability of the boundaries of the detection boxes. This is achieved with very little effort in terms of training data acquisition: a few hundred calibrated images, of which only three need manual object annotation.
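For context, a minimal sketch of the textbook projection that underlies ellipse/ellipsoid pose reasoning: an ellipsoid stored as a dual quadric projects to a dual conic, whose primal form is the image ellipse. This is standard multi-view geometry, not the paper's learned ellipse detector.

```python
# Sketch only: project a dual quadric (ellipsoid) through a pinhole camera to
# obtain the conic of its image ellipse, C* = P Q* P^T.
import numpy as np

def project_ellipsoid(Q_dual, K, R, t):
    """Q_dual: (4,4) dual quadric; K: (3,3) intrinsics; R, t: world-to-camera pose."""
    P = K @ np.hstack([R, t.reshape(3, 1)])   # (3,4) projection matrix
    C_dual = P @ Q_dual @ P.T                 # (3,3) dual conic of the image ellipse
    C = np.linalg.inv(C_dual)                 # primal conic: x^T C x = 0
    return C / np.linalg.norm(C)              # conics are homogeneous; fix the scale
```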
Citations: 7
Learning 3D Faces from Photo-Realistic Facial Synthesis
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00096
Ruizhe Wang, Chih-Fan Chen, Hao Peng, Xudong Liu, Xin Li
We present an approach to efficiently learn an accurate and complete 3D face model from a single image. Previous methods rely heavily on 3D Morphable Models to populate the facial shape space, as well as an over-simplified shading model for image formation. By contrast, our method directly augments a large set of 3D faces from a compact collection of facial scans and employs a high-quality rendering engine to synthesize the corresponding photo-realistic facial images. We first use a deep neural network to regress vertex coordinates from the given image and then refine them by a non-rigid deformation process to more accurately capture local shape similarity. We have conducted extensive experiments to demonstrate the superiority of the proposed approach on 2D-to-3D facial shape inference, especially its excellent generalization to real-world selfie images.
Citations: 0
Improving Structure from Motion with Reliable Resectioning
Pub Date : 2020-11-01 DOI: 10.1109/3DV50981.2020.00014
Rajbir Kataria, Joseph DeGol, Derek Hoiem
A common cause of failure in structure-from-motion (SfM) is misregistration of images due to visual patterns that occur in more than one scene location. Most work to solve this problem ignores image matches that are inconsistent according to the statistics of the tracks graph, but these methods often need to be tuned for each dataset and can lead to reduced completeness of normally good reconstructions when valid matches are removed. Our key idea is to address ambiguity directly in the reconstruction process by using only a subset of reliable matches to determine resectioning order and the initial pose. We also introduce a new measure of similarity that adjusts the influence of feature matches based on their track length. We show this improves reconstruction robustness for two state-of-the-art SfM algorithms on many diverse datasets.
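A minimal sketch of how a resectioning order could be scored with track-length-adjusted match influence; the logarithmic weighting and all names here are assumptions for illustration, not the paper's actual similarity measure.

```python
# Sketch only: score each unregistered image by its matches to reconstructed
# points, adjusting each match's influence by the length of its track.
from math import log

def image_score(matched_tracks, track_lengths, weight=lambda n: log(1 + n)):
    """matched_tracks: track ids matched in the image;
    track_lengths: dict mapping track id -> number of images observing it."""
    return sum(weight(track_lengths[t]) for t in matched_tracks if t in track_lengths)

def next_image_to_resection(candidate_matches, track_lengths):
    """Pick the unregistered image with the highest adjusted match score."""
    return max(candidate_matches,
               key=lambda img: image_score(candidate_matches[img], track_lengths))
```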
Citations: 5