
Latest publications from the 2016 Fourth International Conference on 3D Vision (3DV)

Single-Shot Time-of-Flight Phase Unwrapping Using Two Modulation Frequencies
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.74
Changpeng Ti, Ruigang Yang, James Davis
We present a novel phase unwrapping framework for the Time-of-Flight sensor that can match the performance of systems using two modulation frequencies, within a single shot. Our framework is based on an interleaved pixel arrangement, where a pixel measures phase at a different modulation frequency from its neighboring pixels. We demonstrate that: (1) it is practical to capture ToF images that contain phases from two frequencies in a single shot, with no loss in signal fidelity, (2) phase unwrapping can be effectively performed on such an interleaved phase image, and (3) our method preserves the original spatial resolution. We find that the output of our framework is comparable to results using two shots under separate modulation frequencies, and is significantly better than using a single modulation frequency.
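The abstract does not include an algorithm listing, but the dual-frequency principle it builds on is easy to illustrate: each modulation frequency yields a wrapped phase, and the correct wrap counts are those for which the two implied depths agree. The brute-force sketch below is an assumption for illustration (including the chosen frequencies and range limit), not the authors' interleaved-pixel implementation.

```python
import numpy as np

C = 3e8                                                 # speed of light, m/s

def wrapped_phase(depth, freq):
    """Phase a ToF pixel reports for a given depth at one modulation frequency."""
    return (4 * np.pi * freq * depth / C) % (2 * np.pi)

def unwrap_two_freq(phi1, phi2, f1, f2, max_range=15.0):
    """Brute-force dual-frequency unwrapping: choose the pair of wrap counts whose
    implied depths agree best and return the averaged depth."""
    best_depth, best_err = 0.0, np.inf
    n1_max = int(np.ceil(max_range * 2 * f1 / C))
    n2_max = int(np.ceil(max_range * 2 * f2 / C))
    for n1 in range(n1_max + 1):
        d1 = C * (phi1 + 2 * np.pi * n1) / (4 * np.pi * f1)
        for n2 in range(n2_max + 1):
            d2 = C * (phi2 + 2 * np.pi * n2) / (4 * np.pi * f2)
            if abs(d1 - d2) < best_err:
                best_depth, best_err = 0.5 * (d1 + d2), abs(d1 - d2)
    return best_depth

# A 9.2 m target lies beyond the unambiguous range of both 20 MHz (7.5 m) and
# 30 MHz (5 m), yet the pair of wrapped phases still resolves it.
d = 9.2
print(unwrap_two_freq(wrapped_phase(d, 20e6), wrapped_phase(d, 30e6), 20e6, 30e6))
```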
Citations: 2
Robust Plane-Based Calibration of Multiple Non-Overlapping Cameras
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.73
Chen Zhu, Zihan Zhou, Ziran Xing, Yanbing Dong, Yi Ma, Jingyi Yu
The availability of commodity multi-camera systems such as Google Jump, Jaunt, and Lytro Immerge has brought new demand for reliable and efficient extrinsic camera calibration. State-of-the-art solutions generally require that adjacent, if not all, cameras observe a common area or employ known scene structures. In this paper, we present a novel multi-camera calibration technique that eliminates such requirements. Our approach extends the single-pair hand-eye calibration used in robotics to multi-camera systems. Specifically, we make use of (possibly unknown) planar structures in the scene and combine plane-based structure from motion, camera pose estimation, and task-specific bundle adjustment for extrinsic calibration. Experiments on several multi-camera setups demonstrate that our scheme is highly accurate, robust, and efficient.
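The multi-camera method itself is not detailed in the abstract; its robotics ancestor, single-pair hand-eye calibration, solves AX = XB for an unknown rigid transform X from pairs of relative motions (A, B). The sketch below is a minimal rotation-only solver using a standard quaternion linearization, given only as background; it is not the authors' plane-based, multi-camera pipeline.

```python
import numpy as np
from scipy.spatial.transform import Rotation as Rot

def _lmat(q):   # left quaternion-multiplication matrix, q = [w, x, y, z]
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]])

def _rmat(q):   # right quaternion-multiplication matrix
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]])

def _quat_wxyz(R):
    q = Rot.from_matrix(R).as_quat()[[3, 0, 1, 2]]   # scipy stores [x,y,z,w]; reorder
    return -q if q[0] < 0 else q                     # fix the double-cover sign

def hand_eye_rotation(RA_list, RB_list):
    """Solve R_A X = X R_B for the unknown rotation X from relative-motion pairs,
    using the quaternion linearization (L(qa) - R(qb)) qx = 0 stacked over pairs."""
    M = np.vstack([_lmat(_quat_wxyz(RA)) - _rmat(_quat_wxyz(RB))
                   for RA, RB in zip(RA_list, RB_list)])
    qx = np.linalg.svd(M)[2][-1]                     # right singular vector of the smallest value
    return Rot.from_quat(qx[[1, 2, 3, 0]]).as_matrix()

# Synthetic check: pick a ground-truth X, build consistent motion pairs, recover X.
X = Rot.random().as_matrix()
RA_list = [Rot.random().as_matrix() for _ in range(5)]
RB_list = [X.T @ RA @ X for RA in RA_list]           # guarantees RA @ X == X @ RB
print(np.allclose(hand_eye_rotation(RA_list, RB_list), X))   # True
```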
Citations: 13
A Depth Restoration Occlusionless Temporal Dataset
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.26
Daniel Rotman, Guy Gilboa
Depth restoration, the task of correcting depth noise and artifacts, has recently risen in popularity due to the increase in commodity depth cameras. When assessing the quality of existing methods, most researchers resort to the popular Middlebury dataset; however, this dataset was not created for depth enhancement, and therefore lacks the option of comparing genuine low-quality depth images with their high-quality, ground-truth counterparts. To address this shortcoming, we present the Depth Restoration Occlusionless Temporal (DROT) dataset. This dataset offers real depth sensor input coupled with registered pixel-to-pixel color images, and the ground-truth depth against which we wish to compare. Our dataset includes not only Kinect 1 and Kinect 2 data, but also an Intel R200 sensor intended for integration into hand-held devices. Beyond this, we present a new temporal depth-restoration method. Utilizing multiple frames, we create a number of possibilities for an initial degraded depth map, which allows us to arrive at a more educated decision when refining depth images. Evaluating this method with our dataset shows significant benefits, particularly for overcoming real sensor-noise artifacts.
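The temporal restoration method is only summarized above; purely as an illustration of how multiple registered frames can support a more educated per-pixel decision, the following sketch fuses several depth maps with a robust median while ignoring dropouts. The function name and fusion rule are assumptions, not the DROT authors' algorithm.

```python
import numpy as np

def fuse_depth_frames(frames, invalid=0):
    """Per-pixel fusion of several registered depth frames: invalid readings
    (zeros) are ignored and the median of the remaining candidates is kept."""
    stack = np.stack(frames).astype(np.float32)        # (T, H, W)
    stack[stack == invalid] = np.nan
    fused = np.nanmedian(stack, axis=0)                # robust to flicker and outliers
    return np.nan_to_num(fused, nan=invalid)           # never-observed pixels stay invalid

# Toy example: three noisy 2x2 depth maps (in mm) with dropouts (0 = no reading).
f1 = np.array([[1000, 0], [1500, 2000]])
f2 = np.array([[1010, 1200], [0, 2005]])
f3 = np.array([[990, 1210], [1490, 0]])
print(fuse_depth_frames([f1, f2, f3]))
```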
Citations: 7
Quaternionic Upsampling: Hyperspherical Techniques for 6 DoF Pose Tracking
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.71
Benjamin Busam, M. Esposito, B. Frisch, Nassir Navab
Fast real-time tracking is an integral component of modern 3D computer vision pipelines. Despite their advantages in accuracy and reliability, optical trackers suffer from limited acquisition rates depending either on intrinsic sensor capabilities or physical limitations such as exposure time. Moreover, data transmission and image processing produce latency in the pose stream. We introduce quaternionic upsampling to overcome these problems. The technique models the pose parameters as points on multidimensional hyperspheres in (dual) quaternion space. In order to upsample the pose stream, we present several methods to sample points on geodesics and piecewise continuous curves on these manifolds and compare them regarding accuracy and computation efficiency. With the unified approach of quaternionic upsampling, both interpolation and extrapolation in pose space can be done by continuous linear variation of only one sampling parameter. Since the method can be implemented rather efficiently, pose rates of over 4 kHz and future pose predictions with an accuracy of 128 μm and 0.5° are possible in real-time. The method does not depend on a special tracking algorithm and can thus be used for any arbitrary 3 DoF or 6 DoF rotation or pose tracking system.
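The paper's upsampling operates on (dual) quaternions so that pose interpolation follows geodesics on a hypersphere. As a rough, rotation-only illustration (translation and the dual part are omitted), the sketch below uses SciPy's spherical linear interpolation to raise a 100 Hz orientation stream to 1 kHz; the rates and angles are made up for the example.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

# Two orientation samples 10 ms apart (a 100 Hz tracker), upsampled to a 1 kHz grid.
key_times = np.array([0.0, 0.01])
key_rots = Rotation.from_euler("xyz", [[0, 0, 0], [10, 5, -3]], degrees=True)

slerp = Slerp(key_times, key_rots)          # geodesic (great-circle) interpolation
query_times = np.linspace(0.0, 0.01, 11)    # 1 kHz samples including both endpoints
upsampled = slerp(query_times)
print(upsampled.as_euler("xyz", degrees=True).round(2))
```

For full 6 DoF poses one would interpolate a dual-quaternion (or rotation-plus-translation) representation instead, which is what the unified single-parameter scheme described above targets.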
Citations: 9
Synthetic Prior Design for Real-Time Face Tracking
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.72
Steven G. McDonagh, M. Klaudiny, D. Bradley, T. Beeler, Iain Matthews, Kenny Mitchell
Real-time facial performance capture has recently been gaining popularity in virtual film production, driven by advances in machine learning, which allows for fast inference of facial geometry from video streams. These learning-based approaches are significantly influenced by the quality and amount of labelled training data. Tedious construction of training sets from real imagery can be replaced by rendering a facial animation rig under on-set conditions expected at runtime. We learn a synthetic actor-specific prior by adapting a state-of-the-art facial tracking method. Synthetic training significantly reduces the capture and annotation burden and in theory allows generation of an arbitrary amount of data. But practical realities such as training time and compute resources still limit the size of any training set. We construct better and smaller training sets by investigating which facial image appearances are crucial for tracking accuracy, covering the dimensions of expression, viewpoint and illumination. A reduction of training data by 1-2 orders of magnitude is demonstrated whilst tracking accuracy is retained for challenging on-set footage.
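The abstract describes sweeping a rendering rig over expression, viewpoint and illumination to build synthetic training sets of controllable size. The toy sketch below, with entirely hypothetical parameter names and values, only illustrates enumerating such a grid and shrinking it along one dimension; the paper's actual analysis of which appearances matter is more involved.

```python
import itertools
import random

# Hypothetical rendering parameters: each sample is one (expression, viewpoint,
# illumination) combination rendered from the facial animation rig.
expressions = ["neutral", "smile", "jaw_open", "brow_raise"]
yaw_angles = [-40, -20, 0, 20, 40]                      # viewpoint, degrees
illuminations = ["key_left", "key_right", "ambient"]

full_grid = list(itertools.product(expressions, yaw_angles, illuminations))
print(len(full_grid))                                   # 60 renders in the exhaustive grid

# One crude way to shrink the set: keep every expression/viewpoint pair but
# subsample the illumination dimension, preserving pose and expression coverage.
random.seed(0)
reduced = [(e, y, random.choice(illuminations))
           for e, y in itertools.product(expressions, yaw_angles)]
print(len(reduced))                                     # 20 renders
```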
Citations: 16
Robust Tracking in Low Light and Sudden Illumination Changes
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.48
Hatem Alismail, Brett Browning, S. Lucey
We present an algorithm for robust and real-time visual tracking under challenging illumination conditions characterized by poor lighting as well as sudden and drastic changes in illumination. Robustness is achieved by adapting illumination-invariant binary descriptors to dense image alignment using the Lucas and Kanade algorithm. The proposed adaptation preserves the Hamming distance under least-squares minimization, thus preserving the photometric invariance properties of binary descriptors. Due to the compactness of the descriptor, the algorithm runs in excess of 400 fps on laptops and 100 fps on mobile devices.
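A key idea behind this line of work is that binary descriptors such as the census transform are invariant to monotonic illumination changes, and that on {0,1}-valued channels a least-squares residual coincides with the Hamming distance, which is what makes them usable inside Lucas-Kanade. The sketch below illustrates both properties on a toy image; it is an assumption-laden illustration, not the authors' full dense-alignment pipeline.

```python
import numpy as np

def census_bit_planes(img):
    """8-channel binary descriptor: each channel records whether the centre pixel
    is brighter than one of its 8 neighbours (a census-transform bit plane)."""
    h, w = img.shape
    padded = np.pad(np.asarray(img, dtype=np.float32), 1, mode="edge")
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    planes = [(img > padded[1 + dy: 1 + dy + h, 1 + dx: 1 + dx + w]).astype(np.float32)
              for dy, dx in offsets]
    return np.stack(planes, axis=-1)                       # (H, W, 8), values in {0, 1}

rng = np.random.default_rng(1)
frame = rng.integers(0, 100, size=(6, 6))

d_ref = census_bit_planes(frame)
d_bright = census_bit_planes(frame * 1.8 + 30)             # monotonic illumination change
d_other = census_bit_planes(rng.integers(0, 100, size=(6, 6)))

print(np.array_equal(d_ref, d_bright))                     # True: descriptor is unchanged
# On {0,1} channels a squared difference is a bit mismatch, so the least-squares
# (Lucas-Kanade style) objective effectively measures a Hamming distance.
print(np.sum((d_ref - d_other) ** 2) == np.sum(d_ref != d_other))   # True
```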
Citations: 29
Model-Based Outdoor Performance Capture
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.25
Nadia Robertini, D. Casas, Helge Rhodin, H. Seidel, C. Theobalt
We propose a new model-based method to accurately reconstruct human performances captured outdoors in a multi-camera setup. Starting from a template of the actor model, we introduce a new unified implicit representation for both articulated skeleton tracking and non-rigid surface shape refinement. Our method fits the template to unsegmented video frames in two stages - first, the coarse skeletal pose is estimated, and subsequently non-rigid surface shape and body pose are jointly refined. Particularly for surface shape refinement, we propose a new combination of 3D Gaussians designed to align the projected model with likely silhouette contours without explicit segmentation or edge detection. We obtain reconstructions of much higher quality in outdoor settings than existing methods, and show that we are on par with state-of-the-art methods on indoor scenes for which they were designed.
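The surface refinement relies on overlap between projected model Gaussians and silhouette evidence. As a toy illustration of why Gaussian overlap gives a smooth alignment score (the authors' actual energy and projection model are more elaborate), the sketch below uses the closed-form integral of the product of two isotropic 2D Gaussians; all values are invented.

```python
import numpy as np

def gaussian_overlap_2d(mu1, s1, mu2, s2):
    """Closed-form integral of the product of two isotropic 2D Gaussians:
    a smooth, differentiable similarity between two blobs."""
    var = s1 ** 2 + s2 ** 2
    d2 = float(np.sum((np.asarray(mu1) - np.asarray(mu2)) ** 2))
    return np.exp(-d2 / (2.0 * var)) / (2.0 * np.pi * var)

def silhouette_energy(model_blobs, image_blobs):
    """Sum of pairwise overlaps between projected model blobs and image blobs;
    a better-aligned model yields a larger value."""
    return sum(gaussian_overlap_2d(m, sm, i, si)
               for m, sm in model_blobs for i, si in image_blobs)

# Toy example: the same two-blob model placed on and off a two-blob "silhouette".
image = [((50.0, 50.0), 8.0), ((70.0, 50.0), 8.0)]
aligned = [((51.0, 49.0), 8.0), ((69.0, 51.0), 8.0)]
offset = [((80.0, 90.0), 8.0), ((99.0, 91.0), 8.0)]
print(silhouette_energy(aligned, image) > silhouette_energy(offset, image))   # True
```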
Citations: 43
Multiview RGB-D Dataset for Object Instance Detection
Pub Date : 2016-09-26 DOI: 10.1109/3DV.2016.52
G. Georgakis, Md. Alimoor Reza, Arsalan Mousavian, P. Le, J. Kosecka
This paper presents a new multi-view RGB-D dataset of nine kitchen scenes, each containing several objects in realistic cluttered environments including a subset of objects from the BigBird dataset. The viewpoints of the scenes are densely sampled and objects in the scenes are annotated with bounding boxes and in the 3D point cloud. Also, an approach for detection and recognition is presented, which consists of two parts: (i) a new multi-view 3D proposal generation method and (ii) the development of several recognition baselines that use AlexNet, trained either on crops of the dataset or on synthetically composited training images, to score our proposals. Finally, we compare the performance of the object proposals and a detection baseline to the Washington RGB-D Scenes (WRGB-D) dataset and demonstrate that our Kitchen scenes dataset is more challenging for object detection and recognition. The dataset is available at: http://cs.gmu.edu/~robot/gmu-kitchens.html.
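The multi-view 3D proposal generation is not spelled out in the abstract; one simple ingredient such a scheme can use is projecting a 3D object hypothesis into each calibrated view and taking the enclosing rectangle as a 2D proposal. The sketch below shows only that projection step, with made-up intrinsics and box size; it is not the authors' method.

```python
import numpy as np

def project_box(corners_3d, K, R, t):
    """Project 3D box corners (world frame) through a pinhole camera and return the
    enclosing 2D rectangle, turning one 3D hypothesis into a per-view 2D proposal."""
    cam = R @ corners_3d.T + t.reshape(3, 1)          # world -> camera coordinates
    pix = K @ cam
    uv = pix[:2] / pix[2]                             # perspective division
    return uv[0].min(), uv[1].min(), uv[0].max(), uv[1].max()

# A 0.2 m cube roughly one metre in front of the camera, with made-up intrinsics.
corners = np.array([[x, y, z] for x in (-0.1, 0.1) for y in (-0.1, 0.1) for z in (0.9, 1.1)])
K = np.array([[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]])
print(project_box(corners, K, np.eye(3), np.zeros(3)))
```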
Citations: 75
Fast Single Shot Detection and Pose Estimation
Pub Date : 2016-09-19 DOI: 10.1109/3DV.2016.78
Patrick Poirson, Phil Ammirato, Cheng-Yang Fu, Wei Liu, J. Kosecka, A. Berg
For applications in navigation and robotics, estimating the 3D pose of objects is as important as detection. Many approaches to pose estimation rely on detecting or tracking parts or keypoints [11, 21]. In this paper we build on a recent state-of-the-art convolutional network for sliding-window detection [10] to provide detection and rough pose estimation in a single shot, without intermediate stages of detecting parts or initial bounding boxes. While not the first system to treat pose estimation as a categorization problem, this is the first attempt to combine detection and pose estimation at the same level using a deep learning approach. The key to the architecture is a deep convolutional network where scores for the presence of an object category, the offset for its location, and the approximate pose are all estimated on a regular grid of locations in the image. The resulting system is as accurate as recent work on pose estimation (42.4% 8-view mAVP on Pascal 3D+ [21]) and significantly faster (46 frames per second (FPS) on a TITAN X GPU). This approach to detection and rough pose estimation is fast and accurate enough to be widely applied as a pre-processing step for tasks including high-accuracy pose estimation, object tracking and localization, and vSLAM.
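Treating pose as a categorization problem means the network predicts, at every grid location, class scores, box offsets and scores over discretized viewpoint bins. The sketch below decodes such a grid of stand-in (random) outputs into a single detection with an azimuth estimate; the grid size, bin count and tensor layout are assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
grid_h = grid_w = 5                         # toy grid of anchor locations
num_classes, num_pose_bins = 3, 8           # e.g. 8 azimuth bins of 45 degrees

# Stand-in network outputs: per-cell class scores, box offsets and pose-bin scores.
cls_scores = rng.random((grid_h, grid_w, num_classes))
box_offsets = rng.random((grid_h, grid_w, 4))
pose_scores = rng.random((grid_h, grid_w, num_classes, num_pose_bins))

# Decode the single strongest detection, then read its pose as the argmax pose bin.
y, x, c = np.unravel_index(np.argmax(cls_scores), cls_scores.shape)
pose_bin = int(np.argmax(pose_scores[y, x, c]))
azimuth = pose_bin * 360.0 / num_pose_bins
print(f"cell=({y},{x}) class={c} box={box_offsets[y, x].round(2)} azimuth={azimuth} deg")
```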
Citations: 107
Consistent Discretization and Minimization of the L1 Norm on Manifolds
Pub Date : 2016-09-18 DOI: 10.1109/3DV.2016.53
A. Bronstein, Yoni Choukroun, R. Kimmel, Matan Sela
The L1 norm has been tremendously popular in signal and image processing in the past two decades due to its sparsity-promoting properties. More recently, its generalization to non-Euclidean domains has been found useful in shape analysis applications. For example, in conjunction with the minimization of the Dirichlet energy, it was shown to produce a compactly supported quasi-harmonic orthonormal basis, dubbed compressed manifold modes [14]. The continuous L1 norm on the manifold is often replaced by the vector ℓ1 norm applied to sampled functions. We show that such an approach is incorrect in the sense that it does not consistently discretize the continuous norm and warn against its sensitivity to the specific sampling. We propose two alternative discretizations resulting in an iteratively reweighted ℓ2 norm. We demonstrate the proposed strategy on the compressed modes problem, which reduces to a sequence of simple eigendecomposition problems not requiring non-convex optimization on Stiefel manifolds and producing more stable and accurate results.
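A consistent discretization weights each sample by its local area (mass) element, so the discrete norm approximates the integral of |f| rather than the plain vector ℓ1 norm, and the resulting weighted L1 term can be minimized by iteratively reweighted ℓ2 steps. The sketch below shows that reweighting idea on a generic sparse regression problem with stand-in area weights; it is not the authors' manifold discretization or their compressed-modes solver.

```python
import numpy as np

def weighted_l1_irls(A, b, areas, lam=1.0, eps=1e-6, iters=50):
    """Minimise ||A f - b||^2 + lam * sum_i areas_i * |f_i| by iteratively reweighted
    least squares: |f_i| is replaced by f_i^2 / (|f_i| + eps) at the current iterate,
    so every step is a plain weighted-L2 problem with a closed-form solution."""
    f = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(iters):
        w = areas / (np.abs(f) + eps)                   # per-sample reweighting
        f = np.linalg.solve(A.T @ A + lam * np.diag(w), A.T @ b)
    return f

# Toy problem: the area weights stand in for mass elements (e.g. Voronoi cell areas)
# that make the discrete norm consistent when samples are unevenly spaced.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 10))
f_true = np.zeros(10)
f_true[0] = 3.0                                         # sparse ground truth
b = A @ f_true + 0.01 * rng.normal(size=30)
areas = rng.uniform(0.5, 2.0, size=10)
print(weighted_l1_irls(A, b, areas, lam=5.0).round(2))
```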
Citations: 14