We present a novel phase unwrapping framework for Time-of-Flight (ToF) sensors that matches, in a single shot, the performance of systems using two modulation frequencies. Our framework is based on an interleaved pixel arrangement, where a pixel measures phase at a different modulation frequency from its neighboring pixels. We demonstrate that: (1) it is practical to capture ToF images that contain phases from two frequencies in a single shot, with no loss in signal fidelity, (2) phase unwrapping can be effectively performed on such an interleaved phase image, and (3) our method preserves the original spatial resolution. We find that the output of our framework is comparable to results obtained from two shots at separate modulation frequencies, and is significantly better than using a single modulation frequency.
Changpeng Ti, Ruigang Yang, and James Davis, "Single-Shot Time-of-Flight Phase Unwrapping Using Two Modulation Frequencies," 2016 Fourth International Conference on 3D Vision (3DV), 2016. doi: 10.1109/3DV.2016.74
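For context, a minimal sketch of the generic two-frequency unwrapping idea such a system relies on: each modulation frequency yields a wrapped phase with its own unambiguous range, and the pair of wrapped phases identifies the distance within a much longer extended range. The frequencies, maximum range, and brute-force search over wrap counts below are illustrative assumptions, not the paper's interleaved-pixel pipeline.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def unwrap_two_freq(phi1, phi2, f1=40e6, f2=70e6, max_range=15.0):
    """Resolve the wrap counts that make the two per-frequency distance
    estimates agree, by brute force over the extended range.
    phi1, phi2: wrapped phases in [0, 2*pi) measured at f1 and f2."""
    r1 = C / (2.0 * f1)          # unambiguous range at f1 (~3.75 m)
    r2 = C / (2.0 * f2)          # unambiguous range at f2 (~2.14 m)
    best_d, best_err = None, np.inf
    for k1 in range(int(np.ceil(max_range / r1))):
        for k2 in range(int(np.ceil(max_range / r2))):
            d1 = (phi1 / (2.0 * np.pi) + k1) * r1
            d2 = (phi2 / (2.0 * np.pi) + k2) * r2
            if abs(d1 - d2) < best_err:
                best_d, best_err = 0.5 * (d1 + d2), abs(d1 - d2)
    return best_d

# e.g. a target at 7.3 m wraps at both frequencies, yet the pair of wrapped
# phases identifies it uniquely within the ~15 m extended range of 40/70 MHz.
```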
The availability of commodity multi-camera systems such as Google Jump, Jaunt, and Lytro Immerge has brought new demand for reliable and efficient extrinsic camera calibration. State-of-the-art solutions generally require that adjacent, if not all, cameras observe a common area or employ known scene structures. In this paper, we present a novel multi-camera calibration technique that eliminates such requirements. Our approach extends the single-pair hand-eye calibration used in robotics to multi-camera systems. Specifically, we make use of (possibly unknown) planar structures in the scene and combine plane-based structure from motion, camera pose estimation, and task-specific bundle adjustment for extrinsic calibration. Experiments on several multi-camera setups demonstrate that our scheme is highly accurate, robust, and efficient.
Chen Zhu, Zihan Zhou, Ziran Xing, Yanbing Dong, Yi Ma, and Jingyi Yu, "Robust Plane-Based Calibration of Multiple Non-Overlapping Cameras," 2016 Fourth International Conference on 3D Vision (3DV), 2016. doi: 10.1109/3DV.2016.73
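For reference, the single-pair hand-eye relation the approach generalizes, in its standard robotics form (notation assumed here; how the per-camera motions are estimated from planar structure is the paper's contribution and is not shown):

```latex
% A_i and B_i are the relative motions of the two rigidly linked cameras
% between frames i and i+1; X is the fixed extrinsic transform between them.
\[
A_i X = X B_i, \qquad
A_i = \begin{bmatrix} R_{A_i} & t_{A_i} \\ \mathbf{0}^{\top} & 1 \end{bmatrix}, \quad
X = \begin{bmatrix} R_X & t_X \\ \mathbf{0}^{\top} & 1 \end{bmatrix},
\]
\[
R_{A_i} R_X = R_X R_{B_i}, \qquad
(R_{A_i} - I)\, t_X = R_X\, t_{B_i} - t_{A_i}.
\]
```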
Depth restoration, the task of correcting depth noise and artifacts, has recently risen in popularity due to the increase in commodity depth cameras. When assessing the quality of existing methods, most researchers resort to the popular Middlebury dataset; however, this dataset was not created for depth enhancement and therefore lacks the option of comparing genuine low-quality depth images with their high-quality, ground-truth counterparts. To address this shortcoming, we present the Depth Restoration Occlusionless Temporal (DROT) dataset. This dataset offers real depth sensor input coupled with registered pixel-to-pixel color images, and the ground-truth depth against which to compare. Our dataset includes not only Kinect 1 and Kinect 2 data, but also an Intel R200 sensor intended for integration into hand-held devices. Beyond this, we present a new temporal depth-restoration method. Utilizing multiple frames, we create a number of candidates for the initial degraded depth map, which allows us to make a better-informed decision when refining depth images. Evaluating this method with our dataset shows significant benefits, particularly for overcoming real sensor-noise artifacts.
Daniel Rotman and Guy Gilboa, "A Depth Restoration Occlusionless Temporal Dataset," 2016 Fourth International Conference on 3D Vision (3DV), 2016. doi: 10.1109/3DV.2016.26
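As a point of comparison only, a naive multi-frame baseline that such a temporal method improves upon: a per-pixel median over registered depth frames with invalid readings ignored. This is a hedged illustration, not the paper's candidate-generation and selection scheme, and it assumes the frames have already been registered to a common viewpoint.

```python
import numpy as np

def fuse_depth_frames(frames, invalid=0):
    """Per-pixel median over registered depth frames, ignoring invalid (zero)
    readings. frames: (T, H, W) array of depth maps already registered to a
    common viewpoint. Returns the fused depth map and a validity mask."""
    stack = np.asarray(frames, dtype=np.float32).copy()
    stack[stack == invalid] = np.nan
    fused = np.nanmedian(stack, axis=0)   # pixels invalid in every frame stay NaN
    valid = ~np.isnan(fused)
    fused[~valid] = invalid
    return fused, valid
```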
Benjamin Busam, M. Esposito, B. Frisch, Nassir Navab
Fast real-time tracking is an integral component of modern 3D computer vision pipelines. Despite their advantages in accuracy and reliability, optical trackers suffer from limited acquisition rates, constrained either by intrinsic sensor capabilities or by physical limitations such as exposure time. Moreover, data transmission and image processing produce latency in the pose stream. We introduce quaternionic upsampling to overcome these problems. The technique models the pose parameters as points on multidimensional hyperspheres in (dual) quaternion space. In order to upsample the pose stream, we present several methods to sample points on geodesics and piecewise continuous curves on these manifolds and compare them in terms of accuracy and computational efficiency. With the unified approach of quaternionic upsampling, both interpolation and extrapolation in pose space can be done by continuous linear variation of only one sampling parameter. Since the method can be implemented rather efficiently, pose rates of over 4 kHz and future pose predictions with an accuracy of 128 μm and 0.5° are possible in real time. The method does not depend on a specific tracking algorithm and can thus be used with any 3 DoF rotation or 6 DoF pose tracking system.
Benjamin Busam, M. Esposito, B. Frisch, and Nassir Navab, "Quaternionic Upsampling: Hyperspherical Techniques for 6 DoF Pose Tracking," 2016 Fourth International Conference on 3D Vision (3DV), 2016. doi: 10.1109/3DV.2016.71
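A minimal sketch of the simplest case of this idea: upsampling between two timestamped poses by spherical linear interpolation (slerp) of unit rotation quaternions and linear interpolation of translations. The paper's formulation works with dual quaternions and more general curves on the hypersphere; the output rate and the rotation/translation split below are illustrative assumptions.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0, q1 at t in [0, 1]."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:                 # take the shorter arc on the hypersphere
        q1, dot = -q1, -dot
    if dot > 0.9995:              # nearly parallel: normalized linear interpolation
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def upsample_poses(t0, q0, p0, t1, q1, p1, rate_hz=4000.0):
    """Emit intermediate (rotation quaternion, translation) samples between two
    tracked poses captured at times t0 and t1, at the requested output rate."""
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    times = np.arange(t0, t1, 1.0 / rate_hz)
    samples = []
    for t in times:
        s = (t - t0) / (t1 - t0)
        samples.append((slerp(q0, q1, s), (1.0 - s) * p0 + s * p1))
    return times, samples
```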
Steven G. McDonagh, M. Klaudiny, D. Bradley, T. Beeler, Iain Matthews, Kenny Mitchell
Real-time facial performance capture has recently been gaining popularity in virtual film production, driven by advances in machine learning, which allows for fast inference of facial geometry from video streams. These learning-based approaches are significantly influenced by the quality and amount of labelled training data. Tedious construction of training sets from real imagery can be replaced by rendering a facial animation rig under on-set conditions expected at runtime. We learn a synthetic actor-specific prior by adapting a state-of-the-art facial tracking method. Synthetic training significantly reduces the capture and annotation burden and in theory allows generation of an arbitrary amount of data. But practical realities such as training time and compute resources still limit the size of any training set. We construct better and smaller training sets by investigating which facial image appearances are crucial for tracking accuracy, covering the dimensions of expression, viewpoint and illumination. A reduction of training data by 1-2 orders of magnitude is demonstrated whilst tracking accuracy is retained for challenging on-set footage.
Steven G. McDonagh, M. Klaudiny, D. Bradley, T. Beeler, Iain Matthews, and Kenny Mitchell, "Synthetic Prior Design for Real-Time Face Tracking," 2016 Fourth International Conference on 3D Vision (3DV), 2016. doi: 10.1109/3DV.2016.72
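A toy illustration of the kind of render-condition grid such a synthetic training set enumerates. The expression labels, angles, and lighting names are hypothetical stand-ins, and the random subsampling is only a placeholder for the paper's informed selection of which appearances matter for tracking accuracy.

```python
import itertools
import random

# Hypothetical render-condition grid; labels and values are illustrative only.
expressions   = ["neutral", "smile", "jaw_open", "brow_raise", "eye_close"]
yaw_angles    = [-30, -15, 0, 15, 30]            # head yaw in degrees
illuminations = ["key_left", "key_right", "fill", "ambient"]

full_grid = list(itertools.product(expressions, yaw_angles, illuminations))

# Placeholder for an informed selection: keep only a small fraction of the
# conditions (chosen at random here) to mimic an order-of-magnitude reduction
# of the rendered training set.
reduced_set = random.sample(full_grid, k=max(1, len(full_grid) // 10))
```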
We present an algorithm for robust and real-time visual tracking under challenging illumination conditions characterized by poor lighting as well as sudden and drastic changes in illumination. Robustness is achieved by adapting illumination-invariant binary descriptors to dense image alignment using the Lucas and Kanade algorithm. The proposed adaptation preserves the Hamming distance under least-squares minimization, thus preserving the photometric invariance properties of binary descriptors. Due to the compactness of the descriptor, the algorithm runs in excess of 400 fps on laptops and 100 fps on mobile devices.
Hatem Alismail, Brett Browning, and S. Lucey, "Robust Tracking in Low Light and Sudden Illumination Changes," 2016 Fourth International Conference on 3D Vision (3DV), 2016. doi: 10.1109/3DV.2016.48
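A hedged sketch of the general recipe (binary descriptor channels aligned by least squares): a census-style 8-channel descriptor followed by a translation-only Lucas-Kanade solver run on those channels. The descriptor layout, translation-only warp, and stopping criterion are simplifications for illustration; the paper performs full dense alignment, not just translation estimation.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def census_bitplanes(img):
    """8-channel binary descriptor: each channel compares a pixel with one of
    its 8 neighbours (a census transform), stored as float {0, 1} so that a
    least-squares residual over the channels mirrors the Hamming distance."""
    H, W = img.shape
    pad = np.pad(img.astype(np.float32), 1, mode="edge")
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    return np.stack([(pad[1 + dy:H + 1 + dy, 1 + dx:W + 1 + dx] > img).astype(np.float32)
                     for dy, dx in offsets])

def lk_translation(tpl, cur, iters=50):
    """Forward-additive Lucas-Kanade restricted to a 2-DoF translation, run on
    descriptor channels instead of raw intensity. Returns p = (dx, dy) such
    that cur sampled at x + p matches tpl at x."""
    T = census_bitplanes(np.asarray(tpl, np.float32))
    I = census_bitplanes(np.asarray(cur, np.float32))
    p = np.zeros(2)
    for _ in range(iters):
        Iw = np.stack([nd_shift(ch, (-p[1], -p[0]), order=1, mode="nearest") for ch in I])
        gy, gx = np.gradient(Iw, axis=(1, 2))
        err = (T - Iw).ravel()
        J = np.stack([gx.ravel(), gy.ravel()], axis=1)     # steepest-descent images
        dp = np.linalg.solve(J.T @ J + 1e-6 * np.eye(2), J.T @ err)
        p += dp
        if np.linalg.norm(dp) < 1e-4:
            break
    return p
```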
Nadia Robertini, D. Casas, Helge Rhodin, H. Seidel, C. Theobalt
We propose a new model-based method to accurately reconstruct human performances captured outdoors in a multi-camera setup. Starting from a template of the actor model, we introduce a new unified implicit representation for both articulated skeleton tracking and non-rigid surface shape refinement. Our method fits the template to unsegmented video frames in two stages: first, the coarse skeletal pose is estimated, and subsequently non-rigid surface shape and body pose are jointly refined. In particular, for surface shape refinement we propose a new combination of 3D Gaussians designed to align the projected model with likely silhouette contours without explicit segmentation or edge detection. We obtain reconstructions of much higher quality in outdoor settings than existing methods, and show that we are on par with state-of-the-art methods on indoor scenes for which they were designed.
Nadia Robertini, D. Casas, Helge Rhodin, H. Seidel, and C. Theobalt, "Model-Based Outdoor Performance Capture," 2016 Fourth International Conference on 3D Vision (3DV), 2016. doi: 10.1109/3DV.2016.25
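For intuition, the building block that Gaussian-based silhouette energies of this kind typically rest on: the overlap integral of two isotropic 2D Gaussians has a closed form, so agreement between projected model Gaussians and image-domain Gaussians can be scored and differentiated without explicit segmentation. The exact energy, weighting, and appearance terms of the paper are not reproduced here.

```python
import numpy as np

def gaussian_overlap(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form overlap integral of two isotropic 2D Gaussians,
    i.e. the integral of N(x; mu_a, sigma_a^2 I) * N(x; mu_b, sigma_b^2 I) dx.
    Summing such terms over projected model Gaussians and image-domain
    Gaussians gives a smooth, differentiable agreement score."""
    s2 = sigma_a ** 2 + sigma_b ** 2
    d2 = float(np.sum((np.asarray(mu_a, float) - np.asarray(mu_b, float)) ** 2))
    return np.exp(-d2 / (2.0 * s2)) / (2.0 * np.pi * s2)
```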
G. Georgakis, Md. Alimoor Reza, Arsalan Mousavian, P. Le, J. Kosecka
This paper presents a new multi-view RGB-D dataset of nine kitchen scenes, each containing several objects in realistic cluttered environments, including a subset of objects from the BigBird dataset. The viewpoints of the scenes are densely sampled, and the objects are annotated both with bounding boxes and in the 3D point cloud. We also present an approach for detection and recognition that consists of two parts: (i) a new multi-view 3D proposal generation method and (ii) several recognition baselines that use AlexNet, trained either on crops of the dataset or on synthetically composited training images, to score our proposals. Finally, we compare the performance of the object proposals and a detection baseline to the Washington RGB-D Scenes (WRGB-D) dataset and demonstrate that our Kitchen scenes dataset is more challenging for object detection and recognition. The dataset is available at: http://cs.gmu.edu/~robot/gmu-kitchens.html.
G. Georgakis, Md. Alimoor Reza, Arsalan Mousavian, P. Le, and J. Kosecka, "Multiview RGB-D Dataset for Object Instance Detection," 2016 Fourth International Conference on 3D Vision (3DV), 2016. doi: 10.1109/3DV.2016.52
Patrick Poirson, Phil Ammirato, Cheng-Yang Fu, Wei Liu, J. Kosecka, A. Berg
For applications in navigation and robotics, estimating the 3D pose of objects is as important as detection. Many approaches to pose estimation rely on detecting or tracking parts or keypoints [11, 21]. In this paper we build on a recent state-of-the-art convolutional network for sliding-window detection [10] to provide detection and rough pose estimation in a single shot, without intermediate stages of detecting parts or initial bounding boxes. While not the first system to treat pose estimation as a categorization problem, this is the first attempt to combine detection and pose estimation at the same level using a deep learning approach. The key to the architecture is a deep convolutional network where scores for the presence of an object category, the offset for its location, and the approximate pose are all estimated on a regular grid of locations in the image. The resulting system is as accurate as recent work on pose estimation (42.4% 8-view mAVP on Pascal 3D+ [21]) and significantly faster (46 frames per second (FPS) on a TITAN X GPU). This approach to detection and rough pose estimation is fast and accurate enough to be widely applied as a pre-processing step for tasks including high-accuracy pose estimation, object tracking and localization, and vSLAM.
Patrick Poirson, Phil Ammirato, Cheng-Yang Fu, Wei Liu, J. Kosecka, and A. Berg, "Fast Single Shot Detection and Pose Estimation," 2016 Fourth International Conference on 3D Vision (3DV), 2016. doi: 10.1109/3DV.2016.78
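A hedged sketch of the kind of prediction head the abstract describes: at every cell of a convolutional feature map, and for each of a few anchor shapes, the network emits class scores, box offsets, and a discrete viewpoint bin. The channel counts, anchor number, and 8-bin pose discretization below are illustrative assumptions rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn

class DetectionPoseHead(nn.Module):
    """Per-location prediction head: at every cell of a conv feature map, and
    for each of `num_anchors` anchor shapes, emit class scores, 4 box offsets,
    and a discrete viewpoint bin. All sizes here are assumptions."""
    def __init__(self, in_ch=512, num_classes=21, num_anchors=6, num_pose_bins=8):
        super().__init__()
        self.cls  = nn.Conv2d(in_ch, num_anchors * num_classes,   kernel_size=3, padding=1)
        self.box  = nn.Conv2d(in_ch, num_anchors * 4,             kernel_size=3, padding=1)
        self.pose = nn.Conv2d(in_ch, num_anchors * num_pose_bins, kernel_size=3, padding=1)

    def forward(self, feat):             # feat: (B, in_ch, H, W)
        return self.cls(feat), self.box(feat), self.pose(feat)

# A 38x38 feature map yields 38*38*6 candidate detections in one forward pass,
# each with a class distribution, a box refinement, and an 8-way viewpoint.
feat = torch.randn(1, 512, 38, 38)
cls_scores, box_offsets, pose_bins = DetectionPoseHead()(feat)
```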
A. Bronstein, Yoni Choukroun, R. Kimmel, Matan Sela
The L1 norm has been tremendously popular in signal and image processing in the past two decades due to its sparsity-promoting properties. More recently, its generalization to non-Euclidean domains has been found useful in shape analysis applications. For example, in conjunction with the minimization of the Dirichlet energy, it was shown to produce a compactly supported quasi-harmonic orthonormal basis, dubbed compressed manifold modes [14]. The continuous L1 norm on the manifold is often replaced by the vector ℓ1 norm applied to sampled functions. We show that such an approach is incorrect in the sense that it does not consistently discretize the continuous norm, and warn against its sensitivity to the specific sampling. We propose two alternative discretizations resulting in an iteratively-reweighted ℓ2 norm. We demonstrate the proposed strategy on the compressed modes problem, which reduces to a sequence of simple eigendecomposition problems that require no non-convex optimization on Stiefel manifolds and produce more stable and accurate results.
A. Bronstein, Yoni Choukroun, R. Kimmel, and Matan Sela, "Consistent Discretization and Minimization of the L1 Norm on Manifolds," 2016 Fourth International Conference on 3D Vision (3DV), 2016. doi: 10.1109/3DV.2016.53
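A sketch of the consistency issue and of the reweighted surrogate it leads to, under the assumption of a lumped-mass (vertex-area) discretization and an epsilon-smoothed reweighting; the paper's precise discretizations may differ.

```latex
% a_i is the area associated with vertex i; the plain vector l1 norm of the
% samples ignores these weights and is therefore not a consistent discretization.
\[
\|f\|_{L^1(\mathcal{M})} = \int_{\mathcal{M}} |f|\, da
\;\approx\; \sum_i a_i\, |f_i|
\;\neq\; \sum_i |f_i| = \|\mathbf{f}\|_{\ell^1},
\]
% Iteratively-reweighted L2 surrogate built around the current iterate f^{(k)}
% (the epsilon smoothing is an illustrative choice):
\[
\sum_i a_i\, |f_i| \;\approx\; \sum_i \frac{a_i}{|f_i^{(k)}| + \varepsilon}\, f_i^{2}
\;=\; \mathbf{f}^{\top} W^{(k)} \mathbf{f},
\qquad
W^{(k)} = \operatorname{diag}\!\left(\frac{a_i}{|f_i^{(k)}| + \varepsilon}\right).
\]
```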