Pub Date: 2022-09-01 DOI: 10.1109/3dv57658.2022.00006
Angela Dai, J. Kosecka, Gin Hee Lee, K. Schindler
{"title":"Message from the Program Chairs: 3DV 2022","authors":"Angela Dai, J. Kosecka, Gin Hee Lee, K. Schindler","doi":"10.1109/3dv57658.2022.00006","DOIUrl":"https://doi.org/10.1109/3dv57658.2022.00006","PeriodicalId":91162,"journal":{"name":"Proceedings. International Conference on 3D Vision","volume":"88 1","pages":"1"},"publicationDate":"2022-09-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"78708115"}
Pub Date: 2020-10-01 DOI: 10.1109/ldav51489.2020.00005
A. Hilton, Z. Kukelova, Stephen Lin, J. Sato
{"title":"Message from the 3DV 2020 Program Chairs","authors":"A. Hilton, Z. Kukelova, Stephen Lin, J. Sato","doi":"10.1109/ldav51489.2020.00005","DOIUrl":"https://doi.org/10.1109/ldav51489.2020.00005","PeriodicalId":91162,"journal":{"name":"Proceedings. International Conference on 3D Vision","volume":"1 1","pages":"xx"},"publicationDate":"2020-10-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"79162892"}
Lyne P. Tchapmi, C. Choy, Iro Armeni, JunYoung Gwak, S. Savarese
3D semantic scene labeling is fundamental to agents operating in the real world. In particular, labeling raw 3D point sets from sensors provides fine-grained semantics. Recent works leverage the capabilities of Neural Networks (NNs), but are limited to coarse voxel predictions and do not explicitly enforce global consistency. We present SEGCloud, an end-to-end framework to obtain 3D point-level segmentation that combines the advantages of NNs, trilinear interpolation (TI), and fully connected Conditional Random Fields (FC-CRF). Coarse voxel predictions from a 3D Fully Convolutional NN are transferred back to the raw 3D points via trilinear interpolation. The FC-CRF then enforces global consistency and provides fine-grained semantics on the points. We implement the FC-CRF as a differentiable Recurrent NN to allow joint optimization. We evaluate the framework on two indoor and two outdoor 3D datasets (NYU V2, S3DIS, KITTI, this http URL), and show performance comparable or superior to the state of the art on all datasets.
{"title":"SEGCloud: Semantic Segmentation of 3D Point Clouds","authors":"Lyne P. Tchapmi, C. Choy, Iro Armeni, JunYoung Gwak, S. Savarese","doi":"10.1109/3DV.2017.00067","DOIUrl":"https://doi.org/10.1109/3DV.2017.00067","PeriodicalId":91162,"journal":{"name":"Proceedings. International Conference on 3D Vision","volume":"43 1","pages":"537-547"},"publicationDate":"2017-10-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"86432312"}
This paper presents a thorough evaluation of several widely used 3D correspondence grouping algorithms, motivated by their significance in vision tasks that rely on correct feature correspondences. A good correspondence grouping algorithm should retrieve as many inliers as possible from the initial feature matches, raising both precision and recall. To this end, we conduct experiments on three benchmarks addressing shape retrieval, 3D object recognition, and point cloud registration, respectively. The variety of application contexts introduces a rich set of nuisances, including noise, varying point densities, clutter, occlusion, and partial overlaps. It also yields different inlier ratios and correspondence distributions, allowing a comprehensive evaluation. Based on the quantitative outcomes, we summarize the merits and demerits of the evaluated algorithms from both performance and efficiency perspectives.
{"title":"Performance Evaluation of 3D Correspondence Grouping Algorithms","authors":"Jiaqi Yang, Ke Xian, Yang Xiao, ZHIGUO CAO","doi":"10.1109/3DV.2017.00060","DOIUrl":"https://doi.org/10.1109/3DV.2017.00060","PeriodicalId":91162,"journal":{"name":"Proceedings. International Conference on 3D Vision","volume":"48 1","pages":"467-476"},"publicationDate":"2017-10-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"82617045"}
Many monocular visual SLAM algorithms are derived from incremental structure-from-motion (SfM) methods. This work proposes a novel monocular SLAM method that integrates recent advances in global SfM. In particular, we present two main contributions to visual SLAM. First, we solve the visual odometry problem with a novel rank-1 matrix factorization technique that is more robust to errors in map initialization. Second, we adopt a recent global SfM method for the pose-graph optimization, which leads to a multi-stage linear formulation and enables L1 optimization for better robustness to false loops. The combination of these two approaches generates more robust reconstructions and is significantly faster (4x) than recent state-of-the-art SLAM systems. We also present a new dataset recorded with ground-truth camera motion in a Vicon motion capture room, and compare our method to prior systems on it and on established benchmark datasets.
{"title":"GSLAM: Initialization-Robust Monocular Visual SLAM via Global Structure-from-Motion","authors":"Chengzhou Tang, Oliver Wang, P. Tan","doi":"10.1109/3DV.2017.00027","DOIUrl":"https://doi.org/10.1109/3DV.2017.00027","PeriodicalId":91162,"journal":{"name":"Proceedings. International Conference on 3D Vision","volume":"6 1","pages":"155-164"},"publicationDate":"2017-08-16","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"82151666"}
Shu Liang, Ira Kemelmacher-Shlizerman, Linda G Shapiro
We present an algorithm that takes a single frame of a person's face from a depth camera, e.g., Kinect, and produces a high-resolution 3D mesh of the input face. We leverage a dataset of 3D face meshes of 1204 distinct individuals ranging in age from 3 to 40, captured in a neutral expression. We divide the input depth frame into semantically significant regions (eyes, nose, mouth, cheeks) and search the database for the best matching shape per region. We then combine the input depth frame with the matched database shapes into a single mesh, resulting in a high-resolution shape of the input person. Our system is fully automatic and uses only depth data for matching, making it invariant to imaging conditions. We evaluate our results against ground truth shapes and compare them to state-of-the-art shape estimation methods. We demonstrate the robustness of our local matching approach with high-quality reconstructions of faces that fall outside the dataset span, e.g., faces of people older than 40, facial expressions, and different ethnicities.
{"title":"3D Face Hallucination from a Single Depth Frame.","authors":"Shu Liang, Ira Kemelmacher-Shlizerman, Linda G Shapiro","doi":"10.1109/3DV.2014.67","DOIUrl":"https://doi.org/10.1109/3DV.2014.67","PeriodicalId":91162,"journal":{"name":"Proceedings. International Conference on 3D Vision","volume":"2014 ","pages":"31-38"},"publicationDate":"2014-12-01","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/3DV.2014.67","platform":"Semanticscholar","paperid":"33994948","PMCID":"OA"}