Bayesian Self-Training for Semi-Supervised 3D Segmentation
Ozan Unal, Christos Sakaridis, Luc Van Gool
arXiv:2409.08102 — arXiv - CS - Computer Vision and Pattern Recognition
Published: 2024-09-12
Citations: 0
Abstract
3D segmentation is a core problem in computer vision and, like many
other dense prediction tasks, it requires large amounts of annotated data for
adequate training. However, densely labeling 3D point clouds for
fully-supervised training remains prohibitively labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a
small set of labeled data is given, accompanied by a larger unlabeled set. This
area thus studies the effective use of unlabeled data to reduce the performance
gap that arises due to the lack of annotations. In this work, inspired by
Bayesian deep learning, we first propose a Bayesian self-training framework for
semi-supervised 3D semantic segmentation. Employing stochastic inference, we
generate an initial set of pseudo-labels and then filter these based on
estimated point-wise uncertainty. By constructing a heuristic $n$-partite
matching algorithm, we extend the method to semi-supervised 3D instance
segmentation, and finally, with the same building blocks, to dense 3D visual
grounding. We demonstrate state-of-the-art results for our semi-supervised
method on SemanticKITTI and ScribbleKITTI for 3D semantic segmentation and on
ScanNet and S3DIS for 3D instance segmentation. We further achieve substantial
improvements in dense 3D visual grounding over supervised-only baselines on
ScanRefer. Our project page is available at ouenal.github.io/bst/.
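The core self-training step described in the abstract — running stochastic inference to obtain pseudo-labels and then filtering them by estimated point-wise uncertainty — can be illustrated with a minimal sketch. The snippet below assumes Monte Carlo dropout as the source of stochasticity and normalized predictive entropy as the uncertainty measure; both are common choices in Bayesian deep learning, but the paper's exact estimator, threshold, and function names here are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def pseudo_labels_with_uncertainty(logits_mc, threshold=0.5):
    """Generate pseudo-labels and filter them by predictive entropy.

    logits_mc: (T, N, C) array of logits from T stochastic forward
    passes over N points with C classes (e.g. dropout kept active at
    inference time).
    Returns (labels, keep): argmax labels over the mean softmax, and a
    boolean mask keeping only points whose normalized entropy is below
    `threshold`. The entropy criterion is an assumption for this sketch.
    """
    # numerically stable softmax over the class axis, per pass
    z = logits_mc - logits_mc.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    mean_p = probs.mean(axis=0)                       # (N, C) mean prediction
    # predictive entropy, normalized to [0, 1] by log(C)
    C = mean_p.shape[-1]
    entropy = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=-1) / np.log(C)
    labels = mean_p.argmax(axis=-1)
    keep = entropy < threshold
    return labels, keep

# Toy example: 8 stochastic passes over 5 points with 3 classes.
T, N, C = 8, 5, 3
logits = rng.normal(size=(T, N, C))
logits[:, 0, 1] += 5.0   # point 0 is consistently confident on class 1
labels, keep = pseudo_labels_with_uncertainty(logits, threshold=0.5)
print(labels[0], keep[0])  # class 1 is kept; noisier points tend to be filtered
```

Only the retained points would then serve as supervision for the unlabeled set in the next training round; the paper's extensions to instance segmentation and visual grounding reuse this filtering block with additional matching machinery.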