Self-Supervised 3D Semantic Occupancy Prediction from Multi-View 2D Surround Images

IF 2.1 | Zone 4, Earth Sciences | JCR Q3, Imaging Science & Photographic Technology | PFG-Journal of Photogrammetry Remote Sensing and Geoinformation Science | Pub Date: 2024-09-18 | DOI: 10.1007/s41064-024-00308-9
S. Abualhanud, E. Erahan, M. Mehltretter
Citations: 0

Abstract

An accurate 3D representation of the geometry and semantics of an environment forms the basis for a wide variety of downstream tasks and is essential for autonomous driving tasks such as path planning and obstacle avoidance. This work focuses on 3D semantic occupancy prediction, i.e., the reconstruction of a scene as a voxel grid in which each voxel is assigned both an occupancy state and a semantic label. We present a Convolutional Neural Network-based method that takes as input multiple color images from a surround-view setup with minimal overlap, together with the associated interior and exterior camera parameters, and reconstructs the observed environment as a 3D semantic occupancy map. To account for the ill-posed nature of reconstructing a 3D representation from monocular 2D images, the image information is integrated over time: under the assumption that the camera setup is moving, images from consecutive time steps are used to form a multi-view stereo setup. In exhaustive experiments, we investigate the challenges presented by dynamic objects and the possibilities of training the proposed method with either 3D or 2D reference data, the latter being motivated by the comparatively higher cost of generating and annotating 3D ground truth data. Moreover, we present and investigate a novel self-supervised training scheme that requires no geometric reference data and relies only on sparse semantic ground truth. An evaluation on the Occ3D dataset, including a comparison against current state-of-the-art self-supervised methods from the literature, demonstrates the potential of our self-supervised variant.
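
To make the two ingredients named in the abstract more concrete (a voxel grid carrying a semantic label per voxel, and camera parameters that relate 3D space to the 2D images), the following is a minimal illustrative sketch. It is not the authors' implementation: the function names, the label convention (`FREE = 0`), the grid size, and the camera values are all assumptions chosen for illustration, and a simple pinhole model without lens distortion is assumed.

```python
import numpy as np

FREE = 0  # hypothetical label id for unoccupied voxels (assumption)


def voxel_centers(grid_shape, voxel_size, origin):
    """Return the (N, 3) metric centers of all voxels in a regular grid."""
    axes = [np.arange(n) for n in grid_shape]
    idx = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    return (idx.reshape(-1, 3) + 0.5) * voxel_size + np.asarray(origin, dtype=float)


def project_points(points_world, K, R, t):
    """Project world points into an image with a pinhole camera.

    K is the 3x3 interior (intrinsic) matrix; R and t map world coordinates
    to camera coordinates (the exterior orientation). Returns the pixel
    coordinates and a boolean mask of points in front of the camera.
    """
    cam = points_world @ R.T + t           # world -> camera frame
    in_front = cam[:, 2] > 1e-6            # keep only positive depth
    uvw = cam @ K.T                        # camera frame -> homogeneous pixels
    with np.errstate(divide="ignore", invalid="ignore"):
        uv = uvw[:, :2] / uvw[:, 2:3]      # perspective division
    return uv, in_front


if __name__ == "__main__":
    # A small semantic occupancy grid: one integer label per voxel.
    labels = np.full((4, 4, 2), FREE, dtype=np.int64)
    labels[1, 2, 0] = 3                    # mark one voxel with hypothetical class 3

    centers = voxel_centers(labels.shape, voxel_size=0.5, origin=(-1.0, -1.0, 1.0))

    # Hypothetical camera at the world origin looking along +z.
    K = np.array([[100.0, 0.0, 50.0],
                  [0.0, 100.0, 50.0],
                  [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.zeros(3)

    uv, visible = project_points(centers, K, R, t)
    occupied = labels.reshape(-1) != FREE
    print(uv[occupied & visible])          # pixel location(s) of the occupied voxel
```

A projection of this kind is one common way to compare a 3D grid against 2D image-space reference data; whether and how the paper does so is not stated in the abstract.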

Source journal
CiteScore: 8.20
Self-citation rate: 2.40%
Articles published: 38
Journal description: PFG is an international scholarly journal covering the progress and application of photogrammetric methods, remote sensing technology and the interconnected field of geoinformation science. It places special editorial emphasis on the communication of new methodologies in data acquisition and new approaches to the optimized processing and interpretation of all types of data acquired by photogrammetric methods, remote sensing, image processing and the computer-aided interpretation of such data in general. The journal hence addresses both researchers and students of these disciplines at academic institutions and universities as well as downstream users in both the private sector and public administration. Founded in 1926 under the former name Bildmessung und Luftbildwesen, PFG is the world's oldest journal on photogrammetry. It is the official journal of the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF).
Latest articles from this journal
Self-Supervised 3D Semantic Occupancy Prediction from Multi-View 2D Surround Images
Characterization of transient movements within the Joshimath hillslope complex: Results from multi-sensor InSAR observations
Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN
Assessing the Impact of Data-resolution On Ocean Frontal Characteristics
Challenges and Opportunities of Sentinel-1 InSAR for Transport Infrastructure Monitoring