Learning Virtual View Selection for 3D Scene Semantic Segmentation

Tai-Jiang Mu, Ming-Yuan Shen, Yu-Kun Lai, Shi-Min Hu
{"title":"Learning Virtual View Selection for 3D Scene Semantic Segmentation","authors":"Tai-Jiang Mu;Ming-Yuan Shen;Yu-Kun Lai;Shi-Min Hu","doi":"10.1109/TIP.2024.3421952","DOIUrl":null,"url":null,"abstract":"2D-3D joint learning is essential and effective for fundamental 3D vision tasks, such as 3D semantic segmentation, due to the complementary information these two visual modalities contain. Most current 3D scene semantic segmentation methods process 2D images “as they are”, i.e., only real captured 2D images are used. However, such captured 2D images may be redundant, with abundant occlusion and/or limited field of view (FoV), leading to poor performance for the current methods involving 2D inputs. In this paper, we propose a general learning framework for joint 2D-3D scene understanding by selecting informative virtual 2D views of the underlying 3D scene. We then feed both the 3D geometry and the generated virtual 2D views into any joint 2D-3D-input or pure 3D-input based deep neural models for improving 3D scene understanding. Specifically, we generate virtual 2D views based on an information score map learned from the current 3D scene semantic segmentation results. To achieve this, we formalize the learning of the information score map as a deep reinforcement learning process, which rewards good predictions using a deep neural network. To obtain a compact set of virtual 2D views that jointly cover informative surfaces of the 3D scene as much as possible, we further propose an efficient greedy virtual view coverage strategy in the normal-sensitive 6D space, including 3-dimensional point coordinates and 3-dimensional normal. We have validated our proposed framework for various joint 2D-3D-input or pure 3D-input based deep neural models on two real-world 3D scene datasets, i.e., ScanNet v2 and S3DIS, and the results demonstrate that our method obtains a consistent gain over baseline models and achieves new top accuracy for joint 2D and 3D scene semantic segmentation. Code is available at \n<uri>https://github.com/smy-THU/VirtualViewSelection</uri>\n.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10594720/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

2D-3D joint learning is essential and effective for fundamental 3D vision tasks, such as 3D semantic segmentation, due to the complementary information these two visual modalities contain. Most current 3D scene semantic segmentation methods process 2D images “as they are”, i.e., only real captured 2D images are used. However, such captured 2D images may be redundant, with abundant occlusion and/or a limited field of view (FoV), leading to poor performance of current methods that involve 2D inputs. In this paper, we propose a general learning framework for joint 2D-3D scene understanding that selects informative virtual 2D views of the underlying 3D scene. We then feed both the 3D geometry and the generated virtual 2D views into any joint 2D-3D-input or pure 3D-input deep neural model to improve 3D scene understanding. Specifically, we generate virtual 2D views based on an information score map learned from the current 3D scene semantic segmentation results. To achieve this, we formalize the learning of the information score map as a deep reinforcement learning process, which rewards good predictions using a deep neural network. To obtain a compact set of virtual 2D views that jointly cover informative surfaces of the 3D scene as much as possible, we further propose an efficient greedy virtual view coverage strategy in the normal-sensitive 6D space, comprising 3-dimensional point coordinates and 3-dimensional normals. We have validated the proposed framework with various joint 2D-3D-input or pure 3D-input deep neural models on two real-world 3D scene datasets, ScanNet v2 and S3DIS, and the results demonstrate that our method obtains a consistent gain over baseline models and achieves new state-of-the-art accuracy for joint 2D and 3D scene semantic segmentation. Code is available at https://github.com/smy-THU/VirtualViewSelection.
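The abstract names two concrete mechanisms. The first is learning the information score map as a deep reinforcement learning process that rewards good predictions. The sketch below shows, in a hedged way, what a simple policy-gradient (REINFORCE-style) update of this kind could look like; `score_net`, `view_masks`, the per-view score aggregation, and the `evaluate_miou` reward callback are all illustrative assumptions, not the authors' actual formulation (see the linked repository for that).

```python
import torch

# A minimal REINFORCE-style sketch of training a per-point information
# score map that "rewards good predictions", as the abstract describes.
# All names, shapes, and the reward definition are assumptions.

def score_map_update(score_net, optimizer, point_feats, view_masks,
                     evaluate_miou):
    """One policy-gradient step on the per-point information score map.

    point_feats:   (N, C) per-point features of the 3D scene
    view_masks:    list of (N,) boolean tensors marking the points each
                   candidate virtual view would cover
    evaluate_miou: callback that renders the chosen view, runs the 2D-3D
                   segmentation model, and returns a scalar accuracy
    """
    point_scores = score_net(point_feats).squeeze(-1)     # (N,) score map
    # Aggregate point scores into one logit per candidate view.
    view_logits = torch.stack([point_scores[m].sum() for m in view_masks])

    dist = torch.distributions.Categorical(logits=view_logits)
    action = dist.sample()                                # sample a view

    with torch.no_grad():
        reward = evaluate_miou(action.item())             # e.g. mIoU gain

    loss = -dist.log_prob(action) * reward                # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(reward)
```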
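The second mechanism is the greedy virtual view coverage strategy, which selects a compact set of views in the normal-sensitive 6D space of point coordinates and normals. A plausible shape for such a greedy max-coverage loop is sketched below; the angular frustum approximation, the normal-facing test, and the thresholds are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def view_covers(cam_pos, cam_dir, points, normals,
                fov_cos=0.5, facing_cos=0.2):
    """Boolean mask of points a virtual view plausibly covers.

    A point counts as covered only if it lies inside the view frustum
    (approximated by an angular test against the optical axis) AND its
    normal faces the camera -- the latter makes coverage normal-sensitive.
    """
    to_pts = points - cam_pos                     # (N, 3) camera -> point
    dists = np.linalg.norm(to_pts, axis=1) + 1e-8
    dirs = to_pts / dists[:, None]                # unit viewing directions
    in_fov = dirs @ cam_dir > fov_cos             # inside angular FoV
    facing = -(dirs * normals).sum(axis=1) > facing_cos  # normal faces camera
    return in_fov & facing


def greedy_select_views(candidates, points, normals, scores, budget=8):
    """Greedily pick views that maximize marginal covered information.

    candidates: list of (cam_pos, cam_dir) virtual view proposals
    scores:     (N,) per-point information scores, e.g. from the score map
    """
    covered = np.zeros(len(points), dtype=bool)
    selected = []
    for _ in range(budget):
        best, best_gain = None, 0.0
        for idx, (pos, d) in enumerate(candidates):
            if idx in selected:
                continue
            mask = view_covers(pos, d, points, normals)
            gain = scores[mask & ~covered].sum()  # marginal information gain
            if gain > best_gain:
                best, best_gain = idx, gain
        if best is None:                          # no remaining view helps
            break
        selected.append(best)
        pos, d = candidates[best]
        covered |= view_covers(pos, d, points, normals)
    return selected
```

As in any greedy max-coverage scheme, each round adds the view with the largest sum of not-yet-covered point scores, so the selected set stays compact while covering as much informative surface as possible.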