GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module

Yichen Zhang, Zihan Wang, Jiali Han, Peilin Li, Jiaxun Zhang, Jianqiang Wang, Lei He, Keqiang Li
arXiv - CS - Computer Vision and Pattern Recognition · Published 2024-09-17 · DOI: arxiv-2409.11307 · Citations: 0

Abstract

3D Gaussian Splatting (3DGS) integrates the strengths of primitive-based representations and volumetric rendering techniques, enabling real-time, high-quality rendering. However, 3DGS models typically overfit to single-scene training and are highly sensitive to the initialization of Gaussian ellipsoids, heuristically derived from Structure from Motion (SfM) point clouds, which limits both generalization and practicality. To address these limitations, we propose GS-Net, a generalizable, plug-and-play 3DGS module that densifies Gaussian ellipsoids from sparse SfM point clouds, enhancing geometric structure representation. To the best of our knowledge, GS-Net is the first plug-and-play 3DGS module with cross-scene generalization capabilities. Additionally, we introduce the CARLA-NVS dataset, which incorporates additional camera viewpoints to thoroughly evaluate reconstruction and rendering quality. Extensive experiments demonstrate that applying GS-Net to 3DGS yields a PSNR improvement of 2.08 dB for conventional viewpoints and 1.86 dB for novel viewpoints, confirming the method's effectiveness and robustness.
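The reported gains are stated in PSNR. As a reminder of what a 2.08 dB improvement implies about reconstruction error, here is the standard PSNR definition in a minimal sketch (this is generic metric code, not code from the paper):

```python
import numpy as np

def psnr(ref, img, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two images in [0, max_val]."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Since PSNR = 10 * log10(max_val^2 / MSE), a +2.08 dB gain means the MSE
# shrinks by a factor of 10 ** (2.08 / 10) ≈ 1.61, i.e. roughly 38% less
# squared reconstruction error at the same dynamic range.
```

Because PSNR is logarithmic in MSE, the 2.08 dB (conventional viewpoints) and 1.86 dB (novel viewpoints) improvements both correspond to substantial relative reductions in pixel-wise error.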