3D- umamba:基于状态空间模型的三维U-Net多源激光雷达点云语义分割

Dening Lu , Linlin Xu , Jun Zhou , Kyle Gao , Zheng Gong , Dedong Zhang
{"title":"3D- umamba:基于状态空间模型的三维U-Net多源激光雷达点云语义分割","authors":"Dening Lu ,&nbsp;Linlin Xu ,&nbsp;Jun Zhou ,&nbsp;Kyle Gao ,&nbsp;Zheng Gong ,&nbsp;Dedong Zhang","doi":"10.1016/j.jag.2025.104401","DOIUrl":null,"url":null,"abstract":"<div><div>Segmentation of point clouds is foundational to numerous remote sensing applications. Recently, the development of Transformers has further improved segmentation techniques thanks to their great long-range context modeling capability. However, Transformers have quadratic complexity in inference time and memory, which both limits the input size and poses a strict hardware requirement. This paper presents a novel 3D-UMamba network with linear complexity, which is the earliest to introduce the Selective State Space Model (i.e., Mamba) to multi-source LiDAR point cloud processing. 3D-UMamba integrates Mamba into the classic U-Net architecture, presenting outstanding global context modeling with high efficiency and achieving an effective combination of local and global information. In addition, we propose a simple yet efficient 3D-token serialization approach (Voxel-based Token Serialization, i.e., VTS) for Mamba, where the Bi-Scanning strategy enables the model to collect features from all input points in different directions effectively. The performance of 3D-UMamba on three challenging LiDAR point cloud datasets (airborne MultiSpectral LiDAR (MS-LiDAR), aerial DALES, and vehicle-mounted Toronto-3D) demonstrated its superiority in multi-source LiDAR point cloud semantic segmentation, as well as the strong adaptability of Mamba to different types of LiDAR data, exceeding current state-of-the-art models. Ablation studies demonstrated the higher efficiency and lower memory costs of 3D-UMamba than its Transformer-based counterparts.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"136 ","pages":"Article 104401"},"PeriodicalIF":8.6000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"3D-UMamba: 3D U-Net with state space model for semantic segmentation of multi-source LiDAR point clouds\",\"authors\":\"Dening Lu ,&nbsp;Linlin Xu ,&nbsp;Jun Zhou ,&nbsp;Kyle Gao ,&nbsp;Zheng Gong ,&nbsp;Dedong Zhang\",\"doi\":\"10.1016/j.jag.2025.104401\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Segmentation of point clouds is foundational to numerous remote sensing applications. Recently, the development of Transformers has further improved segmentation techniques thanks to their great long-range context modeling capability. However, Transformers have quadratic complexity in inference time and memory, which both limits the input size and poses a strict hardware requirement. This paper presents a novel 3D-UMamba network with linear complexity, which is the earliest to introduce the Selective State Space Model (i.e., Mamba) to multi-source LiDAR point cloud processing. 3D-UMamba integrates Mamba into the classic U-Net architecture, presenting outstanding global context modeling with high efficiency and achieving an effective combination of local and global information. In addition, we propose a simple yet efficient 3D-token serialization approach (Voxel-based Token Serialization, i.e., VTS) for Mamba, where the Bi-Scanning strategy enables the model to collect features from all input points in different directions effectively. The performance of 3D-UMamba on three challenging LiDAR point cloud datasets (airborne MultiSpectral LiDAR (MS-LiDAR), aerial DALES, and vehicle-mounted Toronto-3D) demonstrated its superiority in multi-source LiDAR point cloud semantic segmentation, as well as the strong adaptability of Mamba to different types of LiDAR data, exceeding current state-of-the-art models. Ablation studies demonstrated the higher efficiency and lower memory costs of 3D-UMamba than its Transformer-based counterparts.</div></div>\",\"PeriodicalId\":73423,\"journal\":{\"name\":\"International journal of applied earth observation and geoinformation : ITC journal\",\"volume\":\"136 \",\"pages\":\"Article 104401\"},\"PeriodicalIF\":8.6000,\"publicationDate\":\"2025-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International journal of applied earth observation and geoinformation : ITC journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1569843225000482\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/2/13 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"REMOTE SENSING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of applied earth observation and geoinformation : ITC journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1569843225000482","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/13 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"REMOTE SENSING","Score":null,"Total":0}
引用次数: 0

摘要

点云的分割是众多遥感应用的基础。最近,变压器的发展进一步改进了分割技术,这得益于其强大的远程上下文建模能力。然而,变压器在推理时间和内存方面具有二次复杂度,这既限制了输入的大小,又对硬件提出了严格的要求。本文提出了一种新颖的具有线性复杂度的3D-UMamba网络,最早将选择状态空间模型(即Mamba)引入到多源激光雷达点云处理中。3D-UMamba将Mamba集成到经典的U-Net架构中,以高效率呈现出色的全球上下文建模,实现本地和全球信息的有效结合。此外,我们为曼巴提出了一种简单而高效的3d令牌序列化方法(基于体素的令牌序列化,即VTS),其中Bi-Scanning策略使模型能够有效地从不同方向的所有输入点收集特征。3D-UMamba在三个具有挑战性的激光雷达点云数据集(机载多光谱激光雷达(MS-LiDAR)、机载DALES和车载Toronto-3D)上的表现证明了其在多源激光雷达点云语义分割方面的优势,以及Mamba对不同类型激光雷达数据的强大适应性,超过了目前最先进的模型。烧蚀研究表明,与基于变压器的同类产品相比,3D-UMamba的效率更高,存储成本更低。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
3D-UMamba: 3D U-Net with state space model for semantic segmentation of multi-source LiDAR point clouds
Segmentation of point clouds is foundational to numerous remote sensing applications. Recently, the development of Transformers has further improved segmentation techniques thanks to their great long-range context modeling capability. However, Transformers have quadratic complexity in inference time and memory, which both limits the input size and poses a strict hardware requirement. This paper presents a novel 3D-UMamba network with linear complexity, which is the earliest to introduce the Selective State Space Model (i.e., Mamba) to multi-source LiDAR point cloud processing. 3D-UMamba integrates Mamba into the classic U-Net architecture, presenting outstanding global context modeling with high efficiency and achieving an effective combination of local and global information. In addition, we propose a simple yet efficient 3D-token serialization approach (Voxel-based Token Serialization, i.e., VTS) for Mamba, where the Bi-Scanning strategy enables the model to collect features from all input points in different directions effectively. The performance of 3D-UMamba on three challenging LiDAR point cloud datasets (airborne MultiSpectral LiDAR (MS-LiDAR), aerial DALES, and vehicle-mounted Toronto-3D) demonstrated its superiority in multi-source LiDAR point cloud semantic segmentation, as well as the strong adaptability of Mamba to different types of LiDAR data, exceeding current state-of-the-art models. Ablation studies demonstrated the higher efficiency and lower memory costs of 3D-UMamba than its Transformer-based counterparts.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
International journal of applied earth observation and geoinformation : ITC journal
International journal of applied earth observation and geoinformation : ITC journal Global and Planetary Change, Management, Monitoring, Policy and Law, Earth-Surface Processes, Computers in Earth Sciences
CiteScore
12.00
自引率
0.00%
发文量
0
审稿时长
77 days
期刊介绍: The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.
期刊最新文献
A cloud-computing framework for downscaled global 300 m SIF retrieval from Sentinel-3 and TROPOSIF Improving terrestrial net primary productivity estimation through process-consistent FPAR retrieval from multi-source remote sensing data A robust framework for estimating precipitable water vapor based on near-infrared remote sensing data and reanalysis data fusion (NRF) method Quantifying post-fire differences in forest structure and aboveground biomass using the TECIS full-waveform LiDAR system Spectral assessment of senescent sourveld and sweetveld grass quality using continuum-removed absorption features from EnMAP spaceborne imaging spectrometer
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1