Enhanced multi-scale feature adaptive fusion sparse convolutional network for large-scale scenes semantic segmentation

IF 2.5 | Zone 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Computers & Graphics-Uk | Pub Date: 2025-02-01 | DOI: 10.1016/j.cag.2024.104105

Lingfeng Shen, Yanlong Cao, Wenbin Zhu, Kai Ren, Yejun Shou, Haocheng Wang, Zhijie Xu

Computers & Graphics-Uk, Volume 126, Article 104105. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0097849324002401

Citations: 0

Abstract

Semantic segmentation has made notable progress on homogeneous large-scale 3D scenes, yet applying it to varied scenes with diverse characteristics remains challenging. Traditional methods depend on resource-intensive neighborhood search algorithms, which raises their computational cost. To overcome these limitations, we introduce MFAF-SCNet, a novel and computationally efficient approach based on voxel-based sparse convolution. Our key innovation is the multi-scale feature adaptive fusion (MFAF) module, which applies a range of convolution kernel sizes at the network’s entry point to extract multi-scale features and adaptively calibrates the feature weighting to obtain a suitable scale representation for different objects. We further propose LKSNet, an original sparse convolutional backbone designed to handle the inherent inconsistencies in point cloud distribution. It integrates inverted bottleneck structures with large-kernel convolutions, strengthening the network’s feature extraction and its ability to capture spatial correlations. We evaluated MFAF-SCNet on three large-scale benchmark datasets: ScanNet and S3DIS for indoor scenes, and SemanticKITTI for outdoor scenes. The experimental results show that our method is competitive, achieving high performance while remaining computationally efficient.
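The abstract does not give the internals of the MFAF module, so the following is only a toy sketch of the general idea it describes: several kernel sizes are applied to a sparse voxel grid, and the per-scale responses are fused with adaptive weights. Everything specific here is an assumption made for illustration: the scalar per-voxel feature, the mean (box) filter standing in for learned sparse convolution kernels, the kernel sizes 3, 5 and 7, the softmax gating, and the hypothetical `gate` parameters that a real network would learn. It is not the authors' implementation.

```python
import math
from itertools import product


def voxelize(points, voxel_size):
    """Group (x, y, z, feature) points into voxels and average the feature per voxel."""
    sums, counts = {}, {}
    for x, y, z, f in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        sums[key] = sums.get(key, 0.0) + f
        counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def sparse_box_conv(voxels, kernel_size):
    """Mean over occupied neighbors in a k*k*k window, computed only at occupied voxels.

    Assumption: a box filter stands in for a learned sparse convolution kernel.
    """
    r = kernel_size // 2
    offsets = list(product(range(-r, r + 1), repeat=3))
    out = {}
    for i, j, k in voxels:
        vals = [voxels[(i + di, j + dj, k + dk)]
                for di, dj, dk in offsets
                if (i + di, j + dj, k + dk) in voxels]
        out[(i, j, k)] = sum(vals) / len(vals)  # the center voxel is always included
    return out


def softmax(xs):
    """Numerically stable softmax over a short list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(voxels, kernel_sizes=(3, 5, 7), gate=(1.0, 1.0, 1.0)):
    """Fuse multi-scale responses with per-voxel softmax weights.

    Assumption: `gate` is a hypothetical stand-in for learned gating parameters.
    """
    branches = [sparse_box_conv(voxels, k) for k in kernel_sizes]
    fused, weights = {}, {}
    for key in voxels:
        feats = [b[key] for b in branches]
        w = softmax([g * f for g, f in zip(gate, feats)])
        fused[key] = sum(wi * fi for wi, fi in zip(w, feats))
        weights[key] = w
    return fused, weights


# Usage: four points with a scalar feature, voxelized at 1.0 unit resolution.
points = [(0.1, 0.1, 0.1, 1.0), (0.3, 0.1, 0.1, 3.0),
          (1.2, 0.1, 0.1, 5.0), (4.9, 0.2, 0.3, 9.0)]
voxels = voxelize(points, voxel_size=1.0)
fused, weights = adaptive_fusion(voxels)
for key in sorted(fused):
    print(key, round(fused[key], 3), [round(w, 3) for w in weights[key]])
```

Because only occupied voxels are stored and visited, the sketch never performs a k-nearest-neighbor search over raw points. That is the kind of cost the abstract attributes to traditional methods, although the real network would rely on a dedicated sparse convolution library rather than Python dictionaries.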

Source journal

Computers & Graphics-Uk (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 5.30

Self-citation rate: 12.00%

Articles published: 173

Review time: 38 days

Journal description: Computers & Graphics is dedicated to disseminating information on research and applications of computer graphics (CG) techniques. The journal encourages articles on: 1. Research and applications of interactive computer graphics. We are particularly interested in novel interaction techniques and applications of CG to problem domains. 2. State-of-the-art papers on late-breaking, cutting-edge research on CG. 3. Information on innovative uses of graphics principles and technologies. 4. Tutorial papers on both teaching CG principles and innovative uses of CG in education.

Latest articles in this journal

Celebrating 50 years of innovation in computer graphics: Issue 127

Foreword to chinagraph 2024 special section

Real-time discrete visibility fields for ray-traced dynamic scenes

Single-image reflectance and transmittance estimation from any flatbed scanner

Evaluating user perception toward physics-adapted avatar in remote heterogeneous spaces