SAPFormer: Shape-aware propagation Transformer for point clouds

Pattern Recognition | Pub Date: 2025-03-12 | DOI: 10.1016/j.patcog.2025.111578 | IF 7.5 | Zone 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence)

Gang Xiao, Sihan Ge, Yangsheng Zhong, Zhongcheng Xiao, Junfeng Song, Jiawei Lu

Pattern Recognition, Volume 164, Article 111578 (2025). Article page: https://www.sciencedirect.com/science/article/pii/S0031320325002389

Citations: 0

Abstract

Transformer-based networks have achieved impressive performance on three-dimensional point cloud data. However, most existing methods focus on aggregating local features within point neighborhoods and ignore global feature information, which makes it difficult to capture long-range dependencies in a point cloud. In this paper, we propose the Shape-Aware Propagation Transformer (SAPFormer), which flexibly captures the semantic information of point clouds in geometric space and effectively extracts contextual geometric information. Specifically, we first design local group self-attention (LGA) to capture local interactions within each region. To relate the features of otherwise separated local regions, we propose local group propagation (LGP), which passes information between different regions via query points. This allows features to propagate among neighbors, yielding finer-grained feature information. To further enlarge the receptive field, we propose the global shape feature module (GSFM), which learns global context information through key shape points (KSPs). Finally, to provide positional cues between global contexts, we introduce spatial-shape relative position encoding (SS-RPE), which captures the positional relationships between points. Extensive experiments on the S3DIS, SensatUrban, ScanNet V2, ShapeNetPart, and ModelNet40 datasets demonstrate the effectiveness and superiority of our method. The code is available at https://github.com/viivan/SAPFormer-main.
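The abstract gives no formulas for LGA or SS-RPE, so the following is only a generic illustration of the underlying idea of self-attention restricted to local point groups with a relative-position term. It is a minimal NumPy sketch under stated assumptions and not SAPFormer's implementation: grouping by k-nearest neighbours, a single attention head, random untrained weights, and adding a linear projection of the offsets p_j - p_i to both keys and values are all choices made here for illustration, and the function names `knn_groups` and `local_group_attention` are hypothetical. The authors' repository is the reference for the actual method.

```python
import numpy as np


def knn_groups(points: np.ndarray, k: int) -> np.ndarray:
    """Return an (N, k) index array of each point's k nearest neighbours (self first)."""
    diff = points[:, None, :] - points[None, :, :]   # (N, N, 3) pairwise offsets
    dist2 = np.sum(diff ** 2, axis=-1)               # (N, N) squared distances
    return np.argsort(dist2, axis=1)[:, :k]          # the point itself has distance 0


def local_group_attention(points, feats, k, w_q, w_k, w_v, w_pos):
    """Single-head self-attention restricted to each point's kNN group.

    Hypothetical sketch, not SAPFormer's LGA.
    points: (N, 3) coordinates, feats: (N, C) features,
    w_q / w_k / w_v: (C, D) projections, w_pos: (3, D) projection of offsets.
    Returns (N, D) updated features.
    """
    idx = knn_groups(points, k)                      # (N, k)
    q = feats @ w_q                                  # (N, D) one query per point
    keys = (feats @ w_k)[idx]                        # (N, k, D) neighbour keys
    vals = (feats @ w_v)[idx]                        # (N, k, D) neighbour values
    rel = points[idx] - points[:, None, :]           # (N, k, 3) offsets p_j - p_i
    pos = rel @ w_pos                                # (N, k, D) assumed relative-position term
    logits = np.einsum("nd,nkd->nk", q, keys + pos) / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=1, keepdims=True)      # stabilise the softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over the k neighbours
    return np.einsum("nk,nkd->nd", attn, vals + pos)


# Toy usage with random weights (no training involved).
rng = np.random.default_rng(0)
N, C, D, K = 128, 16, 32, 8
pts = rng.normal(size=(N, 3))
x = rng.normal(size=(N, C))
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(C, D)) for _ in range(3))
w_pos = rng.normal(scale=0.1, size=(3, D))
out = local_group_attention(pts, x, K, w_q, w_k, w_v, w_pos)
print(out.shape)  # (128, 32)
```

The dense O(N²) neighbour search is only suitable for toy inputs; practical point transformers usually rely on batched tensors, multiple heads, spatial indexing for neighbour queries, and learned MLPs rather than a single linear map for the positional term.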

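Source journal

Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 16.20%

Articles published: 683

Review time: 5.6 months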

Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.