Point-MPP: Point Cloud Self-Supervised Learning From Masked Position Prediction

IF 8.9 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · IEEE Transactions on Neural Networks and Learning Systems · Pub Date: 2024-10-24 · DOI: 10.1109/TNNLS.2024.3479309
Songlin Fan;Wei Gao;Ge Li
Published in IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 7, pp. 12964-12976. Article page: https://ieeexplore.ieee.org/document/10734244/
Citations: 0

Abstract

Masked autoencoding has gained momentum for improving fine-tuning performance in many downstream tasks. However, it tends to focus on low-level reconstruction details, lacking high-level semantics and resulting in weak transfer capability. This article presents a novel jigsaw puzzle solver inspired by the idea that predicting the positions of disordered point cloud patches provides more semantic information, similar to how children learn by solving jigsaw puzzles. Our method adopts the mask-then-predict paradigm, erasing the positions of selected point patches rather than their contents. We first partition input point clouds into irregular patches and randomly erase the positions of some patches. Then, a Transformer-based model is used to learn high-level semantic features and regress the positions of the masked patches. This approach forces the model to focus on learning transfer-robust semantics while paying less attention to low-level details. To tie the predictions within the encoding space, we further introduce a consistency constraint on their latent representations to encourage the encoded features to contain more semantic cues. We demonstrate that a standard Transformer backbone with our pretraining scheme can capture discriminative point cloud semantic information. Furthermore, extensive experiments indicate that our method outperforms the previous best competitor across six popular downstream vision tasks, achieving new state-of-the-art performance. Codes will be available at https://git.openi.org.cn/OpenPointCloud/Point-MPP.
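The pretraining setup described above (partition the cloud into irregular patches, then erase and regress patch positions rather than patch contents) can be illustrated with a minimal data-preparation sketch. This is not the authors' implementation; it is a hypothetical NumPy illustration in which farthest point sampling picks patch centers, k-nearest-neighbor grouping forms patch contents, and a random subset of patch positions is withheld as regression targets. All function names and parameters here are assumptions for illustration only.

```python
import numpy as np

def farthest_point_sample(points, n_centers, seed=0):
    # Greedy farthest point sampling: pick patch centers spread across the cloud.
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    centers = [int(rng.integers(n))]
    dist = np.full(n, np.inf)
    for _ in range(n_centers - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[centers[-1]], axis=1))
        centers.append(int(dist.argmax()))
    return np.array(centers)

def make_patches(points, n_patches=8, k=16):
    # Group the k nearest neighbors around each center into an irregular patch.
    idx = farthest_point_sample(points, n_patches)
    centers = points[idx]
    d = np.linalg.norm(points[None] - centers[:, None], axis=-1)  # (P, N)
    knn = np.argsort(d, axis=1)[:, :k]
    # Patch contents are center-normalized, so they carry no positional cue.
    patches = points[knn] - centers[:, None]
    return centers, patches

def mask_positions(centers, mask_ratio=0.5, seed=0):
    # Erase patch *positions* (not contents): the masked centers become
    # the regression targets the model must predict from visible context.
    rng = np.random.default_rng(seed)
    n = centers.shape[0]
    n_mask = int(n * mask_ratio)
    masked = rng.choice(n, n_mask, replace=False)
    visible_pos = np.delete(centers, masked, axis=0)
    targets = centers[masked]
    return masked, visible_pos, targets

pts = np.random.default_rng(1).random((256, 3))
centers, patches = make_patches(pts)
masked, visible_pos, targets = mask_positions(centers)
```

In the full method, a Transformer encoder would consume all patch contents plus the visible positions and regress `targets`; the consistency constraint on latent representations described in the abstract is omitted from this sketch.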
Source Journal

IEEE Transactions on Neural Networks and Learning Systems
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore: 23.80
Self-citation rate: 9.60%
Articles published: 2102
Review time: 3-8 weeks
Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.
Latest articles from this journal

NPSVC++: A Representation Learning Framework for Nonparallel Classifiers.
Heuristic Knowledge-Driven Spatio-Temporal Forecasting via Multigraph.
Robust Image-Based Visual Servoing Formation Control for Quadrotors Without Communication via Reinforcement Learning.
Virtual Domain-Guided Cross-Modal Distillation With Multiview Correlation Awareness for Domain-Specific Multimodal Neural Machine Translation.
Redundancy Removal and Knowledge Alignment-Based Personalized Federated Learning for Online Condition Monitoring.