Explore In-Context Learning for 3D Point Cloud Understanding

ArXiv. Pub Date: 2023-06-14. DOI: 10.48550/arXiv.2306.08659

Zhongbin Fang, Xiangtai Li, Xia Li, J. Buhmann, Chen Change Loy, Mengyuan Liu

Citations: 2

Abstract

With the rise of large-scale models trained on broad data, in-context learning has become a new learning paradigm that has demonstrated significant potential in natural language processing and computer vision tasks. Meanwhile, in-context learning is still largely unexplored in the 3D point cloud domain. Although masked modeling has been successfully applied for in-context learning in 2D vision, directly extending it to 3D point clouds remains a formidable challenge. In the case of point clouds, the tokens themselves are the point cloud positions (coordinates) that are masked during inference. Moreover, position embedding in previous works may inadvertently introduce information leakage. To address these challenges, we introduce a novel framework, named Point-In-Context, designed especially for in-context learning in 3D point clouds, where both inputs and outputs are modeled as coordinates for each task. Additionally, we propose the Joint Sampling module, carefully designed to work in tandem with the general point sampling operator, effectively resolving the aforementioned technical issues. We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks. Furthermore, with a more effective prompt selection strategy, our framework surpasses the results of individually trained models.
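
The abstract says that inputs and outputs are both modeled as coordinates and that a Joint Sampling module works together with a general point sampling operator, but it does not describe the module's internals. The sketch below is therefore only one plausible reading, not the authors' implementation: it assumes the input and target clouds have the same number of points in pointwise correspondence, picks patch centers on the input with farthest point sampling, and reuses the same indices to cut patches from the target so that input and target tokens stay aligned. The function names `farthest_point_sampling` and `joint_sample` and the parameters `num_centers` and `k` are hypothetical.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, num_centers: int) -> np.ndarray:
    """Greedy farthest point sampling; returns indices into `points`."""
    n = points.shape[0]
    chosen = np.zeros(num_centers, dtype=np.int64)
    # Squared distance from each point to its nearest already-chosen center.
    min_dist = np.full(n, np.inf)
    chosen[0] = 0  # deterministic start (assumption; a random start is also common)
    for i in range(1, num_centers):
        diff = points - points[chosen[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        chosen[i] = int(np.argmax(min_dist))
    return chosen


def joint_sample(inputs: np.ndarray, targets: np.ndarray, num_centers: int, k: int):
    """Cut aligned patches from an input cloud and its target cloud.

    Assumption: inputs[i] and targets[i] describe the same point, so any
    index chosen on the input can be reused on the target.
    """
    assert inputs.shape == targets.shape, "clouds must be in pointwise correspondence"
    center_idx = farthest_point_sampling(inputs, num_centers)
    # k nearest neighbours of every center, computed on the input cloud only.
    diff = inputs[center_idx][:, None, :] - inputs[None, :, :]  # (M, N, 3)
    dist = np.einsum("mnd,mnd->mn", diff, diff)                 # (M, N)
    knn_idx = np.argsort(dist, axis=1, kind="stable")[:, :k]    # (M, k)
    # The same indices are applied to both clouds, keeping patches aligned.
    return inputs[knn_idx], targets[knn_idx], center_idx, knn_idx


rng = np.random.default_rng(0)
inputs = rng.normal(size=(64, 3))
targets = inputs * 2.0  # stand-in "task output" with known correspondence
in_patches, tgt_patches, center_idx, knn_idx = joint_sample(inputs, targets, 8, 4)
print(in_patches.shape, tgt_patches.shape)
```

Sharing indices rather than sampling each cloud independently is the design choice being illustrated here: if the target were sampled on its own, its patches would generally not correspond to the input patches, and a masked target token could not be paired with the input token at the same position.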
