Boosting Cross-Domain Point Classification via Distilling Relational Priors From 2D Transformers

IF 11.1 1区 工程技术 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-08-08 DOI:10.1109/TCSVT.2024.3440517
Longkun Zou;Wanru Zhu;Ke Chen;Lihua Guo;Kailing Guo;Kui Jia;Yaowei Wang
{"title":"Boosting Cross-Domain Point Classification via Distilling Relational Priors From 2D Transformers","authors":"Longkun Zou;Wanru Zhu;Ke Chen;Lihua Guo;Kailing Guo;Kui Jia;Yaowei Wang","doi":"10.1109/TCSVT.2024.3440517","DOIUrl":null,"url":null,"abstract":"Semantic pattern of an object point cloud is determined by its topological configuration of local geometries. Learning discriminative representations can be challenging due to large shape variations of point sets in local regions and incomplete surface in a global perspective, which can be made even more severe in the context of unsupervised domain adaptation (UDA). In specific, traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries, which greatly limits their cross-domain generalization. Recently, the transformer-based models have achieved impressive performance gain in a range of image-based tasks, benefiting from its strong generalization capability and scalability stemming from capturing long range correlation across local patches. Inspired by such successes of visual transformers, we propose a novel Relational Priors Distillation (RPD) method to extract relational priors from the well-trained transformers on massive images, which can significantly empower cross-domain representations with consistent topological priors of objects. To this end, we establish a parameter-frozen pre-trained transformer module shared between 2D teacher and 3D student models, complemented by an online knowledge distillation strategy for semantically regularizing the 3D student model. Furthermore, we introduce a novel self-supervised task centered on reconstructing masked point cloud patches using corresponding masked multi-view image features, thereby empowering the model with incorporating 3D geometric information. Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification. The source code of this work is available at \n<uri>https://github.com/zou-longkun/RPD.git</uri>\n.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"34 12","pages":"12963-12976"},"PeriodicalIF":11.1000,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10630838/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Semantic pattern of an object point cloud is determined by its topological configuration of local geometries. Learning discriminative representations can be challenging due to large shape variations of point sets in local regions and incomplete surface in a global perspective, which can be made even more severe in the context of unsupervised domain adaptation (UDA). In specific, traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries, which greatly limits their cross-domain generalization. Recently, the transformer-based models have achieved impressive performance gain in a range of image-based tasks, benefiting from its strong generalization capability and scalability stemming from capturing long range correlation across local patches. Inspired by such successes of visual transformers, we propose a novel Relational Priors Distillation (RPD) method to extract relational priors from the well-trained transformers on massive images, which can significantly empower cross-domain representations with consistent topological priors of objects. To this end, we establish a parameter-frozen pre-trained transformer module shared between 2D teacher and 3D student models, complemented by an online knowledge distillation strategy for semantically regularizing the 3D student model. Furthermore, we introduce a novel self-supervised task centered on reconstructing masked point cloud patches using corresponding masked multi-view image features, thereby empowering the model with incorporating 3D geometric information. Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification. The source code of this work is available at https://github.com/zou-longkun/RPD.git .
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过从二维变换器中提炼关系先验来提升跨域点分类能力
对象点云的语义模式由其局部几何的拓扑结构决定。由于局部区域和全局不完整表面的点集形状变化很大,学习判别表示可能具有挑战性,这在无监督域适应(UDA)的背景下可能会变得更加严重。传统的三维网络主要关注局部几何细节,忽略了局部几何之间的拓扑结构,极大地限制了网络的跨域泛化。最近,基于变压器的模型在一系列基于图像的任务中取得了令人印象深刻的性能提升,这得益于其强大的泛化能力和可扩展性,这些可扩展性源于捕获局部补丁之间的长距离相关性。受这些成功的视觉变换的启发,我们提出了一种新的关系先验蒸馏(RPD)方法,从大量图像上训练良好的变换中提取关系先验,可以显着增强具有一致对象拓扑先验的跨域表示。为此,我们建立了一个参数冻结的预训练变压器模块,在2D教师和3D学生模型之间共享,并辅以在线知识蒸馏策略,对3D学生模型进行语义正则化。此外,我们引入了一种新的自监督任务,该任务以利用相应的掩模多视图图像特征重建掩模点云补丁为中心,从而增强模型的三维几何信息。在PointDA-10和模拟到真实数据集上的实验验证了所提出的方法在点云分类中始终如一地达到了UDA最先进的性能。该工作的源代码可从https://github.com/zou-longkun/RPD.git获得。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
13.80
自引率
27.40%
发文量
660
审稿时长
5 months
期刊介绍: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.
期刊最新文献
TinySplat: Feedforward Approach for Generating Compact 3D Scene Representation GSCodec Studio: A Modular Framework for Gaussian Splat Compression Syntax Element Encryption for H.265/HEVC Using Chaotic Map-Based Coefficient Scrambling Scheme Learning Confidence-Aware Prototypes for Weakly-Supervised Video Anomaly Detection Learned Point Cloud Attribute Compression With Cross-Scale Point Transformer and Geometry-Aware Context Prediction Entropy Model
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1