
Latest Publications: IEEE Transactions on Pattern Analysis and Machine Intelligence

Towards Enhanced Representation Learning for Single-Source Domain Generalization in LiDAR Semantic Segmentation.
IF 23.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-15 | DOI: 10.1109/tpami.2026.3654352
Hyeonseong Kim, Yoonsu Kang, Changgyoon Oh, Kuk-Jin Yoon
With the success of 3D deep learning models, various perception technologies for autonomous driving have been developed in the LiDAR domain. While these models perform well in the trained source domain, they struggle in unseen domains separated by a domain gap. In this paper, we propose a representation learning approach for domain generalization in LiDAR semantic segmentation, termed DGLSS++, which is designed to ensure robust performance in both the source domain and unseen domains despite training exclusively on the source domain. Our approach focuses on generalizing from a single source domain, addressing the domain shift caused by variations in LiDAR sensor configurations and scene distributions. To tackle both sparse-to-dense and dense-to-sparse generalization scenarios, we simulate unseen domains by generating sparsely and densely augmented domains. With these augmented domains, we introduce two constraints for generalizable representation learning: generalized masked sparsity invariant feature consistency (GMSIFC) and localized semantic correlation consistency (LSCC). GMSIFC aligns the internal sparse features of the source domain with those of the augmented domain at different sparsity levels, introducing a novel masking strategy to exclude voxel features associated with multiple inconsistent classes. For LSCC, class prototypes from spatially local regions are constrained to maintain similar correlations across all local regions, regardless of the scene or domain. In addition, we establish standardized training and evaluation protocols utilizing four real-world datasets and implement several baseline methods. Extensive experiments demonstrate that our approach outperforms both UDA and DG baselines. The code is available at https://github.com/gzgzys9887/DGLSS.
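To make the LSCC constraint concrete, the following is a minimal PyTorch sketch of the idea as stated in the abstract: class prototypes are pooled per spatially local region, and the prototype correlation matrices of all regions are pulled toward their mean. The function name lscc_loss, the tensor layout, and the mean-correlation target are illustrative assumptions, not code from the DGLSS++ repository.

```python
import torch
import torch.nn.functional as F

def lscc_loss(features, labels, region_ids, num_classes):
    """features: (N, C) voxel features; labels: (N,) semantic labels;
    region_ids: (N,) index of the spatially local region of each voxel."""
    corr_mats = []
    for r in region_ids.unique():
        mask = region_ids == r
        feats_r, labs_r = features[mask], labels[mask]
        protos = []
        for c in range(num_classes):
            cmask = labs_r == c
            if cmask.any():
                protos.append(feats_r[cmask].mean(dim=0))      # class prototype in region r
            else:
                protos.append(torch.zeros(features.size(1), device=features.device))
        protos = F.normalize(torch.stack(protos), dim=1)        # (K, C)
        corr_mats.append(protos @ protos.t())                   # (K, K) class correlations
    corr_mats = torch.stack(corr_mats)                          # (R, K, K)
    mean_corr = corr_mats.mean(dim=0, keepdim=True)
    # Pull every region's class-correlation structure toward the shared mean,
    # regardless of which scene or domain the region came from.
    return F.mse_loss(corr_mats, mean_corr.expand_as(corr_mats))
```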
Citations: 0
BlindU: Blind Machine Unlearning without Revealing Erasing Data
IF 23.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-15 | DOI: 10.1109/tpami.2026.3654093
Weiqi Wang, Zhiyi Tian, Chenhan Zhang, Shui Yu
{"title":"BlindU: Blind Machine Unlearning without Revealing Erasing Data","authors":"Weiqi Wang, Zhiyi Tian, Chenhan Zhang, Shui Yu","doi":"10.1109/tpami.2026.3654093","DOIUrl":"https://doi.org/10.1109/tpami.2026.3654093","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"38 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145972435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Practical Continual Forgetting for Pre-trained Vision Models
IF 23.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-15 | DOI: 10.1109/tpami.2026.3654115
Hongbo Zhao, Fei Zhu, Bolin Ni, Feng Zhu, Gaofeng Meng, Zhaoxiang Zhang
{"title":"Practical Continual Forgetting for Pre-trained Vision Models","authors":"Hongbo Zhao, Fei Zhu, Bolin Ni, Feng Zhu, Gaofeng Meng, Zhaoxiang Zhang","doi":"10.1109/tpami.2026.3654115","DOIUrl":"https://doi.org/10.1109/tpami.2026.3654115","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"26 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145972439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DyDiT++: Diffusion Transformers with Timestep and Spatial Dynamics for Efficient Visual Generation.
IF 23.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-15 | DOI: 10.1109/tpami.2026.3654201
Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You
Diffusion Transformer (DiT), an emerging diffusion model for visual generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs primarily stem from the static inference paradigm, which inevitably introduces redundant computation in certain diffusion timesteps and spatial regions. To overcome this inefficiency, we propose Dynamic Diffusion Transformer (DyDiT), an architecture that dynamically adjusts its computation along both timestep and spatial dimensions. Specifically, we introduce a Timestep-wise Dynamic Width (TDW) approach that adapts model width conditioned on the generation timesteps. In addition, we design a Spatial-wise Dynamic Token (SDT) strategy to avoid redundant computation at unnecessary spatial locations. TDW and SDT can be seamlessly integrated into DiT and significantly accelerate the generation process. Building on these designs, we present an extended version, DyDiT++, with improvements in three key aspects. First, it extends the generation mechanism of DyDiT beyond diffusion to flow matching, demonstrating that our method can also accelerate flow-matching based generation, enhancing its versatility. Furthermore, we enhance DyDiT to tackle more complex visual generation tasks, including video generation and text-to-image generation, thereby broadening its real-world applications. Finally, to address the high cost of full fine-tuning and democratize technology access, we investigate the feasibility of training DyDiT in a parameter-efficient manner and introduce timestep-based dynamic LoRA (TD-LoRA). Extensive experiments on diverse visual generation models, including DiT, SiT, Latte, and FLUX, demonstrate the effectiveness of DyDiT++. Remarkably, with <3% additional fine-tuning iterations, our approach reduces the FLOPs of DiT-XL by 51%, yielding a 1.73× realistic speedup on hardware, and achieves a competitive FID score of 2.07 on ImageNet. The code is available at https://github.com/alibaba-damo-academy/DyDiT.
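As a rough illustration of the Timestep-wise Dynamic Width (TDW) mechanism, the sketch below routes the diffusion timestep embedding through a small MLP to produce per-head gates, so that entire attention heads can be switched off at timesteps where they are redundant. The class name, the 0.5 threshold, and the straight-through estimator are assumptions for illustration, not the DyDiT implementation.

```python
import torch
import torch.nn as nn

class TimestepHeadRouter(nn.Module):
    """Maps a timestep embedding to binary per-head gates (hypothetical)."""

    def __init__(self, t_embed_dim: int, num_heads: int):
        super().__init__()
        self.gate_mlp = nn.Sequential(
            nn.Linear(t_embed_dim, t_embed_dim), nn.SiLU(),
            nn.Linear(t_embed_dim, num_heads),
        )

    def forward(self, t_embed: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.gate_mlp(t_embed))   # (B, H) soft gates
        hard = (probs > 0.5).float()                    # binarize for inference skipping
        # Straight-through estimator: hard values forward, soft gradients backward.
        return hard + probs - probs.detach()
```

The returned (B, H) gates would multiply per-head attention outputs; a head whose gate is 0 contributes nothing and can be skipped entirely at inference time, which is where the FLOP savings come from.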
Citations: 0
Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective
IF 23.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-14 | DOI: 10.1109/tpami.2026.3654274
Nan Zhong, Mian Zou, Yiran Xu, Zhenxing Qian, Xinpeng Zhang, Baoyuan Wu, Kede Ma
{"title":"Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective","authors":"Nan Zhong, Mian Zou, Yiran Xu, Zhenxing Qian, Xinpeng Zhang, Baoyuan Wu, Kede Ma","doi":"10.1109/tpami.2026.3654274","DOIUrl":"https://doi.org/10.1109/tpami.2026.3654274","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"34 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145971972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reinforced Refinement with Self-Aware Expansion for End-to-End Autonomous Driving
IF 23.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-14 | DOI: 10.1109/tpami.2026.3653866
Haochen Liu, Tianyu Li, Haohan Yang, Li Chen, Caojun Wang, Ke Guo, Haochen Tian, Hongchen Li, Hongyang Li, Chen Lv
{"title":"Reinforced Refinement with Self-Aware Expansion for End-to-End Autonomous Driving","authors":"Haochen Liu, Tianyu Li, Haohan Yang, Li Chen, Caojun Wang, Ke Guo, Haochen Tian, Hongchen Li, Hongyang Li, Chen Lv","doi":"10.1109/tpami.2026.3653866","DOIUrl":"https://doi.org/10.1109/tpami.2026.3653866","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"30 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145972413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Gift from the Integration of Discriminative and Diffusion-based Generative Learning: Boundary Refinement Remote Sensing Semantic Segmentation
IF 23.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-14 | DOI: 10.1109/tpami.2026.3654243
Hao Wang, Keyan Hu, Xin Guo, Haifeng Li, Chao Tao
{"title":"A Gift from the Integration of Discriminative and Diffusion-based Generative Learning: Boundary Refinement Remote Sensing Semantic Segmentation","authors":"Hao Wang, Keyan Hu, Xin Guo, Haifeng Li, Chao Tao","doi":"10.1109/tpami.2026.3654243","DOIUrl":"https://doi.org/10.1109/tpami.2026.3654243","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"5 1","pages":""},"PeriodicalIF":23.6,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145972414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Consistency-Aware Spot-Guided Transformer for Accurate and Versatile Point Cloud Registration.
IF 23.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-14 | DOI: 10.1109/tpami.2026.3653989
Renlang Huang, Li Chai, Yufan Tang, Zhoujian Li, Jiming Chen, Liang Li
Deep learning-based feature matching has showcased great superiority for point cloud registration. While coarse-to-fine matching architectures are prevalent, they typically perform sparse and geometrically inconsistent coarse matching. This forces the subsequent fine matching to rely on computationally expensive optimal transport and hypothesis-and-selection procedures to resolve inconsistencies, leading to inefficiency and poor scalability for large-scale real-time applications. In this paper, we design a consistency-aware spot-guided Transformer (CAST) to enhance the coarse matching by explicitly utilizing geometric consistency via two key sparse attention mechanisms. First, our consistency-aware self-attention selectively computes intra-point-cloud attention to a sparse subset of points with globally consistent correspondences, enabling other points to derive discriminative features through their relationships with these anchors while propagating global consistency for robust correspondence reasoning. Second, our spot-guided cross-attention restricts cross-point-cloud attention to dynamically defined "spots": the union of the correspondence neighborhoods of a query's neighbors in the other point cloud, which, by local consistency, are most likely to cover the query's true correspondence, eliminating interference from similar but irrelevant regions. Furthermore, we design a lightweight local attention-based fine matching module to precisely predict dense correspondences and estimate the transformation. Extensive experiments on both outdoor LiDAR datasets and indoor RGB-D camera datasets demonstrate that our method achieves state-of-the-art accuracy, efficiency, and robustness. Besides, our method showcases superior generalization ability on our newly constructed challenging relocalization and loop closing benchmarks in unseen domains. Our code and models are available at https://github.com/RenlangHuang/CASTv2.
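The spot construction can be made concrete with a small PyTorch sketch: each source point is allowed to attend only to the union of k-NN neighborhoods (in the target cloud) around the current correspondences of its own k-NN neighbors in the source cloud. The function name, the dense boolean mask, and the k values are simplifying assumptions; CAST itself realizes this as sparse attention rather than a dense N×M mask.

```python
import torch

def spot_mask(src_xyz, tgt_xyz, corr_idx, k_src=8, k_tgt=8):
    """src_xyz: (N, 3), tgt_xyz: (M, 3); corr_idx: (N,) current best target
    index per source point. Returns an (N, M) boolean cross-attention mask."""
    d_src = torch.cdist(src_xyz, src_xyz)                    # (N, N) pairwise distances
    nbr = d_src.topk(k_src, largest=False).indices           # (N, k_src) source k-NN
    anchors = corr_idx[nbr]                                  # (N, k_src) their correspondences
    d_tgt = torch.cdist(tgt_xyz, tgt_xyz)                    # (M, M)
    tgt_nbr = d_tgt.topk(k_tgt, largest=False).indices       # (M, k_tgt) target k-NN
    spots = tgt_nbr[anchors].reshape(src_xyz.size(0), -1)    # (N, k_src * k_tgt)
    mask = torch.zeros(src_xyz.size(0), tgt_xyz.size(0),
                       dtype=torch.bool, device=src_xyz.device)
    mask.scatter_(1, spots, True)                            # union of the spot neighborhoods
    return mask  # pass as the attention mask so each query only sees its spot
```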
Citations: 0
SLeak: Multi-Target Privacy Stealing Attack against Split Learning.
IF 23.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-14 | DOI: 10.1109/tpami.2026.3654092
Xiaoyang Xu, Wenzhe Yi, Juan Wang, Hongxin Hu, Mengda Yang, Ziang Li, Yong Zhuang, Yaxin Liu, Mang Ye
Split Learning (SL) is a distributed learning framework that has gained popularity for its privacy-preserving nature and low computational demands. However, recent studies have shown that a server adversary can carry out inference attacks, compromising the privacy of victim clients. Nevertheless, upon re-evaluating prior studies, we found that existing methods rely on overly strong assumptions to enhance their performance, resulting in a significant decline in effectiveness under more realistic scenarios. In this work, we provide new insights into the inherent vulnerabilities of SL. Specifically, we discover that both the smashed data and the server model contain the client's representation preference, which the server adversary can exploit to build a substitute client that approximates the target client's unique feature extraction behavior. With a well-trained substitute client, the server can perfectly steal the target client's functionality, training data, and labels. Building on this observation, we introduce Split Leakage (SLeak), a new threat that pursues multiple privacy-stealing objectives against SL. Notably, SLeak does not depend on strong privacy priors and only requires partial same-domain auxiliary public data to conduct the attacks. Experimental results on diverse datasets and target models show that SLeak surpasses the state-of-the-art method across multiple metrics. Moreover, ablation studies further confirm its robustness and applicability under various scenarios and assumptions.
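A heavily simplified sketch of the substitute-client idea is given below: the adversarial server freezes its own model half and trains a substitute client on same-domain auxiliary public data so that the composed pipeline (substitute client, then server model) still solves the task, while the substitute's cut-layer features are pulled toward the statistics of the smashed data observed from the victim. The specific loss terms, the feature-statistics matching, and all names are illustrative assumptions, not the actual SLeak objective.

```python
import torch
import torch.nn.functional as F

def train_substitute(sub_client, server_model, aux_loader, smashed_bank, epochs=5):
    """sub_client, server_model: nn.Modules; aux_loader yields (x, y) from
    auxiliary public data; smashed_bank: (N, D) smashed data seen by the server."""
    for p in server_model.parameters():
        p.requires_grad_(False)                    # the server half stays frozen
    opt = torch.optim.Adam(sub_client.parameters(), lr=1e-4)
    mu, var = smashed_bank.mean(dim=0), smashed_bank.var(dim=0)
    for _ in range(epochs):
        for x, y in aux_loader:
            feats = sub_client(x)                  # mimic the victim's cut-layer output
            task = F.cross_entropy(server_model(feats), y)
            # Match first- and second-order statistics of the observed smashed data.
            stat = F.mse_loss(feats.mean(dim=0), mu) + F.mse_loss(feats.var(dim=0), var)
            loss = task + 0.1 * stat
            opt.zero_grad()
            loss.backward()
            opt.step()
    return sub_client
```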
Citations: 0
VRP-UDF: Towards Unbiased Learning of Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors.
IF 23.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-14 | DOI: 10.1109/tpami.2026.3653901
Wenyuan Zhang, Chunsheng Wang, Kanle Shi, Yu-Shen Liu, Zhizhong Han
Unsigned distance functions (UDFs) have been a vital representation for open surfaces. With different differentiable renderers, current methods are able to train neural networks to infer a UDF by minimizing the rendering errors between the UDF and the multi-view ground truth. However, these differentiable renderers are mainly handcrafted, which makes them either biased on ray-surface intersections, or sensitive to unsigned distance outliers, or not scalable to large scenes. To resolve these issues, we present a novel differentiable renderer to infer UDFs more accurately. Instead of using handcrafted equations, our differentiable renderer is a neural network that is pre-trained in a data-driven manner. It learns how to render unsigned distances into depth images, yielding prior knowledge that we dub volume rendering priors. To infer a UDF for an unseen scene from multiple RGB images, we generalize the learned volume rendering priors to map inferred unsigned distances in alpha blending for RGB image rendering. To reduce the sampling bias in UDF inference, we utilize an auxiliary point sampling prior as an indicator of ray-surface intersection, and propose novel schemes for more accurate and uniform sampling near the zero-level sets. We also propose a new strategy that leverages our pretrained volume rendering prior as a general surface refiner, which can be integrated with various Gaussian reconstruction methods to optimize the Gaussian distributions and refine geometric details. Our results show that the learned volume rendering prior is unbiased, robust, scalable, 3D aware, and, more importantly, easy to learn. Further experiments show that the volume rendering prior is also a general strategy for enhancing other neural implicit representations such as signed distance functions and occupancy. We evaluate our method on both widely used benchmarks and real scenes, and report superior performance over the state-of-the-art methods.
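To sketch what a learned volume rendering prior can look like, the toy module below replaces a handcrafted UDF-to-density mapping with a small network that turns a local window of unsigned distances along each ray into alpha values, composited front-to-back in the standard way. The window size, MLP shape, and module name are assumptions for illustration; the actual prior in the paper is pre-trained to render unsigned distances into depth images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UDFRenderPrior(nn.Module):
    """Hypothetical learned mapping from unsigned distances to blending weights."""

    def __init__(self, window: int = 5):
        super().__init__()
        self.window = window
        self.mlp = nn.Sequential(
            nn.Linear(window, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, udf_samples: torch.Tensor) -> torch.Tensor:
        """udf_samples: (R, S) unsigned distances at S samples along R rays.
        Returns (R, S) alpha-blending weights for compositing depth or color."""
        pad = self.window // 2
        x = F.pad(udf_samples.unsqueeze(1), (pad, pad), mode="replicate").squeeze(1)
        wins = x.unfold(dimension=1, size=self.window, step=1)   # (R, S, window)
        alpha = torch.sigmoid(self.mlp(wins)).squeeze(-1)        # (R, S) learned alphas
        # Standard front-to-back compositing: transmittance times alpha.
        trans = torch.cumprod(
            torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-7], dim=1), dim=1
        )[:, :-1]
        return alpha * trans
```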
Citations: 0